Yet Another Data Science Blog: Regular Expressions


Thursday, August 22, 2024

Creating a Regular Expression Pattern for Matching Vehicle Registration Numbers

You need to define a regular expression pattern that matches vehicle registration numbers. The format consists of uppercase letters and digits, optionally separated by spaces, arranged in a fixed structure. The goal is a pattern that accurately identifies and validates strings in this format.

1. **Pattern Components**:

- **Uppercase Letters**: The pattern begins with two uppercase letters (`[A-Z]{2}`).

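- **Optional Space**: After the letters, there may be an optional space (`\s?`). Note that `\s` matches any single whitespace character (space, tab, newline); use `[ ]?` instead if only a literal space should be allowed.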

- **Digits**: Followed by two digits (`[0-9]{2}`).

- **Optional Space**: Another optional space (`\s?`).

- **Uppercase Letters**: Followed by two more uppercase letters (`[A-Z]{2}`).

- **Optional Space**: Again, an optional space (`\s?`).

- **Digits**: Concludes with four digits (`[0-9]{4}`).

- **Word Boundary**: The pattern ends with a word boundary (`\b`), so the final four digits cannot be immediately followed by another letter, digit, or underscore.

2. **Purpose**:

- This pattern is designed to match vehicle registration numbers laid out as two letters, two digits, two letters, and four digits (for example, `MH 12 AB 1234` or `MH12AB1234`). It ensures that the string consists of uppercase letters and digits arranged in this order, optionally separated by spaces.

The regular expression pattern defines a specific format consisting of uppercase letters, digits, and optional spaces. It is used to match strings that conform to this structure, which might be useful for validating or extracting formatted codes from text.
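Putting the components together gives the full pattern. The sketch below uses Python's standard `re` module; the name `PLATE_PATTERN` and the sample strings are made up for illustration.

```python
import re

# Two letters, two digits, two letters, four digits, with an optional
# whitespace character between groups and a trailing word boundary.
PLATE_PATTERN = re.compile(r"[A-Z]{2}\s?[0-9]{2}\s?[A-Z]{2}\s?[0-9]{4}\b")

samples = ["MH12AB1234", "MH 12 AB 1234", "mh12ab1234", "MH12AB12345", "MH12A1234"]

for text in samples:
    # search() looks for the pattern anywhere in the string
    found = PLATE_PATTERN.search(text)
    print(f"{text!r}: {found.group() if found else 'no match'}")
```

Because the pattern has no leading anchor, `search()` will also find a plate embedded in a longer token (inside `XMH12AB1234`, for example). Prefix the pattern with `\b`, or use `fullmatch()`, when the whole string must be a registration number.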

Crafting a Regular Expression to Match HTML-like Tag Structures

You need to define a regular expression pattern that matches a specific format of HTML-like tags. The goal is to create a pattern that validates strings where the format follows these rules:

1. **Tags**: The string should start with an opening HTML-like tag, contain some optional content, and end with a closing HTML-like tag.

2. **Tag Structure**:

- The opening tag should start with a `<`, followed by one or more lowercase letters, and end with a `>`.

- The content between the tags is optional and can include any word characters (letters, digits, and underscores).

- The closing tag should start with `</`, followed by one or more lowercase letters, and end with `>`.

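3. **Tag Matching** (full pattern: `^<[a-z]{1,}>[\w]{0,}<\/[a-z]{1,}>`):

- **Opening Tag**: `^<` ensures the string starts with an opening tag. `[a-z]{1,}` (equivalent to `[a-z]+`) requires a tag name of at least one lowercase letter. `>` closes the opening tag.

- **Content**: `[\w]{0,}` (equivalent to `\w*`) allows zero or more word characters between the tags. Spaces and punctuation are not word characters, so content such as `two words` will not match.

- **Closing Tag**: `<\/[a-z]{1,}>` matches a closing tag, which starts with `</`, followed by one or more lowercase letters, and ends with `>`.

4. **Pattern Details**:

- The `^` asserts the position at the start of the string. There is no closing `$`, so unless the pattern is applied with a full-match function, text after the closing tag is not rejected.

- The pattern does not check that the tag names in the opening and closing tags are the same: `<b>bold</i>` matches just as `<b>bold</b>` does. Enforcing identical names requires capturing the opening name and referring back to it with a backreference, for example `^<([a-z]+)>\w*<\/\1>$`.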

- The content between the tags is optional.
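A quick check of the pattern exactly as described, using Python's `re` module (`TAG_PATTERN` and the test strings are illustrative). It shows which inputs pass, including a pair of mismatched tag names and trailing text, which the pattern does not rule out.

```python
import re

# Opening tag, optional word characters, closing tag; anchored only at the start.
TAG_PATTERN = re.compile(r"^<[a-z]{1,}>[\w]{0,}<\/[a-z]{1,}>")

tests = ["<b>bold</b>", "<p></p>", "<b>bold</i>", "<b>two words</b>", "<B>x</B>", "<b>x</b> trailing"]

for text in tests:
    # match() only anchors at the start, like the leading ^ in the pattern
    print(f"{text!r}: {bool(TAG_PATTERN.match(text))}")
```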

### Summary:

This regular expression pattern is designed to match strings that start with an HTML-like opening tag, optionally contain some word-character content, and end with a closing tag. Both tags must be well-formed, but as written the pattern does not require the opening and closing tag names to be the same; that requires a backreference to the captured opening name.
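If the tag names must agree, one option (an adjustment, not the pattern described above) is to capture the opening name with a group and require the closing tag to repeat it via the backreference `\1`, adding a trailing `$` so nothing may follow. `SAME_TAG_PATTERN` is a hypothetical name. Like any regex, this still does not handle nested tags or attributes.

```python
import re

# ([a-z]+) captures the opening tag name; \1 requires the closing tag to repeat it.
# The trailing $ rejects anything after the closing tag.
SAME_TAG_PATTERN = re.compile(r"^<([a-z]+)>\w*</\1>$")

for text in ["<b>bold</b>", "<b>bold</i>", "<em></em>", "<b>x</b> trailing"]:
    print(f"{text!r}: {bool(SAME_TAG_PATTERN.match(text))}")
```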

Friday, August 16, 2024

Custom Regex Matching Engine in Python

This post walks through a small regex engine written from scratch in Python. It checks whether a given string matches a specified pattern, which may include a subset of regex operators: `.` (any character), `^` (start of string), `$` (end of string), the repetition operators `*`, `+` and `?`, and `\` (escape character). The code is built in stages, each adding functionality to the engine. The key stages and functions are summarized below:

---

### **Stage 1: Basic Single-Character Matching**

- **Function:** `single_or_empty_char(regex: str, literal: str) -> bool`

- **Purpose:** Checks if a single character in the regex matches the corresponding literal character or if the regex is empty.
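A minimal sketch of what `single_or_empty_char` might look like. The original code is not shown, so the rules here are assumptions: an empty regex matches anything, a non-empty regex cannot match an empty literal, and `.` already acts as a wildcard at this stage.

```python
def single_or_empty_char(regex: str, literal: str) -> bool:
    """Compare a regex of at most one character with a literal of at most one character."""
    if regex == "":
        # An empty regex matches any input, including an empty one.
        return True
    if literal == "":
        # A non-empty regex needs at least one character to consume.
        return False
    # '.' is the wildcard; any other character must match exactly.
    return regex == "." or regex == literal


print(single_or_empty_char("a", "a"))  # True
print(single_or_empty_char(".", "z"))  # True
print(single_or_empty_char("a", ""))   # False
```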

---

### **Stage 2: Matching Strings of Equal Length**

- **Function:** `equal_len(regex: str, literal: str) -> bool`

- **Purpose:** Recursively checks if a regex matches a literal string when both have the same length. The `.` operator matches any single character.
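A possible recursive sketch of Stage 2, assuming the same empty-regex and empty-literal rules as Stage 1. This version is not the author's code. When the regex is shorter than the literal it succeeds as soon as the regex is used up (a prefix match), which is the behaviour Stage 3 can build on.

```python
def equal_len(regex: str, literal: str) -> bool:
    """Match regex against literal one character at a time."""
    if not regex:
        # The whole regex has been consumed: success.
        return True
    if not literal:
        # Regex characters remain but the literal is exhausted.
        return False
    if regex[0] != "." and regex[0] != literal[0]:
        # '.' matches any character; anything else must be identical.
        return False
    # The first characters agree, so compare the remaining suffixes.
    return equal_len(regex[1:], literal[1:])


print(equal_len("a.c", "abc"))  # True
print(equal_len("abc", "abd"))  # False
```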

---

### **Stage 3: Matching Strings of Different Lengths**

- **Function:** `different_len(regex: str, literal: str) -> bool`

- **Purpose:** Handles cases where the regex and literal strings have different lengths. It tries to match the regex starting at each position of the literal in turn, so a short regex can match anywhere inside a longer string.
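One way to sketch Stage 3: try the regex at the first position of the literal, and on failure drop one literal character and try again. This assumes `equal_len` behaves as a prefix matcher; a Stage 2 version is repeated inside the block so it runs on its own.

```python
def equal_len(regex: str, literal: str) -> bool:
    """Stage 2 helper, repeated here: match regex as a prefix of literal."""
    if not regex:
        return True
    if not literal:
        return False
    if regex[0] != "." and regex[0] != literal[0]:
        return False
    return equal_len(regex[1:], literal[1:])


def different_len(regex: str, literal: str) -> bool:
    """Return True if regex matches starting at any position of literal."""
    if equal_len(regex, literal):
        return True
    if not literal:
        # Every starting position has been tried.
        return False
    # Skip the first literal character and retry from the next position.
    return different_len(regex, literal[1:])


print(different_len("le", "apple"))   # True
print(different_len("xyz", "apple"))  # False
```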

---

### **Stage 4: Handling Start `^` and End `$` Anchors**

- **Function:** `fix_operators(regex: str, literal: str) -> bool`

- **Purpose:** Supports `^` and `$` operators that anchor the match to the start or end of the string, respectively. This function ensures that if these anchors are present, the regex matches from the beginning or end as required.
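A sketch of the anchor handling, assuming `^` is only special as the first regex character and `$` only as the last. `match_here` is a hypothetical helper that extends the Stage 2 recursion with the `$` check; only the name `fix_operators` comes from the post, and its body here is illustrative.

```python
def match_here(regex: str, literal: str) -> bool:
    """Match regex at the start of literal; a final '$' requires the literal to be used up."""
    if not regex:
        return True
    if regex == "$":
        # '$' as the last regex character anchors the match to the end of the literal.
        return literal == ""
    if not literal:
        return False
    if regex[0] != "." and regex[0] != literal[0]:
        return False
    return match_here(regex[1:], literal[1:])


def fix_operators(regex: str, literal: str) -> bool:
    """Support '^' (start anchor) and '$' (end anchor) on top of '.' matching."""
    if regex.startswith("^"):
        # Anchored at the start: only position 0 is tried.
        return match_here(regex[1:], literal)
    # Unanchored: try every start position, including the empty suffix,
    # so that a regex ending in '$' can match at the very end.
    return any(match_here(regex, literal[i:]) for i in range(len(literal) + 1))


print(fix_operators("^app", "apple"))         # True
print(fix_operators("^apple$", "apple pie"))  # False
```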

---

### **Stage 5: Repetition Operators (`*`, `+`, `?`)**

- **Sub-Functions:**

  - **`current_scenario(base: list, index: int, symbol: str, literal_len: int) -> list`:** Expands the repetition operator `symbol` at position `index` into its possible repetitions, bounded by the literal's length.

  - **`find_scenarios(base: list, idx_with_meta: dict, max_len: int) -> list`:** Combines these expansions for every operator position to produce all candidate regex scenarios.

- **Main Function:** `repetition_operators(regex: str, literal: str, escape: dict = None) -> bool`

- **Purpose:** Handles the `*`, `+`, and `?` repetition operators by generating different matching scenarios and checking them against the literal string.
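The scenario idea can be sketched as follows: each quantified character is rewritten as a fixed number of copies (0 or 1 for `?`, 0 to n for `*`, 1 to n for `+`, where n is the literal's length), and every resulting quantifier-free regex is tried with an anchor-aware matcher. The helper names (`match_here`, `search`, `tokenize`, `expand_scenarios`, `repetition_match`) are hypothetical, and this is not the author's `current_scenario`/`find_scenarios` code. It assumes each quantifier applies to the single preceding character and ignores escapes.

```python
from itertools import product


def match_here(regex: str, literal: str) -> bool:
    """Match a quantifier-free regex at the start of literal ('.' and a final '$' supported)."""
    if not regex:
        return True
    if regex == "$":
        return literal == ""
    if not literal:
        return False
    if regex[0] != "." and regex[0] != literal[0]:
        return False
    return match_here(regex[1:], literal[1:])


def search(regex: str, literal: str) -> bool:
    """Honour a leading '^'; otherwise try every start position."""
    if regex.startswith("^"):
        return match_here(regex[1:], literal)
    return any(match_here(regex, literal[i:]) for i in range(len(literal) + 1))


def tokenize(body: str) -> list:
    """Split a regex body into (char, quantifier) pairs; quantifier is '' when absent."""
    tokens = []
    for ch in body:
        if ch in "*+?" and tokens and tokens[-1][1] == "":
            # Attach the quantifier to the preceding character.
            tokens[-1] = (tokens[-1][0], ch)
        else:
            tokens.append((ch, ""))
    return tokens


def expand_scenarios(regex: str, max_len: int) -> list:
    """Rewrite each quantified character as a fixed number of copies, in every combination."""
    prefix = "^" if regex.startswith("^") else ""
    body = regex[len(prefix):]
    suffix = "$" if body.endswith("$") else ""
    body = body[: len(body) - len(suffix)]
    repeat_choices = []
    for ch, quantifier in tokenize(body):
        if quantifier == "?":
            counts = range(0, 2)
        elif quantifier == "*":
            counts = range(0, max_len + 1)
        elif quantifier == "+":
            counts = range(1, max_len + 1)
        else:
            counts = range(1, 2)
        repeat_choices.append([ch * n for n in counts])
    # One concrete regex per combination of repetition counts.
    return [prefix + "".join(combo) + suffix for combo in product(*repeat_choices)]


def repetition_match(regex: str, literal: str) -> bool:
    """True if any scenario matches; more than len(literal) copies can never match."""
    return any(search(s, literal) for s in expand_scenarios(regex, len(literal)))


print(repetition_match("colou?r", "colour"))  # True
print(repetition_match("^no+$", "n"))         # False
```

The number of scenarios grows multiplicatively with each quantifier, so this brute-force approach is only practical for short inputs; a backtracking matcher avoids building the scenarios at all.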

---

### **Stage 6: Escaping Special Characters**

- **Function:** `escape_operator(regex: str, string: str) -> bool`

- **Purpose:** Adds support for the `\` escape character, allowing special characters in the regex to be treated as literals. This function processes the regex string to recognize escaped characters and then delegates to the previous stages.
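A sketch of the escape-recognition step only, assuming escapes are handled by tagging each regex character as literal or special before matching. The names `parse_escapes`, `token_matches` and `match_tokens` are hypothetical, the original `escape_operator` is not shown, and only the `.` wildcard is covered here (no anchors or quantifiers).

```python
def parse_escapes(regex: str) -> list:
    """Return (char, is_literal) pairs; a backslash makes the next character literal."""
    tokens = []
    i = 0
    while i < len(regex):
        if regex[i] == "\\" and i + 1 < len(regex):
            tokens.append((regex[i + 1], True))
            i += 2
        else:
            tokens.append((regex[i], False))
            i += 1
    return tokens


def token_matches(token: tuple, ch: str) -> bool:
    """An escaped character only matches itself; an unescaped '.' matches anything."""
    char, is_literal = token
    if is_literal:
        return char == ch
    return char == "." or char == ch


def match_tokens(tokens: list, literal: str) -> bool:
    """Stage-2-style prefix matcher over parsed tokens."""
    if not tokens:
        return True
    if not literal:
        return False
    return token_matches(tokens[0], literal[0]) and match_tokens(tokens[1:], literal[1:])


print(match_tokens(parse_escapes(r"\.txt"), ".txt"))  # True
print(match_tokens(parse_escapes(r"\.txt"), "atxt"))  # False
```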

---

### **Main Execution**

- **Usage:** The program reads one line of input in the format `regex|string` and prints whether the string matches the regex.

- **Function:** `escape_operator(regex, string) -> bool`

- **Purpose:** The Stage 6 function serves as the entry point to the engine: it processes escaped characters and then delegates to the earlier stages for the actual matching.
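The input handling can be sketched as below. `parse_input` and `run` are hypothetical names, and `matcher` stands in for `escape_operator` so the block runs on its own. `str.partition` splits at the first `|` only, so a `|` inside the string part is preserved.

```python
from typing import Callable, Tuple


def parse_input(line: str) -> Tuple[str, str]:
    """Split a 'regex|string' line at the first '|' into its two parts."""
    regex, sep, string = line.partition("|")
    if not sep:
        raise ValueError("expected input in the form regex|string")
    return regex, string


def run(line: str, matcher: Callable[[str, str], bool]) -> bool:
    """Parse one input line and hand both parts to the matching function."""
    regex, string = parse_input(line)
    return matcher(regex, string)


print(parse_input("colou?r|color"))  # ('colou?r', 'color')
# In the full program this would be: print(run(input(), escape_operator))
```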

---

### **Conclusion**

This code progressively builds a custom regex engine by implementing core regex features in stages. It supports simple literals, basic operators like `.` for any character, and anchors like `^` and `$`. It also handles more complex repetition operators (`*`, `+`, `?`) and allows for escaping special characters (`\`). This modular approach allows for easy extension and understanding of how regex matching can be implemented from scratch.
