Regular Expressions Explained
Regular expressions (regex) are patterns used to match character combinations in strings. This guide covers basic syntax, advanced constructs such as lookarounds and flags, best practices, real-world examples, and testing.
What are Regular Expressions?
Regular expressions are sequences of characters that define a search pattern. They are used for:
- Pattern matching
- Text extraction
- String validation
- Search and replace operations
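A minimal sketch of these four uses in JavaScript. The sample text and variable names are invented for illustration, and the syntax used in the patterns is explained in the sections that follow.

```javascript
// Hypothetical sample text, invented for illustration.
const orderText = "Order 66 shipped on 2024-05-01; order 67 is pending.";

// Pattern matching: does the text contain a date in YYYY-MM-DD form?
const hasDate = /\d{4}-\d{2}-\d{2}/.test(orderText);          // true

// Text extraction: collect every standalone run of digits.
const numbers = orderText.match(/\b\d+\b/g);                  // ["66", "2024", "05", "01", "67"]

// String validation: the whole string must be a date, nothing more.
const isDate = /^\d{4}-\d{2}-\d{2}$/.test("2024-05-01");      // true

// Search and replace: mask every digit.
const masked = orderText.replace(/\d/g, "#");

console.log(hasDate, numbers, isDate, masked);
```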
Basic Syntax
Character Classes
. - Any character except newline
\w - Word character [A-Za-z0-9_]
\d - Digit [0-9]
\s - Whitespace character
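The character classes above can be tried out with a short JavaScript sketch. The input string is a made-up example, and the behavior shown is that of JavaScript's engine without any flags.

```javascript
const sample = "Room 42\nis open";   // hypothetical input with one newline

// . matches any character except a newline, so the match stops at "\n".
const firstLine = sample.match(/.+/)[0];        // "Room 42"

// \w matches letters, digits, and underscore.
const words = sample.match(/\w+/g);             // ["Room", "42", "is", "open"]

// \d matches a single digit.
const digits = sample.match(/\d/g);             // ["4", "2"]

// \s matches whitespace: here two spaces and the newline.
const whitespaceCount = sample.match(/\s/g).length;   // 3

console.log(firstLine, words, digits, whitespaceCount);
```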
Quantifiers
* - 0 or more
+ - 1 or more
? - 0 or 1
{n} - Exactly n times
{n,} - n or more times
{n,m} - Between n and m times
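One way to see how each quantifier behaves is a small table of patterns and inputs, sketched here in JavaScript. The patterns and inputs are illustrative choices; each pattern is anchored with ^ and $ so the quantifier has to account for the whole string.

```javascript
// Each row: [pattern, input, expected result of pattern.test(input)].
const quantifierChecks = [
  [/^ab*c$/,    "ac",    true],   // * allows zero b's
  [/^ab+c$/,    "ac",    false],  // + requires at least one b
  [/^ab+c$/,    "abbc",  true],
  [/^colou?r$/, "color", true],   // ? makes the u optional
  [/^\d{4}$/,   "2024",  true],   // {n}: exactly four digits
  [/^\d{4}$/,   "20245", false],
  [/^\d{2,}$/,  "7",     false],  // {n,}: at least two digits
  [/^\d{2,4}$/, "123",   true],   // {n,m}: two to four digits
  [/^\d{2,4}$/, "12345", false],
];

const quantifierResults = quantifierChecks.map(
  ([pattern, input]) => pattern.test(input)
);

console.log(quantifierResults);
```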
Anchors
^ - Start of the string (start of each line with the m flag)
$ - End of the string (end of each line with the m flag)
\b - Word boundary
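A short JavaScript sketch of the anchors, using an invented input string. It shows how \b restricts a match to a standalone word and how ^ and $ tie a match to the ends of the input.

```javascript
const line = "cat concat cats";   // hypothetical input

// \b on both sides: only the standalone word matches.
const wholeWord = line.match(/\bcat\b/g);     // ["cat"]

// Without \b the pattern also matches inside "concat" and "cats".
const anywhere = line.match(/cat/g);          // ["cat", "cat", "cat"]

// ^ and $ anchor to the start and end of the input.
const startsWithCat = /^cat/.test(line);      // true
const endsWithCat = /cat$/.test(line);        // false: the input ends with "cats"

console.log(wholeWord, anywhere.length, startsWithCat, endsWithCat);
```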
Groups and Capturing
(...) - Capturing group
(?:...) - Non-capturing group
(?<name>...) - Named capturing group
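The three group types can be compared side by side in JavaScript. This is a sketch with made-up inputs; the group names year, month, and day are illustrative.

```javascript
const dateText = "2024-05-01";   // hypothetical input

// Capturing groups are numbered from 1, left to right.
const numbered = dateText.match(/(\d{4})-(\d{2})-(\d{2})/);
// numbered[1] = "2024", numbered[2] = "05", numbered[3] = "01"

// Named capturing groups are available on the groups property.
const named = dateText.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
// named.groups.year = "2024"

// A non-capturing group repeats "ab" without creating a capture,
// so the only numbered capture is the "!".
const nonCapturing = "ababab!".match(/(?:ab)+(!)/);
// nonCapturing[0] = "ababab!", nonCapturing[1] = "!"

console.log(numbered[1], named.groups.month, nonCapturing[1]);
```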
Advanced Concepts
Lookahead and Lookbehind
(?=...) - Positive lookahead
(?!...) - Negative lookahead
(?<=...) - Positive lookbehind
(?<!...) - Negative lookbehind
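Lookarounds check the text next to a match without including it in the result. The JavaScript sketch below uses an invented input string. Note that the negative forms also exclude digits in the character class, since otherwise the engine would backtrack and match part of a number. Lookbehind is not available in every regex engine.

```javascript
const specs = "cost: $40, weight: 40kg, discount: 15%";   // hypothetical input

// Positive lookahead: digits followed by "kg"; "kg" is not part of the match.
const kilograms = specs.match(/\d+(?=kg)/g);          // ["40"]

// Negative lookahead: whole numbers not followed by "%".
const notPercent = specs.match(/\d+(?![\d%])/g);      // ["40", "40"]

// Positive lookbehind: digits preceded by "$".
const dollars = specs.match(/(?<=\$)\d+/g);           // ["40"]

// Negative lookbehind: whole numbers not preceded by "$".
const notDollars = specs.match(/(?<![\d$])\d+/g);     // ["40", "15"]

console.log(kilograms, notPercent, dollars, notDollars);
```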
Flags
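g - Global search; find all matches instead of stopping at the first
i - Case-insensitive search
m - Multiline search; ^ and $ match at the start and end of each line
s - Allows . to match newline
u - Unicode; treat the pattern as a sequence of Unicode code points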
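The effect of each flag is easiest to see by running the same pattern with and without it. The following JavaScript sketch assumes made-up inputs; the emoji is written as a code point escape so the source stays ASCII.

```javascript
const greeting = "Hello\nhello\nHELLO";   // hypothetical three-line input

// g: replace every match instead of only the first.
const firstOnly = "a1b2c3".replace(/\d/, "#");     // "a#b2c3"
const everyDigit = "a1b2c3".replace(/\d/g, "#");   // "a#b#c#"

// i: ignore case.
const anyCase = greeting.match(/hello/gi);         // ["Hello", "hello", "HELLO"]

// m: ^ and $ match at each line, not only at the ends of the input.
const perLine = greeting.match(/^hello$/gim);      // three matches
const wholeInput = greeting.match(/^hello$/gi);    // null

// s: let . match the newline between the first two lines.
const withoutS = /Hello.hello/.test(greeting);     // false
const withS = /Hello.hello/s.test(greeting);       // true

// u: treat an astral character as one code point rather than two code units.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);                // false
const withU = /^.$/u.test(emoji);                  // true

console.log(firstOnly, everyDigit, anyCase, perLine, wholeInput, withoutS, withS, withoutU, withU);
```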
Best Practices
- Keep it Simple
  - Break complex patterns into smaller parts
  - Use comments for clarity in complex regex
- Performance Considerations
  - Avoid excessive backtracking
  - Use atomic groups where the regex engine supports them
  - Be careful with nested quantifiers
- Common Pitfalls
  - Greedy vs. lazy matching
  - Character encoding issues
  - Forgetting to escape special characters
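Three of the points above can be illustrated with a short JavaScript sketch: greedy versus lazy matching, building a pattern from smaller parts, and escaping special characters. The helper name escapeRegExp and all sample strings are assumptions made for this example, not part of any standard API.

```javascript
// Greedy vs. lazy: .* takes as much as it can, .*? as little as it can.
const markup = "<b>one</b> and <b>two</b>";
const greedy = markup.match(/<b>.*<\/b>/)[0];    // "<b>one</b> and <b>two</b>"
const lazy = markup.match(/<b>.*?<\/b>/)[0];     // "<b>one</b>"

// Smaller parts: name each piece, then assemble the full pattern.
const year = "\\d{4}";
const month = "\\d{2}";
const day = "\\d{2}";
const isoDate = new RegExp(`^${year}-${month}-${day}$`);

// Escaping: a hypothetical helper that backslash-escapes regex metacharacters
// so user-supplied text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const unescaped = new RegExp("1+1=2").test("1+1=2");              // false: + acts as a quantifier
const escaped = new RegExp(escapeRegExp("1+1=2")).test("1+1=2");  // true

console.log(greedy, lazy, isoDate.test("2024-05-01"), unescaped, escaped);
```

Without escaping, the pattern 1+1=2 means "one or more 1s, then 1=2", so it matches "11=2" but not the literal text "1+1=2".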
Real-World Examples
Email Validation
^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$
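As a rough check of what this pattern accepts and rejects, here it is as a JavaScript regex literal with single backslashes and the character class written as [\w.-]. The addresses are invented. Treat the pattern as a simple filter rather than a full validator: the {2,4} limit, for instance, rejects valid longer top-level domains.

```javascript
const emailPattern = /^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/;

// [address, expected result]; all addresses are made up for illustration.
const emailCases = [
  ["jane.doe@example.com", true],
  ["jane@mail.example.co", true],
  ["jane.doe@example", false],       // no dot-separated domain ending
  ["jane doe@example.com", false],   // space in the local part
  ["jane@example.museum", false],    // valid in practice, but longer than {2,4} allows
];

const emailResults = emailCases.map(([address]) => emailPattern.test(address));

console.log(emailResults);
```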
URL Parsing
https?:\/\/[\w.-]+\.[a-zA-Z]{2,}(\/\S*)?
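A sketch of how this pattern might be used in JavaScript, both to find URLs in running text and to read the optional path from the capturing group. The sample sentence and URLs are invented, and the pattern is written as a regex literal with single backslashes.

```javascript
const urlPattern = /https?:\/\/[\w.-]+\.[a-zA-Z]{2,}(\/\S*)?/;

const note =
  "Docs: https://example.com/guide?page=2 and http://sub.example.org, but not ftp://example.net";

// Find every URL by rebuilding the same pattern with the g flag.
const urls = note.match(new RegExp(urlPattern.source, "g"));
// ["https://example.com/guide?page=2", "http://sub.example.org"]

// The capturing group holds the path, or undefined when there is none.
const withPath = "https://example.com/guide?page=2".match(urlPattern);
const withoutPath = "http://sub.example.org".match(urlPattern);

console.log(urls, withPath[1], withoutPath[1]);
```

The trailing comma after the second URL is left out of the match because the host part only allows word characters, dots, and hyphens.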
Phone Numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
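The pattern can be exercised against a few formats in JavaScript. In this sketch the ^ and $ anchors are an addition, assumed here so the pattern validates a whole string instead of finding a number inside longer text. The numbers are made up.

```javascript
// Same pattern with ^ and $ added (an assumption for whole-string validation).
const phonePattern = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// [input, expected result]
const phoneCases = [
  ["(555) 123-4567", true],
  ["555-123-4567", true],
  ["555.123.4567", true],
  ["5551234567", true],
  ["555-1234", false],         // too few digits
  ["(555 123-4567", true],     // limitation: unbalanced parenthesis is accepted
];

const phoneResults = phoneCases.map(([input]) => phonePattern.test(input));

console.log(phoneResults);
```

Because each parenthesis is optional on its own, the pattern cannot require that they appear as a pair.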
Testing and Debugging
- Use Test Cases
  - Test with valid inputs
  - Test with invalid inputs
  - Test edge cases
- Debugging Tools
  - Regex visualizers
  - Step-by-step debuggers
  - Performance profilers
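The test-case advice above can be put into practice with a small table-driven check, sketched here in JavaScript. The pattern under test (a 24-hour time such as 09:30) and every case in the table are hypothetical, chosen only to show valid, invalid, and edge inputs side by side.

```javascript
// Hypothetical pattern under test: 24-hour time in HH:MM form.
const timePattern = /^([01]\d|2[0-3]):[0-5]\d$/;

const timeCases = [
  // Valid inputs
  { input: "00:00", expected: true },
  { input: "09:30", expected: true },
  { input: "23:59", expected: true },
  // Invalid inputs
  { input: "24:00", expected: false },
  { input: "12:60", expected: false },
  { input: "9:30", expected: false },
  // Edge cases
  { input: "", expected: false },
  { input: " 09:30", expected: false },
  { input: "09:30\n", expected: false },
];

// Collect every case whose actual result differs from the expected one.
const failures = timeCases.filter(
  ({ input, expected }) => timePattern.test(input) !== expected
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```

Keeping the cases in a table makes it cheap to add a new row whenever a bug report reveals an input the pattern mishandles.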