Regular Expressions Explained

Regular expressions (regex) are patterns that match character combinations in strings. This guide walks through the core syntax, lookarounds, flags, best practices, real-world patterns, and testing.

What are Regular Expressions?

Regular expressions are sequences of characters that define a search pattern. They are used for:

  • Pattern matching
  • Text extraction
  • String validation
  • Search and replace operations
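
As a rough illustration of these four uses, here is a minimal sketch using Python's standard `re` module. The choice of Python and the sample text are assumptions made for the example; other engines offer equivalent calls.

```python
import re

text = "Order 1042 shipped on 2024-03-15 to alice@example.com"

# Pattern matching: does the text contain a date-like token?
has_date = re.search(r"\d{4}-\d{2}-\d{2}", text) is not None

# Text extraction: pull out every run of digits.
numbers = re.findall(r"\d+", text)

# String validation: the whole string must be exactly four digits.
is_order_id = re.fullmatch(r"\d{4}", "1042") is not None

# Search and replace: mask the date.
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)

print(has_date, numbers, is_order_id)
print(masked)
```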

Basic Syntax

Character Classes

  • . - Any character except newline
  • \w - Word character [A-Za-z0-9_]
  • \d - Digit [0-9]
  • \s - Whitespace character
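
The classes above can be tried out with a short sketch like the following, again assuming Python's `re` module. One flavor detail to keep in mind: for text strings in Python 3, `\w`, `\d`, and `\s` also match non-ASCII Unicode characters unless `re.ASCII` is passed, so the bracketed ASCII ranges above describe the ASCII-only behavior.

```python
import re

sample = "Room 42\tis open_now!"

digits = re.findall(r"\d", sample)        # single digits
words = re.findall(r"\w+", sample)        # runs of word characters
spaces = re.findall(r"\s", sample)        # whitespace characters
no_newline = re.findall(r".", "a\nb")     # "." skips the newline

# Unicode-aware by default; re.ASCII restricts \w to [A-Za-z0-9_].
unicode_words = re.findall(r"\w+", "naïve")
ascii_words = re.findall(r"\w+", "naïve", re.ASCII)

print(digits, words, spaces, no_newline, unicode_words, ascii_words)
```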

Quantifiers

  • * - 0 or more
  • + - 1 or more
  • ? - 0 or 1
  • {n} - Exactly n times
  • {n,} - n or more times
  • {n,m} - Between n and m times
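
To see how each quantifier changes what is accepted, the sketch below applies them to the letter `a` in a small set of sample strings. The helper name `matching` and the samples are illustrative assumptions.

```python
import re

samples = ["ct", "cat", "caat", "caaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = matching(r"ca*t")      # *     -> 0 or more
one_or_more = matching(r"ca+t")       # +     -> 1 or more
optional = matching(r"ca?t")          # ?     -> 0 or 1
exactly_two = matching(r"ca{2}t")     # {n}   -> exactly n
two_or_more = matching(r"ca{2,}t")    # {n,}  -> n or more
one_to_two = matching(r"ca{1,2}t")    # {n,m} -> between n and m

print(zero_or_more, one_or_more, optional)
print(exactly_two, two_or_more, one_to_two)
```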

Anchors

  • ^ - Start of the input (start of each line with the m flag)
  • $ - End of the input (end of each line with the m flag)
  • \b - Word boundary
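
A small sketch of how anchors behave, assuming Python's `re` module, where the multiline behavior is switched on with `re.MULTILINE`. The sample strings are made up for the example.

```python
import re

text = "first line\nsecond line"

# Without the multiline flag, ^ and $ anchor to the whole input.
start_default = re.findall(r"^\w+", text)
end_default = re.findall(r"\w+$", text)

# With the multiline flag, they also anchor at each line break.
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)
end_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \b matches only where a word character meets a non-word character,
# so "cat" inside "concat" or "cats" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

print(start_default, end_default)
print(start_multiline, end_multiline)
print(whole_words)
```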

Groups and Capturing

  • (...) - Capturing group
  • (?:...) - Non-capturing group
  • (?<name>...) - Named capturing group
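
The three group types can be compared on a date string, as in the sketch below. It assumes Python's `re` module, which spells named groups as `(?P<name>...)` rather than the `(?<name>...)` form listed above; the date value is illustrative.

```python
import re

date = "2024-03-15"

# Capturing groups are numbered from 1, left to right.
numbered = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", date)
assert numbered is not None
year, month, day = numbered.groups()

# A non-capturing group still groups, but is left out of the results.
partial = re.fullmatch(r"(?:\d{4})-(\d{2})-(\d{2})", date)
assert partial is not None
month_day = partial.groups()

# Named groups can be read back by name instead of by position.
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date)
assert named is not None
parts = named.groupdict()

print(year, month, day)
print(month_day)
print(parts)
```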

Advanced Concepts

Lookahead and Lookbehind

  • (?=...) - Positive lookahead
  • (?!...) - Negative lookahead
  • (?<=...) - Positive lookbehind
  • (?<!...) - Negative lookbehind
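
Lookarounds check the text next to a match without consuming it. The sketch below, assuming Python's `re` module (which requires lookbehind patterns to have a fixed width), picks amounts out of a made-up price list based on the neighbouring currency code.

```python
import re

prices = "USD 100, EUR 200, 300 GBP, USD 450"

# Positive lookahead: numbers followed by " GBP".
followed_by_gbp = re.findall(r"\d+(?= GBP)", prices)

# Negative lookahead: whole numbers NOT followed by " GBP".
not_followed_by_gbp = re.findall(r"\b\d+\b(?! GBP)", prices)

# Positive lookbehind: numbers preceded by "USD ".
after_usd = re.findall(r"(?<=USD )\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by "USD ".
not_after_usd = re.findall(r"(?<!USD )\b\d+", prices)

print(followed_by_gbp, not_followed_by_gbp)
print(after_usd, not_after_usd)
```

The `\b` anchors in the negative cases keep the engine from matching a shorter slice of a number (such as `30` out of `300`) to get around the assertion.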

Flags

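  • g - Global; find all matches instead of stopping at the first
  • i - Case-insensitive matching
  • m - Multiline; ^ and $ also match at the start and end of each line
  • s - Dot-all; allows . to match newline
  • u - Unicode; treat the pattern as a sequence of Unicode code points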
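
The single-letter flags above follow the JavaScript-style naming. As a hedged sketch of the same ideas in Python's `re` module (an assumption about the reader's environment): `i`, `m`, and `s` map to `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL`; there is no `g` flag, since `re.search` returns the first match while `re.findall` and `re.finditer` return all of them; and text patterns in Python 3 are Unicode-aware by default.

```python
import re

text = "Hello\nhello\nHELLO"

# No flags: case-sensitive, so only the middle line matches.
plain = re.findall(r"hello", text)

# i -> re.IGNORECASE
ignore_case = re.findall(r"hello", text, re.IGNORECASE)

# m -> re.MULTILINE: ^ and $ also match at each line break.
whole_lines = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)

# s -> re.DOTALL: "." also matches the newline between two lines.
without_s = re.search(r"Hello.hello", text)
with_s = re.search(r"Hello.hello", text, re.DOTALL)

# g -> no flag: search stops at the first match, findall returns all.
first = re.search(r"hello", text, re.IGNORECASE)
first_only = first.group() if first else None

print(plain, ignore_case, whole_lines)
print(without_s, with_s is not None, first_only)
```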

Best Practices

  1. Keep it Simple

    • Break complex patterns into smaller parts
    • Use comments for clarity in complex regex
  2. Performance Considerations

    • Avoid excessive backtracking
    • Use atomic groups where the engine supports them
    • Be careful with nested quantifiers
  3. Common Pitfalls

    • Confusing greedy and lazy matching
    • Overlooking character encoding issues
    • Forgetting to escape special characters
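
Several of these practices can be shown in a few lines. The sketch below assumes Python's `re` module: `re.VERBOSE` is its way of allowing comments inside a pattern, and `re.escape` is its helper for escaping literal text. The time pattern and sample strings are illustrative, not taken from any particular codebase.

```python
import re

# Keep it simple: re.VERBOSE lets a pattern span lines and carry comments.
TIME = re.compile(
    r"""
    (?P<hour>[01]\d|2[0-3])   # hours 00-23
    :                         # literal separator
    (?P<minute>[0-5]\d)       # minutes 00-59
    """,
    re.VERBOSE,
)
valid_time = TIME.fullmatch("23:59") is not None
invalid_time = TIME.fullmatch("24:00") is not None

# Nested quantifiers: both patterns accept the same strings, but the flat
# form gives the engine far fewer ways to backtrack when a match fails.
nested = re.compile(r"(?:a+)+b")
flat = re.compile(r"a+b")
same_result = bool(nested.fullmatch("aaab")) == bool(flat.fullmatch("aaab"))

# Greedy vs. lazy: ".*" grabs as much as possible, ".*?" as little.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# Escaping: "+" is a quantifier unless escaped, so the raw text fails
# to match itself until it is passed through re.escape.
literal = "1+1=2"
unescaped = re.search(literal, "1+1=2")
escaped = re.search(re.escape(literal), "1+1=2")

print(valid_time, invalid_time, same_result)
print(greedy, lazy)
print(unescaped, escaped is not None)
```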

Real-World Examples

Email Validation

^[\w.-]+@([\w-]+\.)+[\w-]{2,}$
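
A minimal sketch of applying an email pattern of this shape, assuming Python's `re` module. The version used here is written with single backslashes in a raw string, places the hyphen last in the character class so it cannot be read as a range, and leaves the final label length open-ended; these adjustments and the helper name `is_email` are assumptions made for the example. A pattern like this only checks the general shape of an address, not whether it can receive mail.

```python
import re

EMAIL = re.compile(r"^[\w.-]+@([\w-]+\.)+[\w-]{2,}$")

def is_email(candidate):
    """Return True if the candidate has the rough shape of an email address."""
    return EMAIL.match(candidate) is not None

results = {
    address: is_email(address)
    for address in [
        "alice@example.com",
        "bob.smith@mail.example.co.uk",
        "no-at-sign.example.com",   # missing "@"
        "alice@localhost",          # no dot in the domain part
        "alice@example.c",          # final label too short
    ]
}
print(results)
```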

URL Parsing

https?:\/\/[\w.-]+\.[a-zA-Z]{2,}(\/\S*)?
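
One way to turn a URL pattern of this shape into a small parser is to add named groups, as in the sketch below. It assumes Python's `re` module, writes `/` unescaped because Python does not need it escaped, and introduces the group names `scheme`, `host`, and `path` purely for illustration. For production parsing, a dedicated URL parser such as Python's `urllib.parse` is usually the sturdier choice.

```python
import re

URL = re.compile(
    r"(?P<scheme>https?)://(?P<host>[\w.-]+\.[a-zA-Z]{2,})(?P<path>/\S*)?"
)

text = "Docs at https://example.com/guide?page=2 and http://sub.example.org."

# Collect every URL in the text along with its parts.
found = [
    (m.group("scheme"), m.group("host"), m.group("path"))
    for m in URL.finditer(text)
]
print(found)
```

The trailing full stop after the second URL is left out of the host because the pattern requires letters after the last dot.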

Phone Numbers

\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
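
A short sketch of using a phone pattern of this shape, assuming Python's `re` module and North American style ten-digit numbers. The helper name `normalize` and the sample inputs are assumptions. Note that the optional parentheses are independent of each other, so an unbalanced input such as `(555 123-4567` is also accepted.

```python
import re

PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def normalize(candidate):
    """Return the ten digits of a matching phone number, or None."""
    if PHONE.fullmatch(candidate) is None:
        return None
    return re.sub(r"\D", "", candidate)  # strip everything except digits

normalized = [
    normalize(s)
    for s in ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234"]
]
unbalanced = normalize("(555 123-4567")  # accepted despite the missing ")"

print(normalized, unbalanced)
```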

Testing and Debugging

  1. Use Test Cases

    • Test with valid inputs
    • Test with invalid inputs
    • Test edge cases
  2. Debugging Tools

    • Regex visualizers
    • Step-by-step debuggers
    • Performance profilers
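
The test-case advice above can be sketched as a small table-driven check. This example assumes Python's `re` module and reuses a phone pattern of the shape shown earlier as the pattern under test; the helper name `run_cases` and the case table are hypothetical.

```python
import re

PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

# Each case pairs an input with the result we expect from a full match.
CASES = [
    ("555-123-4567", True),      # valid input
    ("(555) 123-4567", True),    # valid input
    ("555-1234", False),         # invalid: too few digits
    ("555-123-45678", False),    # invalid: one digit too many
    ("", False),                 # edge case: empty string
    ("555-123-4567\n", False),   # edge case: trailing newline
]

def run_cases(pattern, cases):
    """Return the cases whose actual result differs from the expected one."""
    failures = []
    for text, expected in cases:
        actual = pattern.fullmatch(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

failures = run_cases(PHONE, CASES)
print(failures)
```

An empty list means every case behaved as expected; any entries show the input, the expected result, and what the pattern actually did.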

Further Resources