Sunday, September 29, 2013

Regular Expression Matching

Problem:
Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:

 bool isMatch(const char *s, const char *p)

Some examples:

isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

Solution:
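  1. If the next character of p is NOT '*', the current character of p must match the current character of s (a '.' matches any character except the end of the string). If it does, continue matching with the next character of both s and p.
  2. If the next character of p is '*', try every possible number of repeats of the current character of p by brute force: zero, one, two, and so on. For each count, try to match the rest of p (after the '*') against the rest of s. Stop as soon as one attempt succeeds, or when the current character of p can no longer match s.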

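#include <cassert>

bool isMatch(const char *s, const char *p) {
    assert(s && p);
    // empty pattern matches only the empty string
    if (*p == '\0') return *s == '\0';

    // next char is not '*': must match current character
    if (*(p+1) != '*') {
        assert(*p != '*');  // a '*' must always follow some element
        return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);
    }
    // next char is '*': try 0, 1, 2, ... repeats of *p
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        // try to match the rest of the pattern from the current position
        if (isMatch(s, p+2)) return true;
        s++;  // consume one more repeat
    }
    // *p can no longer match: the rest of the pattern must match what is left
    return isMatch(s, p+2);
}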


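Complexity:
time - O(2^n) in the worst case, where n is the combined length of s and p (every '*' can restart the search at each position of s)
space - O(n) for the recursion stack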
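
To get a feel for the worst case, here is a small sketch that counts recursive calls on an adversarial input: n copies of 'a' matched against n copies of "a*" followed by a 'b' that never appears. isMatchCounted is just the solution above with a counter added. The names g_calls, isMatchCounted and countCalls are made up for this illustration.

```cpp
#include <cassert>
#include <string>

static long g_calls = 0;  // hypothetical counter, not part of the original solution

// Same logic as isMatch, plus a call counter.
bool isMatchCounted(const char *s, const char *p) {
    assert(s && p);
    ++g_calls;
    if (*p == '\0') return *s == '\0';
    if (*(p+1) != '*')
        return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatchCounted(s+1, p+1);
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        if (isMatchCounted(s, p+2)) return true;
        s++;
    }
    return isMatchCounted(s, p+2);
}

// Calls needed to reject n 'a's against n copies of "a*" followed by 'b'.
long countCalls(int n) {
    std::string text(n, 'a');
    std::string pattern;
    for (int i = 0; i < n; ++i) pattern += "a*";
    pattern += 'b';
    g_calls = 0;
    bool matched = isMatchCounted(text.c_str(), pattern.c_str());
    (void)matched;  // always false: the text contains no 'b'
    return g_calls;
}
```

Calling countCalls with growing n shows the call count climbing far faster than the input length, since every way of splitting the 'a's among the "a*" groups gets tried before the match is rejected.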

Links and credits:
http://discuss.leetcode.com/questions/175/regular-expression-matching

A much faster automaton-based approach (used by grep, awk, and other tools):
http://swtch.com/~rsc/regexp/regexp1.html
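
The core idea of the automaton approach is to track every pattern position the match could currently be in, instead of backtracking. Below is a minimal sketch of that idea for the restricted '.' and '*' syntax of this problem. It is not the construction from the linked article: a "state" here is simply an index into p, and the names addState and isMatchStates are made up for this illustration. It assumes a well-formed pattern (no leading '*', no "**").

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Mark pattern position j as reachable. An element followed by '*' may match
// zero characters, so the position after "x*" becomes reachable as well.
static void addState(std::vector<char> &set, const char *p, size_t m, size_t j) {
    while (true) {
        if (set[j]) return;  // already marked, nothing new to follow
        set[j] = 1;
        if (j + 1 < m && p[j + 1] == '*') j += 2;  // skip over "x*"
        else return;
    }
}

bool isMatchStates(const char *s, const char *p) {
    assert(s && p);
    const size_t m = std::strlen(p);
    // cur[j] != 0 means: the input read so far can be matched by p[0..j)
    std::vector<char> cur(m + 1, 0), next(m + 1, 0);
    addState(cur, p, m, 0);
    for (; *s != '\0'; ++s) {
        std::fill(next.begin(), next.end(), 0);
        for (size_t j = 0; j < m; ++j) {
            if (!cur[j]) continue;
            if (p[j] != *s && p[j] != '.') continue;  // element cannot consume *s
            const bool starred = (j + 1 < m && p[j + 1] == '*');
            // a starred element stays available; a plain one is used up
            addState(next, p, m, starred ? j : j + 1);
        }
        cur.swap(next);
    }
    return cur[m] != 0;  // matched only if the end of the pattern is reachable
}
```

Each character of s is processed once against at most strlen(p) states, so the running time is proportional to the product of the two lengths rather than exponential.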
