Regular Expression Matching

Implement regular expression matching with support for ‘.’ and ‘*’.

‘.’ Matches any single character.
‘*’ Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch(“aa”,”a”) → false
isMatch(“aa”,”aa”) → true
isMatch(“aaa”,”aa”) → false
isMatch(“aa”, “a*”) → true
isMatch(“aa”, “.*”) → true
isMatch(“ab”, “.*”) → true
isMatch(“aab”, “c*a*b”) → true

Online Judge
This problem is available at Online Judge. Head over there and it will judge your solution. Currently only able to compile C++/Java code. If you are using other languages, you can still verify your solution by looking at the judge’s test cases and its expected output.

Background:
It might seem deceptively easy even you know the general idea, but programming it correctly with all the details require careful thought.

Edit:
It seems that some readers are confused about why the regex pattern “.*” matches the string “ab”. “.*” means repeat the preceding element 0 or more times. Here, the “preceding” element is the dot character in the pattern, which can match any characters. Therefore, the regex pattern “.*” allows the dot to be repeated any number of times, which matches any string (even an empty string).

Hints:
Think carefully how you would do matching of ‘*’. Please note that ‘*’ in regular expression is different from wildcard matching, as we match the previous character 0 or more times. But, how many times? If you are stuck, recursion is your friend.

This problem is a tricky one. Due to the huge number of edge cases, many people would write lengthy code and have numerous bugs on their first try. Try your best getting your code correct first, then refactor mercilessly to as clean and concise as possible!


A sample diagram of a deterministic finite state automata (DFA). DFAs are useful for doing lexical analysis and pattern matching. An example is UNIX’s grep tool. Please note that this post does not attempt to descibe a solution using DFA.

Solution:
This looks just like a straight forward string matching, isn’t it? Couldn’t we just match the pattern and the input string character by character? The question is, how to match a ‘*’?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many as we can. Does this work? Let us look at some examples.

s = “abbbc”, p = “ab*c”
Assume we have matched the first ‘a’ on both s and p. When we see “b*” in p, we skip all b’s in s. Since the last ‘c’ matches on both side, they both match.

s = “ac”, p = “ab*c”
After the first ‘a’, we see that there is no b’s to skip for “b*”. We match the last ‘c’ on both side and conclude that they both match.

It seems that being greedy is good. But how about this case?

s = “abbc”, p = “ab*bbc”
When we see “b*” in p, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.

One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is smaller or equal to the number of consecutive b’s after “b*” in p, we conclude they both match and continue from there. For the opposite, we conclude there is not a match.

This seem to solve the above problem, but how about this case:
s = “abcbcd”, p = “a.*c.*d”

Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p matches the first or second ‘c’ in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.

We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases:

  1. If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.
  2. If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.

You would need to consider the base case carefully too. That would be left as an exercise to the reader. :)

Below is the extremely concise code (Excluding comments and asserts, it’s about 10 lines of code).

Further Thoughts:
Some extra exercises to this problem:

  1. If you think carefully, you can exploit some cases that the above code runs in exponential complexity. Could you think of some examples? How would you make the above code more efficient?
  2. Try to implement partial matching instead of full matching. In addition, add ‘^’ and ‘$’ to the rule. ‘^’ matches the starting position within the string, while ‘$’ matches the ending position of the string.
  3. Try to implement wildcard matching where ‘*’ means any sequence of zero or more characters.

For the interested reader, real world regular expression matching (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read this article.

VN:F [1.9.22_1171]
Rating: 4.8/5 (86 votes cast)
Regular Expression Matching, 4.8 out of 5 based on 86 ratings

115 thoughts on “Regular Expression Matching

  1. guoguo

    why isMatch(“aab”, “c*a*b”) -> true?
    obviously, the second string contains one extra ‘c’ compared to the first string. Am i correct?
    Thanks, great website:)

    VA:F [1.9.22_1171]
    0
    1. Profile photo of 1337c0d3r1337c0d3r Post author

      isMatch(“aab”, “c*a*b”) returns true because c* means 0 or more ‘c’ to match. Since it doesn’t start with a ‘c’, so c* matches 0 ‘c’. Similarly, a* matches the two ‘a’s, and b matches the last ‘b’.

      VN:F [1.9.22_1171]
      +10
        1. Profile photo of AndyAndy

          I think “c*” should be clarified in the question. People may think ‘* ‘in “c*” means it can be 0 or more previous char, which is ‘c’. But ‘c’ in “c*” already exists. So “c*” means at least one ‘c’.

          I guess here ‘*’ can cancel the previous char, and thus “c*” means 0 or more ‘c’. But, I think this is not common sense.

          VN:F [1.9.22_1171]
          +1
          1. Profile photo of AceXAceX

            c* means 0 or more. Its a conception in “computation theory”. Its a important course in computer science. And in compiler theory, this * conception is also mentioned very often. If you wanna describe 1 or more, normally, we use c+.

            VN:F [1.9.22_1171]
            0
    1. Li Jin

      In case s=”a…..ab” p=”a*a*….a*”, it takes exponential time and O(n) space. Using memorizing, I think it takes O(n^3) time and O(n^2) space.

      VA:F [1.9.22_1171]
      +3
  2. ds

    Could you tell why isMatch(“ab”, “.*”) → true? .can match “a”, so “.*” is a or aa or aa… How could it be “ab”?

    VA:F [1.9.22_1171]
    +3
      1. Profile photo of AMAY SHANBHAGAMAY SHANBHAG

        I still don’t think its correct.. because lets say we assume any char for ‘.’ and that can be occur 0 or more times.. so but still at the end there should be b.

        if the condition was isMatch(“ab”, “.*b”) I can say its true. But how isMatch(“ab”, “.*”) can be true?…..

        VN:F [1.9.22_1171]
        0
  3. rockstar

    how could isMatch(“ab”,”.*”) -> true ?? . matches a then preceding * can only match more a’s , how did this b came then ….!!

    VA:F [1.9.22_1171]
    0
      1. ds

        It should not be true. * means 0 or multiple “preceding” character. In this case it is ‘a’. So how did ‘b’ come from?

        VA:F [1.9.22_1171]
        0
        1. Profile photo of Jianqing guanJianqing guan

          The preceding character is ‘.’, which can stand for any character.
          In a another word, “.*’ equals anything as long as there is no tail behind, like “.*c”. If there is a tail, in s we must have corresponding ‘c’

          VN:F [1.9.22_1171]
          0
          1. Profile photo of AMAY SHANBHAGAMAY SHANBHAG

            So let say you assume any character for eg: ‘m’ for character ‘.’ now for ‘ * ‘ you can have 0 or multiple ‘m’. Then what ?.. but you cant have anything after m so this case will fail. so its false.
            Now to make it true we have to assume a for ‘ . ‘ so we can say 0 or multiple a but at the end you should not have any character except ‘a’ according to pattern. so how can you consider b?

            if the condition was isMatch(“ab”, “.*b”) I can say its true. But how isMatch(“ab”, “.*”) can be true?…..

            VN:F [1.9.22_1171]
            0
    1. Profile photo of 1337c0d3r1337c0d3r Post author

      @rockstar and @ds:
      I mentioned this in my post, “.*” means repeat the preceding element of the pattern itself, which is the single dot character. Since dot can match virtually any character, the pattern “.*’ will match the string “ab”.

      If you do not believe me, I’ll quote this from here, http://www.regular-expressions.info/dot.html :
      “Suppose you want to match a double-quoted string. Sounds easy. We can have any number of any character between the double quotes, so “.*” seems to do the trick just fine. The dot matches any character, and the star allows the dot to be repeated any number of times, including zero.”

      VN:F [1.9.22_1171]
      +2
  4. Profile photo of Ashok KoyiAshok Koyi

    Please find the java version of the solution [Naive method]

    VN:F [1.9.22_1171]
    +3
    1. Profile photo of Maxim GalushkaMaxim Galushka

      Here is my version based on author posted solution:


      public boolean matches(String text, String pattern) {
      // if pattern is empty - whole string should be empty to match
      if (pattern.length() == 0) return text.length() == 0;

      // retrieve 2nd character of the pattern
      String nextChar = (pattern.length() > 1) ? String.valueOf(pattern.charAt(1)) : "";
      char t = (text.length() > 0) ? text.charAt(0) : 0;
      char p = pattern.charAt(0);

      // if 2nd char is not '*' - it means that we are checking if next char in text matches with next char in pattern
      // and recursively run the code starting from +1 char in both pattern/text
      if (!"*".equals(nextChar)) {
      return ((t == p) || (p == '.' && t != 0)) &&
      matches(text.substring(1), pattern.substring(1));
      } else {
      while ((t == p) || (p == '.' && t != 0)) {
      // check if current text matches tail part of the pattern (+2 symbols)
      if (matches(text, pattern.substring(2))) return true;
      // cut first symbol in original text and repeat the check in the loop
      text = text.substring(1);
      t = (text.length() > 0) ? text.charAt(0) : 0;
      }
      return matches(text, pattern.substring(2));
      }
      }

      VN:F [1.9.22_1171]
      0
  5. Tanin

    I come up with a dynamic programming solution. It takes O(n^2) in time and O(n) in space.

    Could you please check if it is correct.

    Here is the solution:

    d[i,j] is true iff a[0..i] matches r[0..j]

    d[i,j] is true if one of these cases is true:

    (1) a[i] == r[j] and d[i-1,j-1] is true (match a character with a character)
    (2) r[j] == ‘.’ and d[i-1,j-1] is true
    (3) r[j] is * and it can match a[i] and d[i-1,j-1] is true (* consumes character as its first one)
    (4) r[j] is * and it can match a[i] and d[i-1,j] is true (* consumes 1 more character)
    (5) r[j] is * and it can match a[i] and d[i,j-1] is true (* matches an empty string)

    The solution is called “Elastic Matching” used in Handwriting recognition.

    Please see the full solution here: http://tanin.nanakorn.com/blog/!/id25

    VA:F [1.9.22_1171]
    0
    1. Profile photo of 1337c0d3r1337c0d3r Post author

      IsMatch(“a”, “ab*a”) should return “false”.
      This is because the pattern “ab*a” means the string begin with an a, following by 0 or more b’s, then an another a at the end.
      This imply that it must minimally contain two characters which has two a’s, ie. “aa”.

      VN:F [1.9.22_1171]
      0
    1. Profile photo of 1337c0d3r1337c0d3r Post author

      This is to assume that multiple *’s are not in the input pattern and before a * there must be a letter to match, or else the code above will not be correct. In the case where multiple *’s can be in the input pattern, you can just use a loop to skip those, as those are redundant and won’t affect the correctness.

      VN:F [1.9.22_1171]
      0
  6. Nova2358

    There is a bug.

    char* exp = “b*bb”, char* text = “bb”

    This should be a “Not match”, but your code will answer “match”.

    You should change your last code to

    VA:F [1.9.22_1171]
    -1
    1. Profile photo of 1337c0d3r1337c0d3r Post author

      Thanks for your comment.

      exp=”b*bb” and text=”bb” should be a match. This is because “b*” could match 0 or more b’s. In this case it would be 0 b.

      VN:F [1.9.22_1171]
      0
  7. Jay

    I think my code is same with answer but it shows the backtracking aspect of problem more.

    int match(char *s, char *p)
    {
    int char_matched = 0;

    if (*s == ”)
    return (*p == ”);

    if (*s == *p || *p == ‘.’)
    char_matched = 1;

    if (*(p+1) == ‘*’)
    return char_matched? (match(s+1, p) || match(s+1, p+2)) : match(s+1, p+2);
    else
    return char_matched? match(s+1, p+1): 0;
    }

    VA:F [1.9.22_1171]
    0
    1. Jay

      I just saw “bb”, “b*bb” case above. Yeah.. this should match. Added the case.

      int match(char *s, char *p)
      {
      int char_matched = 0;

      if (*s == ”)
      return (*p == ”);

      if (*s == *p || *p == ‘.’)
      char_matched = 1;

      if (*(p+1) == ‘*’)
      return char_matched? (match(s+1, p) || match(s+1, p+2) || match(s, p+2)) : match(s+1, p+2);
      else
      return char_matched? match(s+1, p+1): 0;
      }

      VA:F [1.9.22_1171]
      0
      1. Alex

        This code is buggy. Here are some examples where it would fail:
        – it would return false on empty string for regex “.*” while it should have returned true.
        – it would fail in the case such as text=”ab”and regex=”ab.*”.

        VA:F [1.9.22_1171]
        0
  8. Hello

    // next char is ‘*’
    while ((*p == *s) || (*p == ‘.’ && *s != ”))

    Hello, in this piece of code, *p == ‘.’ would never be true since there is no any code inside while loop operates p.

    VA:F [1.9.22_1171]
    0
  9. Profile photo of flynewdreamflynewdream

    Cannot understand the following code:
    // next char is ‘*’
    while ((*p == *s) || (*p == ‘.’ && *s != ”)) {
    if (isMatch(s, p+2)) return true;
    s++;
    }

    why use “if (isMatch(s, p+2)) return true;”? This will stop comparing s to the characters before ‘*’ in p. Is this correct?

    VN:F [1.9.22_1171]
    0
    1. Profile photo of Brady FangBrady Fang

      “the regex pattern “.*” allows the dot to be repeated any number of times, which matches any string (even an empty string)”.

      s = abdabcabc; p = .*abc.
      (1) .* will match the first six characters of s, i.e., abdabc (you could consider .* as “……” in this case);
      (2) the last three characters of s and p match.

      Therefore, .*abc matches abdabcabc.

      VN:F [1.9.22_1171]
      0
  10. Profile photo of Zhenjie ChenZhenjie Chen

    Python? DP can be applied (not in the following code).

    Run:
    ./reg.py “abcbcd” “a.*c.*d”

    VN:F [1.9.22_1171]
    +1
  11. Guyue Liu

    I am not sure about the time complexity of solution.
    Why do not solve this problem via DP? it would be O(n ^ 2) ~

    VA:F [1.9.22_1171]
    0
  12. Guyue Liu

    here’s my code:

    it can pass all test cases.

    VA:F [1.9.22_1171]
    +1
  13. Profile photo of zhong zhangzhong zhang

    Further thoughts:
    Problem 1:
    I came up one case that takes a long time to get the result while the length of string s and p are not huge.

    s[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    p[] = "a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*b"

    VN:F [1.9.22_1171]
    0
  14. Pingback: regular expression matching support for ‘.’ and ‘*’ | Jisku.com - Developers Network

  15. Subramanian Ganapathy

    Knuth morris pratt modified, * is greedy. All we have to do is build patterns of all characters upto *

    for .e.g ab*bc*cd, we build 3 prefix that are also suffixes pattern for ab , bc and cd respectively.

    for the actual string, we first start with ab using its prefix array try to match the string, once a match occurs we recursively try to match from that matched position for bc and finally for cd. we backtrack when we complete the entire string for any recursion level, seems like O(n*number of stars) to me. I am in the process of writing code for it.

    VA:F [1.9.22_1171]
    0
    1. Subramanian Ganapathy

      My previous solution was for wildcard match but it lends intuitively into regex match also. Cancel kmp prefix generation for a character whose next char is a *, then the same algo holds but this time we need to move to the next matching block once the current * char is matched and backtrack here only when the next matching block fails.

      VA:F [1.9.22_1171]
      0
  16. shady

    nice post, but has problem as isMatch(“aa”, “*a”) triggers assert which leads to termination of program.

    this line is incorrect in the code, as it can lead to segmentation fault… how can we directly access p+2 element, we know for sure that p is not ”, but p+1 element can be ” , therefore leading to p+2 to be undefined.

    VA:F [1.9.22_1171]
    0
  17. Profile photo of NathanNathan

    I think there is a bug for your code.

    For the case where p = “a” and s = “b*a”, your code will return FALSE. However, I believe it should return TRUE.

    What do you think? Did I miss anything?

    VN:F [1.9.22_1171]
    0
  18. Profile photo of codingcoding

    The following is the DP solution which passes all the online cases:

    class Solution {
    public:
    bool isMatch(const char *s, const char *p) {
    vector row_char, column_char;
    vector column_st;
    while(*s != ”)
    row_char.push_back(*(s++));

    while(*p != ”)
    {
    if(*p != ‘*’)
    {
    column_char.push_back(*p);
    if(*(p + 1) == ‘*’)
    column_st.push_back(true);
    else
    column_st.push_back(false);
    }
    p++;
    }
    int row_sz = row_char.size(), column_sz = column_char.size();
    if(row_sz == 0 && column_sz == 0)
    return true;
    if(row_sz == 0)
    {
    for(int i = 0; i < column_sz; i++)
    if(column_st[i] == false)
    return false;
    return true;
    }
    if(column_sz == 0)
    return false;

    bool state[2][column_sz];
    int flag = false;
    int i = 0;
    for(; i < column_sz; i++)
    {
    if(column_char[i] == '.' || column_char[i] == row_char[0])
    flag = true;
    if(column_st[i] == true)
    state[0][i] = true;
    else
    break;
    }
    if(i == column_sz && flag == false)
    return false;
    if(i < column_sz)
    {
    if(column_char[i] == '.' || column_char[i] == row_char[0])
    state[0][i] = true;
    else
    state[0][i] = false;
    i++;
    }
    for(; i < column_sz; i++)
    {
    if(column_st[i] == true && state[0][i – 1] == true)
    state[0][i] = true;
    else
    state[0][i] = false;
    }

    int cur = 1, pre = 0;

    for(i = 1; i < row_sz; i++)
    {
    if(column_st[0] == true && column_char[0] == '.')
    state[cur][0] = true;
    else if(column_st[0] == true && state[pre][0] == true && row_char[i – 1] == column_char[0] && row_char[i] == column_char[0])
    state[cur][0] = true;
    else
    state[cur][0] = false;
    for(int j = 1; j < column_sz; j++)
    {
    if(column_st[j] == true && state[cur][j – 1] == true)
    state[cur][j] = true;
    else if(column_char[j] == '.' || row_char[i] == column_char[j])
    {
    if(state[pre][j – 1] == true)
    state[cur][j] = true;
    else if(column_st[j] == true && state[pre][j] == true)
    state[cur][j] = true;
    else
    state[cur][j] = false;
    }
    else
    state[cur][j] = false;
    }

    swap(cur, pre);
    }
    return state[pre][column_sz – 1];
    }
    };

    VN:F [1.9.22_1171]
    0
  19. Jens

    Here is the ultimate in conciseness (short of using a regex library): 6 lines of Prolog!

    Note that 42 is the ASCII for ‘*’ and 46 the ASCII for ‘.’. A sample query is this:

    VA:F [1.9.22_1171]
    0
  20. Profile photo of theactualtheactual

    VN:F [1.9.22_1171]
    +2
  21. Annie

    isMatch(s, p+2) maybe out of boundary, in which case will store unexpected characters. Better check before go into.

    *(p+1) != ” && isMatch(s, p+2)

    VA:F [1.9.22_1171]
    +1
  22. Profile photo of WonghiukongWonghiukong

    Use DP, Time Complexity is O(N^2), and I only store the last and current row of the table, so the Space Complexity is O(N). Here is the working code.

    VN:F [1.9.22_1171]
    0
  23. Profile photo of Gordon XinGordon Xin

    input abaaabc
    pattern .*bc

    VN:F [1.9.22_1171]
    0
  24. mega_mind

    I think the posted sloution will fail for the following test case:

    s=”xabbb” p=”x.*”

    for .* first it will check for a and it will match…. but the rest of three b’s will also match with the same .*. so it will return TURE but it should return FALSE.
    ifff s=”xaaaa” p=”x.*” then it should return TRUE. plz lemmy know if im right or wrong.

    VA:F [1.9.22_1171]
    0
  25. Pingback: leetcode: Regular Expression Matching solution | 烟客旅人 sigmainfy

  26. Profile photo of JohnsonJohnson

    Here is my solution,

    int Findstr(char *string, char *str, int len)
    {
    assert(string != NULL && str !=NULL && len>0);
    for(int i=0; i<strlen(string); i++)
    {
    int k = i, j;
    for(j=0; j<len && (str[j] == '?' || string[k] == str[j]); j++,k++);
    if(j==len) return i;
    }
    return -1;
    }

    int SplitString(char *&str, char c)
    {
    assert(str!=NULL); //?? temp!=NULL
    int cnt=0;
    int len = strlen(str);
    for(int i=0; i<len; i++)
    {
    if(str[i]!=c && (i+1== len || str[i+1] == c)) //str[i] ==c && (i+1 == len || str[i+1] != c)
    {
    str = str+i-cnt;
    return cnt+1;
    }
    else if(str[i]!=c)
    cnt++;
    }
    return -1;
    }

    bool matchpattern(char *text, char *pattern)
    {
    assert(text!=NULL && pattern!=NULL);
    char *strP = pattern;
    char *strT = text;

    int len,index=0;
    while((len = SplitString(strP, '*')) != -1
    && (index = Findstr(strT, strP, len)) != -1)
    {
    strP += len;
    strT += index+len;
    }
    if((len == -1) && (pattern[strlen(pattern)-1] == '*' || *strT == NULL)) return true;
    return false;
    }

    VN:F [1.9.22_1171]
    0
  27. Profile photo of JohnJohn

    DP solution, time complexity n^2

    public boolean isMatch(String s, String p) {
    if (s == null || p == null) return false;
    int sl = s.length();
    int pl = p.length();
    boolean a[][] = new boolean[pl+1][sl+1];
    a[0][0] = true;
    for(int i=1; i<=sl;i++){
    a[0][i] = false;
    }
    for(int i=1; i 1) {
    a[i][0] = a[i-2][0];
    } else {
    a[i][0] = false;
    }
    }
    for(int i=1; i <= pl; i++) {
    for(int j=1; j <= sl; j++) {
    if(p.charAt(i-1) == '*' && i != 1) {
    a[i][j] = a[i-2][j] || a[i-1][j] ||
    (a[i][j-1] && (p.charAt(i-2) == '.' || p.charAt(i-2) == s.charAt(j-1)));
    } else {
    a[i][j] = a[i-1][j-1]
    && (p.charAt(i-1) == '.' || p.charAt(i-1) == s.charAt(j-1));

    }

    }
    }
    return a[pl][sl];
    }

    VN:F [1.9.22_1171]
    0
  28. Profile photo of JohnJohn

    DP solution, time complexity n^2

    VN:F [1.9.22_1171]
    0
  29. Profile photo of FentoyalFentoyal

    I once extended the classical KMP algorithm to make it able to handle wild card in pattern matching. The complexity stays O(N) time and O(N) space. This is the code, but it is extremely hard to understand: 10 times harder than the classical KMP. I don’t want to put any explanation here because a detailed explanation may cost a paper. If someone is really interested in it, email me: the.easiest.name.to.remember@gmail.com

    vector next_ext(const string &str)
    {
    int leng = str.size();
    if (leng < 0 || str.empty())
    return vector();
    vector next(leng + 1);
    int i = -1, j, off; //treated like str[-1] is a ‘*’
    off = i, j = off, i ++, next[i] = off;
    while (i < leng)
    {
    if (str[i] == '*' ) //consistent with the above comment.
    off = i, j = off, i ++, next[i] = off;
    else if (j == off || str[i] == str[j] || str[j] == '.')
    ++i, ++j, next[i] = j;
    else
    j = next[j];
    }
    return next;
    }

    //if matched, return the last position of the first matched substring of the string, else return -1.
    int kmp_ext(string str, string pattern)
    {
    int str_len = str.size(), pattern_len = pattern.size();
    vector next = next_ext(pattern);
    int i = 0, j = -1, off; // treated like pattern[-1] is a ‘*’
    off = j, j ++;
    while(i < str_len && j < pattern_len)
    {
    if (j != off && pattern[j] == '*' )
    off = j, j ++; //consistent with the above comment
    else if( j == off || str[i] == pattern[j] || pattern[j] == '.')
    j++, i++;
    else
    j = next[j];
    }
    if(j == pattern_len)
    return i;
    else
    return -1;
    }

    int main()
    {
    string s = "abdcxdefabc";

    cout<<kmp_ext(s, "ab..cx")<<endl;
    cout<<kmp_ext(s, "ab*x.*fa*c")<<endl;
    cout<<kmp_ext(s, "cxd*")<<endl;
    cout<<kmp_ext(s, "*b*x.*fa*c")<<endl;
    cout<<kmp_ext(".ab", ".a.")<<endl;
    cout<<kmp_ext("a*b", "a*b")<<endl;
    }

    VN:F [1.9.22_1171]
    0
  30. Profile photo of RyanRyan

    VN:F [1.9.22_1171]
    0
  31. peng

    Hi,

    Thanks for sharing code, I found hard on understanding following lines.

    while ((*p == *s) || (*p == ‘.’ && *s != ”)) {
    if (isMatch(s, p+2)) return true;
    s++;
    }

    If I have S = ‘abbbc’, P = ‘ab*ba’, when in the second loop, the line “if (isMatch(s, p+2)) return true;: will be true and return True to the main loop. Then the result will be true. Is my understanding correct?

    Thanks

    VA:F [1.9.22_1171]
    0
    1. newwayy

      regex has two different methods: search vs. match. In this question, match means from head to tail, not part of it. In regex,

      VA:F [1.9.22_1171]
      0
  32. eaglesky1990

    My DP solution. It takes O(N^2) time and O(N) space. Is this the best we can do?

    VA:F [1.9.22_1171]
    +2
  33. Pingback: [Leetcode] Regular Expression Matching | Yufei Zhao @ Columbia University

  34. 140820

    Thanks for sharing the code! It is really elegant.

    However I found it fails for the following cases:
    (“ab”, “.*c”)
    (“aaa”, “ab*a”)
    (“aaba”, “ab*a*c*a”)

    VA:F [1.9.22_1171]
    0
  35. Profile photo of qilu xieqilu xie

    C# solution as follows:

    VN:F [1.9.22_1171]
    0
  36. Luwei Cheng

    The sample code is neat, but it can be neater. I feel that it takes some effort to understand the “while” clause.. Let me share my simpler code:

    Key: at each recursion, advance at least one of the two strings.

    VA:F [1.9.22_1171]
    +1
  37. heng zhang

    starting from the end

    VA:F [1.9.22_1171]
    0
  38. heng zhang

    here is the correct copy.
    starting from the end

    VA:F [1.9.22_1171]
    0
  39. Pingback: Leetcode #10 Regular Expression Matching | Kris Ma

  40. 714coderK

    In the following paragraph, shouldn’t “smaller” actually be “larger”?
    ” One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is smaller or equal to the number of consecutive b’s after “b*” in p, we conclude they both match and continue from there…”

    VA:F [1.9.22_1171]
    0
  41. lovingyun

    well, i can’t understand that when isMatch(s, p+2) will return true? for example, s=ab, p=a*b, the last step maybe isMatch(s,p+2), *s='b'; *p='b'; but where is the end of the recursion ?

    VA:F [1.9.22_1171]
    0
  42. bomberX

    This problem can be solved both in DP & Automata, Following is the C code using DP.
    Time: O(n * m), Space: O(n * m);

    VA:F [1.9.22_1171]
    0

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.