Implement regular expression matching with support for ‘.’ and ‘*’.

‘*’ Matches zero or more of the preceding element.

The matching should cover the **entire** input string (not partial).

The function prototype should be:

bool isMatch(const char *s, const char *p)

Some examples:

isMatch(“aa”,”a”) → false

isMatch(“aa”,”aa”) → true

isMatch(“aaa”,”aa”) → false

isMatch(“aa”, “a*”) → true

isMatch(“aa”, “.*”) → true

isMatch(“ab”, “.*”) → true

isMatch(“aab”, “c*a*b”) → true

**Online Judge**

This problem is available at Online Judge. Head over there and it will judge your solution. Currently only able to compile C++/Java code. If you are using other languages, you can still verify your solution by looking at the judge’s test cases and its expected output.

**Background:**

It might seem deceptively easy even you know the general idea, but programming it correctly with all the details require careful thought.

**Edit:**

It seems that some readers are confused about why the regex pattern “.*” matches the string “ab”. “.*” means repeat the *preceding* element 0 or more times. Here, the *“preceding”* element is the **dot** character in the *pattern*, which can match any characters. Therefore, the regex pattern “.*” allows the dot to be repeated any number of times, which matches any string (even an empty string).

**Hints:**

Think carefully how you would do matching of ‘*’. Please note that ‘*’ in regular expression is *different *from wildcard matching, as we match the previous character 0 or more times. But, how many times? If you are stuck, recursion is your friend.

This problem is a tricky one. Due to the huge number of edge cases, many people would write lengthy code and have numerous bugs on their first try. Try your best getting your code correct first, then refactor mercilessly to as clean and concise as possible!

A sample diagram of a deterministic finite state automata (DFA). DFAs are useful for doing lexical analysis and pattern matching. An example is UNIX’s grep tool. Please note that this post does not attempt to descibe a solution using DFA.

**Solution:**

This looks just like a straight forward string matching, isn’t it? Couldn’t we just match the pattern and the input string character by character? The question is, how to match a ‘*’?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many as we can. Does this work? Let us look at some examples.

*s* = “abbbc”, *p* = “ab*c”

Assume we have matched the first ‘a’ on both *s* and *p*. When we see “b*” in *p*, we skip all b’s in *s*. Since the last ‘c’ matches on both side, they both match.

*s* = “ac”, *p* = “ab*c”

After the first ‘a’, we see that there is no b’s to skip for “b*”. We match the last ‘c’ on both side and conclude that they both match.

It seems that being greedy is good. But how about this case?

*s* = “abbc”, *p* = “ab*bbc”

When we see “b*” in *p*, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.

One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in *s*? If it is smaller or equal to the number of consecutive b’s after “b*” in *p*, we conclude they both match and continue from there. For the opposite, we conclude there is not a match.

This seem to solve the above problem, but how about this case:

*s* = “abcbcd”, *p* = “a.*c.*d”

Here, “.*” in *p* means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in *p* matches the first or second ‘c’ in *s*? Unfortunately, there is no way to tell without using some kind of exhaustive search.

We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in *s* with ‘*’. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases:

- If the next character of
*p*is**NOT**‘*’, then it must match the current character of*s*. Continue pattern matching with the next character of both*s*and*p*. - If the next character of
*p*is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of*p*… Until we could not match any more characters.

You would need to consider the base case carefully too. That would be left as an exercise to the reader. 🙂

Below is the extremely concise code (Excluding comments and asserts, it’s about 10 lines of code).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
bool isMatch(const char *s, const char *p) { assert(s && p); if (*p == '\0') return *s == '\0'; // next char is not '*': must match current character if (*(p+1) != '*') { assert(*p != '*'); return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1); } // next char is '*' while ((*p == *s) || (*p == '.' && *s != '\0')) { if (isMatch(s, p+2)) return true; s++; } return isMatch(s, p+2); } |

**Further Thoughts:**

Some extra exercises to this problem:

- If you think carefully, you can exploit some cases that the above code runs in exponential complexity. Could you think of some examples? How would you make the above code more efficient?
- Try to implement partial matching instead of full matching. In addition, add ‘^’ and ‘$’ to the rule. ‘^’ matches the starting position within the string, while ‘$’ matches the ending position of the string.
- Try to implement wildcard matching where ‘*’ means any sequence of zero or more characters.

For the interested reader, real world regular expression matching (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read this article.

Regular Expression Matching,
why isMatch(“aab”, “c*a*b”) -> true?

obviously, the second string contains one extra ‘c’ compared to the first string. Am i correct?

Thanks, great website:)

0isMatch(“aab”, “c*a*b”) returns true because c* means 0 or more ‘c’ to match. Since it doesn’t start with a ‘c’, so c* matches 0 ‘c’. Similarly, a* matches the two ‘a’s, and b matches the last ‘b’.

+14ic,lol

0That’s obviously wrong!

-9I think “c*” should be clarified in the question. People may think ‘* ‘in “c*” means it can be 0 or more previous char, which is ‘c’. But ‘c’ in “c*” already exists. So “c*” means at least one ‘c’.

I guess here ‘*’ can cancel the previous char, and thus “c*” means 0 or more ‘c’. But, I think this is not common sense.

+1I could be wrong. I think people should know RE.

+2c* means 0 or more. Its a conception in “computation theory”. Its a important course in computer science. And in compiler theory, this * conception is also mentioned very often. If you wanna describe 1 or more, normally, we use c+.

0Hi 337c0d3r,

Your code returns false when isMatch(“aabcde”, “c*a*b”) is given.

-2Hi 337c0d3r,

Your code returns false when isMatch(“aabcde”, “c*a*b*”) is given.

-1That is the expected answer. Instead try

ccccb, cab, cb, b, ccccaaaaaaaaa, or the empty string

+1because ‘*’ can match current pattern 0 time.

0I think it is wrong, or if the second has a * , the strings are match,

0Could you please tell me what’s the running time of this algorithm? O(n^2)?

0In case s=”a…..ab” p=”a*a*….a*”, it takes exponential time and O(n) space. Using memorizing, I think it takes O(n^3) time and O(n^2) space.

+4To improve the exponential time, is it feasible to preprocess the pattern string? if p=”a*a*….a*”, p1=”a*”, are p and p1 equivalent?

0KMP gives a running time of O(n* number of * characters)

+1The +1 button on my side has some problem

-25Could you tell why isMatch(“ab”, “.*”) → true? .can match “a”, so “.*” is a or aa or aa… How could it be “ab”?

+3‘.*’ means you can have more than 0 ‘.’ then it’s ‘..’ and ‘ab’

+3I still don’t think its correct.. because lets say we assume any char for ‘.’ and that can be occur 0 or more times.. so but still at the end there should be b.

if the condition was isMatch(“ab”, “.*b”) I can say its true. But how isMatch(“ab”, “.*”) can be true?…..

0how could isMatch(“ab”,”.*”) -> true ?? . matches a then preceding * can only match more a’s , how did this b came then ….!!

0It means 0 or multiple single character, which definitely match “ab”.

-2It should not be true. * means 0 or multiple “preceding” character. In this case it is ‘a’. So how did ‘b’ come from?

0The preceding character is ‘.’, which can stand for any character.

In a another word, “.*’ equals anything as long as there is no tail behind, like “.*c”. If there is a tail, in s we must have corresponding ‘c’

0So let say you assume any character for eg: ‘m’ for character ‘.’ now for ‘ * ‘ you can have 0 or multiple ‘m’. Then what ?.. but you cant have anything after m so this case will fail. so its false.

Now to make it true we have to assume a for ‘ . ‘ so we can say 0 or multiple a but at the end you should not have any character except ‘a’ according to pattern. so how can you consider b?

if the condition was isMatch(“ab”, “.*b”) I can say its true. But how isMatch(“ab”, “.*”) can be true?…..

0@rockstar and @ds:

I mentioned this in my post, “.*” means repeat the preceding element of the pattern itself, which is the single dot character. Since dot can match virtually any character, the pattern “.*’ will match the string “ab”.

If you do not believe me, I’ll quote this from here, http://www.regular-expressions.info/dot.html :

“Suppose you want to match a double-quoted string. Sounds easy. We can have any number of any character between the double quotes, so “.*” seems to do the trick just fine. The dot matches any character, and the star allows the dot to be repeated any number of times, including zero.”

+2dumb test case isMatch(“aa”, “*a”) will trigger assert, which should not.

-2from my understanding, the ‘*’ must have a preceding element, so your use case maybe wrong regarding to this scenario.

+2in other word, a wrapper function is needed.

0Please find the java version of the solution [Naive method]

+3Looks like your code is not working.

This returns false, but patten actually matches:

System.out.println(pm.regexMatch("abbbbc".toCharArray(), "ab*bbbbc".toCharArray()));

-1Here is my version based on author posted solution:

public boolean matches(String text, String pattern) {

// if pattern is empty - whole string should be empty to match

if (pattern.length() == 0) return text.length() == 0;

// retrieve 2nd character of the pattern

String nextChar = (pattern.length() > 1) ? String.valueOf(pattern.charAt(1)) : "";

char t = (text.length() > 0) ? text.charAt(0) : 0;

char p = pattern.charAt(0);

`// if 2nd char is not '*' - it means that we are checking if next char in text matches with next char in pattern`

// and recursively run the code starting from +1 char in both pattern/text

if (!"*".equals(nextChar)) {

return ((t == p) || (p == '.' && t != 0)) &&

matches(text.substring(1), pattern.substring(1));

} else {

while ((t == p) || (p == '.' && t != 0)) {

// check if current text matches tail part of the pattern (+2 symbols)

if (matches(text, pattern.substring(2))) return true;

// cut first symbol in original text and repeat the check in the loop

text = text.substring(1);

t = (text.length() > 0) ? text.charAt(0) : 0;

}

return matches(text, pattern.substring(2));

}

}

+1Thanks for the tips and article

0I come up with a dynamic programming solution. It takes O(n^2) in time and O(n) in space.

Could you please check if it is correct.

Here is the solution:

d[i,j] is true iff a[0..i] matches r[0..j]

d[i,j] is true if one of these cases is true:

(1) a[i] == r[j] and d[i-1,j-1] is true (match a character with a character)

(2) r[j] == ‘.’ and d[i-1,j-1] is true

(3) r[j] is * and it can match a[i] and d[i-1,j-1] is true (* consumes character as its first one)

(4) r[j] is * and it can match a[i] and d[i-1,j] is true (* consumes 1 more character)

(5) r[j] is * and it can match a[i] and d[i,j-1] is true (* matches an empty string)

The solution is called “Elastic Matching” used in Handwriting recognition.

Please see the full solution here: http://tanin.nanakorn.com/blog/!/id25

0Detail is a little off and wrong. Sorry about that. But I guess you can get the concept of how it works

0Could anyone give me a O(N) solution ?

0No.

-2NO

0NO

0Nice code!

I have a minor question.

IsMatch(“a”, “ab*a”) should be “ture”, but this code returns “false”.

Please correct me if I miss some point

0IsMatch(“a”, “ab*a”) should return “false”.

This is because the pattern “ab*a” means the string begin with an a, following by 0 or more b’s, then an another a at the end.

This imply that it must minimally contain two characters which has two a’s, ie. “aa”.

0Oh, I miss the whole match condition.

If the regular expression string should be equal to the entire input string, then IsMatch(“a”, “ab*a”)–>false

0Yup, that’s correct. It’s mentioned in the problem statement, Just bolded the word “entire” to make the statement clearer.

0The code is amazing.

I just don’t understand why we need:

assert(*p != ‘*’), which is after if (*(p+1) != ‘*’)

Thanks,

0This is to assume that multiple *’s are not in the input pattern and before a * there must be a letter to match, or else the code above will not be correct. In the case where multiple *’s can be in the input pattern, you can just use a loop to skip those, as those are redundant and won’t affect the correctness.

0There is a bug.

char* exp = “b*bb”, char* text = “bb”

This should be a “Not match”, but your code will answer “match”.

You should change your last code to

-1Thanks for your comment.

exp=”b*bb” and text=”bb” should be a match. This is because “b*” could match 0 or more b’s. In this case it would be 0 b.

0I think my code is same with answer but it shows the backtracking aspect of problem more.

int match(char *s, char *p)

{

int char_matched = 0;

if (*s == ”)

return (*p == ”);

if (*s == *p || *p == ‘.’)

char_matched = 1;

if (*(p+1) == ‘*’)

return char_matched? (match(s+1, p) || match(s+1, p+2)) : match(s+1, p+2);

else

return char_matched? match(s+1, p+1): 0;

}

0I just saw “bb”, “b*bb” case above. Yeah.. this should match. Added the case.

int match(char *s, char *p)

{

int char_matched = 0;

if (*s == ”)

return (*p == ”);

if (*s == *p || *p == ‘.’)

char_matched = 1;

if (*(p+1) == ‘*’)

return char_matched? (match(s+1, p) || match(s+1, p+2) || match(s, p+2)) : match(s+1, p+2);

else

return char_matched? match(s+1, p+1): 0;

}

0This code is buggy. Here are some examples where it would fail:

– it would return false on empty string for regex “.*” while it should have returned true.

– it would fail in the case such as text=”ab”and regex=”ab.*”.

0the code doesn’t handle when the length of p is 1

0// next char is ‘*’

while ((*p == *s) || (*p == ‘.’ && *s != ”))

Hello, in this piece of code, *p == ‘.’ would never be true since there is no any code inside while loop operates p.

0Cannot understand the following code:

// next char is ‘*’

while ((*p == *s) || (*p == ‘.’ && *s != ”)) {

if (isMatch(s, p+2)) return true;

s++;

}

why use “if (isMatch(s, p+2)) return true;”? This will stop comparing s to the characters before ‘*’ in p. Is this correct?

0This code is breaking for abdabcabc & .*abc

0“the regex pattern “.*” allows the dot to be repeated any number of times, which matches any string (even an empty string)”.

s = abdabcabc; p = .*abc.

(1) .* will match the first six characters of s, i.e., abdabc (you could consider .* as “……” in this case);

(2) the last three characters of s and p match.

Therefore, .*abc matches abdabcabc.

0Python? DP can be applied (not in the following code).

Run:

./reg.py “abcbcd” “a.*c.*d”

+1I am not sure about the time complexity of solution.

Why do not solve this problem via DP? it would be O(n ^ 2) ~

0here’s my code:

it can pass all test cases.

+1Further thoughts:

Problem 1:

I came up one case that takes a long time to get the result while the length of string s and p are not huge.

s[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

p[] = "a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*b"

0Skip the redundant maches … for example a*a*a* can be reduced to a*

0I don’t know is this the case of exponentially complexity or not. Can anyone help me check it? Thanks.

0Pingback: regular expression matching support for ‘.’ and ‘*’ | Jisku.com - Developers Network

here, definition of ‘*’ is confusing, a*b could equal to following strings?

b–>cancel a

ab–>0 a

aab–>1 a

aaab–>2 a

aaaab–>3 a

…

0Knuth morris pratt modified, * is greedy. All we have to do is build patterns of all characters upto *

for .e.g ab*bc*cd, we build 3 prefix that are also suffixes pattern for ab , bc and cd respectively.

for the actual string, we first start with ab using its prefix array try to match the string, once a match occurs we recursively try to match from that matched position for bc and finally for cd. we backtrack when we complete the entire string for any recursion level, seems like O(n*number of stars) to me. I am in the process of writing code for it.

0My previous solution was for wildcard match but it lends intuitively into regex match also. Cancel kmp prefix generation for a character whose next char is a *, then the same algo holds but this time we need to move to the next matching block once the current * char is matched and backtrack here only when the next matching block fails.

0nice post, but has problem as isMatch(“aa”, “*a”) triggers assert which leads to termination of program.

this line is incorrect in the code, as it can lead to segmentation fault… how can we directly access p+2 element, we know for sure that p is not ”, but p+1 element can be ” , therefore leading to p+2 to be undefined.

0This branch will be executed while ” // next char is ‘*’ “. This means *(p + 1) = ‘*’.

Then *(p + 2) can be ”, the terminal of string.

0I think there is a bug for your code.

For the case where p = “a” and s = “b*a”, your code will return FALSE. However, I believe it should return TRUE.

What do you think? Did I miss anything?

0p is the regular expression.

0The following is the DP solution which passes all the online cases:

class Solution {

public:

bool isMatch(const char *s, const char *p) {

vector row_char, column_char;

vector column_st;

while(*s != ”)

row_char.push_back(*(s++));

while(*p != ”)

{

if(*p != ‘*’)

{

column_char.push_back(*p);

if(*(p + 1) == ‘*’)

column_st.push_back(true);

else

column_st.push_back(false);

}

p++;

}

int row_sz = row_char.size(), column_sz = column_char.size();

if(row_sz == 0 && column_sz == 0)

return true;

if(row_sz == 0)

{

for(int i = 0; i < column_sz; i++)

if(column_st[i] == false)

return false;

return true;

}

if(column_sz == 0)

return false;

bool state[2][column_sz];

int flag = false;

int i = 0;

for(; i < column_sz; i++)

{

if(column_char[i] == '.' || column_char[i] == row_char[0])

flag = true;

if(column_st[i] == true)

state[0][i] = true;

else

break;

}

if(i == column_sz && flag == false)

return false;

if(i < column_sz)

{

if(column_char[i] == '.' || column_char[i] == row_char[0])

state[0][i] = true;

else

state[0][i] = false;

i++;

}

for(; i < column_sz; i++)

{

if(column_st[i] == true && state[0][i – 1] == true)

state[0][i] = true;

else

state[0][i] = false;

}

int cur = 1, pre = 0;

for(i = 1; i < row_sz; i++)

{

if(column_st[0] == true && column_char[0] == '.')

state[cur][0] = true;

else if(column_st[0] == true && state[pre][0] == true && row_char[i – 1] == column_char[0] && row_char[i] == column_char[0])

state[cur][0] = true;

else

state[cur][0] = false;

for(int j = 1; j < column_sz; j++)

{

if(column_st[j] == true && state[cur][j – 1] == true)

state[cur][j] = true;

else if(column_char[j] == '.' || row_char[i] == column_char[j])

{

if(state[pre][j – 1] == true)

state[cur][j] = true;

else if(column_st[j] == true && state[pre][j] == true)

state[cur][j] = true;

else

state[cur][j] = false;

}

else

state[cur][j] = false;

}

swap(cur, pre);

}

return state[pre][column_sz – 1];

}

};

0Here is the ultimate in conciseness (short of using a regex library): 6 lines of Prolog!

Note that 42 is the ASCII for ‘*’ and 46 the ASCII for ‘.’. A sample query is this:

0+2This seems neat and passes all OJ tests.

But it has bugs and fails on this simple example:

s* = “a”

p* = “b*a”

0sorry I think I misunderstood what * means, it could cancel the preceding character.

+1isMatch(s, p+2) maybe out of boundary, in which case will store unexpected characters. Better check before go into.

*(p+1) != ” && isMatch(s, p+2)

+10your code is giving not match for isMatch(“.”,”a”) but it should be a match.

0you made a mistake here.’a’ can be a match to ‘.’ but not the opposite.

0Use DP, Time Complexity is O(N^2), and I only store the last and current row of the table, so the Space Complexity is O(N). Here is the working code.

0html problem. ' should be ‘ instead.

0input abaaabc

pattern .*bc

0Sorry. should be

assert(s != NULL && p != NULL && *p != ‘*’);

0isMatch(“ab”, “.*”) → true

but why “ab”, “.*c” expect false?

0I think the posted sloution will fail for the following test case:

s=”xabbb” p=”x.*”

for .* first it will check for a and it will match…. but the rest of three b’s will also match with the same .*. so it will return TURE but it should return FALSE.

ifff s=”xaaaa” p=”x.*” then it should return TRUE. plz lemmy know if im right or wrong.

0Pingback: leetcode: Regular Expression Matching solution | 烟客旅人 sigmainfy

Here is my solution,

int Findstr(char *string, char *str, int len)

{

assert(string != NULL && str !=NULL && len>0);

for(int i=0; i<strlen(string); i++)

{

int k = i, j;

for(j=0; j<len && (str[j] == '?' || string[k] == str[j]); j++,k++);

if(j==len) return i;

}

return -1;

}

int SplitString(char *&str, char c)

{

assert(str!=NULL); //?? temp!=NULL

int cnt=0;

int len = strlen(str);

for(int i=0; i<len; i++)

{

if(str[i]!=c && (i+1== len || str[i+1] == c)) //str[i] ==c && (i+1 == len || str[i+1] != c)

{

str = str+i-cnt;

return cnt+1;

}

else if(str[i]!=c)

cnt++;

}

return -1;

}

bool matchpattern(char *text, char *pattern)

{

assert(text!=NULL && pattern!=NULL);

char *strP = pattern;

char *strT = text;

int len,index=0;

while((len = SplitString(strP, '*')) != -1

&& (index = Findstr(strT, strP, len)) != -1)

{

strP += len;

strT += index+len;

}

if((len == -1) && (pattern[strlen(pattern)-1] == '*' || *strT == NULL)) return true;

return false;

}

0I used O(n) method. I just compare non star character with string.

I believe this is one of the best solution

0could you give your idea?

0DP solution, time complexity n^2

public boolean isMatch(String s, String p) {

if (s == null || p == null) return false;

int sl = s.length();

int pl = p.length();

boolean a[][] = new boolean[pl+1][sl+1];

a[0][0] = true;

for(int i=1; i<=sl;i++){

a[0][i] = false;

}

for(int i=1; i 1) {

a[i][0] = a[i-2][0];

} else {

a[i][0] = false;

}

}

for(int i=1; i <= pl; i++) {

for(int j=1; j <= sl; j++) {

if(p.charAt(i-1) == '*' && i != 1) {

a[i][j] = a[i-2][j] || a[i-1][j] ||

(a[i][j-1] && (p.charAt(i-2) == '.' || p.charAt(i-2) == s.charAt(j-1)));

} else {

a[i][j] = a[i-1][j-1]

&& (p.charAt(i-1) == '.' || p.charAt(i-1) == s.charAt(j-1));

}

}

}

return a[pl][sl];

}

0DP solution, time complexity n^2

0For this case: “”, “.*”, even I wrote: (in the first line, it shows false)

if(s==””&&p==”.*”)

return true;

Why???????????????????????????????????

0s and p are pointers

0I once extended the classical KMP algorithm to make it able to handle wild card in pattern matching. The complexity stays O(N) time and O(N) space. This is the code, but it is extremely hard to understand: 10 times harder than the classical KMP. I don’t want to put any explanation here because a detailed explanation may cost a paper. If someone is really interested in it, email me: the.easiest.name.to.remember@gmail.com

vector next_ext(const string &str)

{

int leng = str.size();

if (leng < 0 || str.empty())

return vector();

vector next(leng + 1);

int i = -1, j, off; //treated like str[-1] is a ‘*’

off = i, j = off, i ++, next[i] = off;

while (i < leng)

{

if (str[i] == '*' ) //consistent with the above comment.

off = i, j = off, i ++, next[i] = off;

else if (j == off || str[i] == str[j] || str[j] == '.')

++i, ++j, next[i] = j;

else

j = next[j];

}

return next;

}

//if matched, return the last position of the first matched substring of the string, else return -1.

int kmp_ext(string str, string pattern)

{

int str_len = str.size(), pattern_len = pattern.size();

vector next = next_ext(pattern);

int i = 0, j = -1, off; // treated like pattern[-1] is a ‘*’

off = j, j ++;

while(i < str_len && j < pattern_len)

{

if (j != off && pattern[j] == '*' )

off = j, j ++; //consistent with the above comment

else if( j == off || str[i] == pattern[j] || pattern[j] == '.')

j++, i++;

else

j = next[j];

}

if(j == pattern_len)

return i;

else

return -1;

}

int main()

{

string s = "abdcxdefabc";

cout<<kmp_ext(s, "ab..cx")<<endl;

cout<<kmp_ext(s, "ab*x.*fa*c")<<endl;

cout<<kmp_ext(s, "cxd*")<<endl;

cout<<kmp_ext(s, "*b*x.*fa*c")<<endl;

cout<<kmp_ext(".ab", ".a.")<<endl;

cout<<kmp_ext("a*b", "a*b")<<endl;

}

00Such an amazing code! I even spent half a day to understand the code!

0if (*(p+1) != ‘*’) may overflow

0If ab matches “.*”

Does that mean that .* will match everything after a .* appeared in input?

+1Hi,

Thanks for sharing code, I found hard on understanding following lines.

while ((*p == *s) || (*p == ‘.’ && *s != ”)) {

if (isMatch(s, p+2)) return true;

s++;

}

If I have S = ‘abbbc’, P = ‘ab*ba’, when in the second loop, the line “if (isMatch(s, p+2)) return true;: will be true and return True to the main loop. Then the result will be true. Is my understanding correct?

Thanks

0WHY?! glibc POSIX regex hold the view that “aa” matchs “a” https://github.com/xiangzhai/leetcode/blob/master/src/regex/regex.c#L51

0regex has two different methods: search vs. match. In this question, match means from head to tail, not part of it. In regex,

0My DP solution. It takes O(N^2) time and O(N) space. Is this the best we can do?

+3Your solution is efficient and neat.

0Pingback: [Leetcode] Regular Expression Matching | Yufei Zhao @ Columbia University

Thanks for sharing the code! It is really elegant.

However I found it fails for the following cases:

(“ab”, “.*c”)

(“aaa”, “ab*a”)

(“aaba”, “ab*a*c*a”)

0“aaa” and “ab*a” are not matched, the result is supposed to be false.

0C# solution as follows:

0so input “” , “c*c*” expects false while input “” , “.*” expects true ?

0nvm, i was wrong

0The sample code is neat, but it can be neater. I feel that it takes some effort to understand the “while” clause.. Let me share my simpler code:

Key: at each recursion, advance at least one of the two strings.

+1starting from the end

0here is the correct copy.

starting from the end

0Pingback: Leetcode #10 Regular Expression Matching | Kris Ma

In the following paragraph, shouldn’t “smaller” actually be “larger”?

” One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is smaller or equal to the number of consecutive b’s after “b*” in p, we conclude they both match and continue from there…”

0well, i can’t understand that when

`isMatch(s, p+2)`

`will return true? for example, s=ab, p=a*b, the last step maybe isMatch(s,p+2), *s='b'; *p='b'; but where is the end of the recursion ?`

0This problem can be solved both in DP & Automata, Following is the C code using DP.

Time: O(n * m), Space: O(n * m);

0`js`

function isMatch(string, pattern) {

if (pattern === '') {

return string === ''

}

// pattern 下一个字符不是 * , 必须匹配当前字符

if (pattern.charAt(1) !== '*') {

return (

pattern.charAt(0) === string.charAt(0) ||

(pattern.charAt(0) === '.' && string.charAt(0) !== '')

) && isMatch(string.slice(1), pattern.slice(1))

}

// pattern 下一个字符是 '*'

while (

(pattern.charAt(0) === string.charAt(0)) ||

(pattern.charAt(0) === '.' && string.charAt(0) !== '')

) {

if (isMatch(string, pattern.slice(2))) {

return true

}

string = string.slice(1)

}

return isMatch(string, pattern.slice(2))

}

00Pingback: LeetCode 10: Regular Expression Matching (Hard) | helloyuan

For the further thoughts question 1, is there some hint about how to make the algorithm more efficient? Is the a*a*b kind of pattern actually going to cause exponential time complexity?

0My Java recursion solution based on the author’s article. I am using arrays and pointers to avoid using substrings. Passes all tests in OJ.

0Pingback: [LeetCode]Regular Expression Matching

That’s an interesting challenge. I’ve tried to implement it myself with addition of alternatives (“|”) and grouping (“(“, “)”), and that’s when it gets hairy. I wonder if it’s possible to do it as concise.

0“.*” means repeat “.” 0 or multiple times. “.*” could be “..” “…” “….” “…..” “……”, ….

Hence it will actually match any string.

0