Finding the Minimum Window in S which Contains All Elements from T

Given a set T of characters and a string S, find the minimum window in S which will contain all the characters in T in complexity O(n).

eg,
S = “ADOBECODEBANC”
T = “ABC”

Minimum window is “BANC”.

This is an interesting problem and interesting problems have multiple approaches and the best approach is usually simple and beautiful.

In this post, I illustrate my approach when I first attempt this problem. My first approach is more complicated and is not the best solution (runs in O(N lg M) time). Later in this post, I explain the best approach which runs in O(N) time with the help of images.

Hint:
Using the above example S = “ADOBECODEBANC” and T = “ABC“, we can easily find the first window which contains T is “ADOBEC“. Another possible candidate is “ADOBECODEBA“. In fact, we should skip this, because within it exists a sub-window “CODEBA” that is both shorter and satisfy the constraint. The final window to consider is “BANC“, which is also the minimum window.

To solve this problem efficiently, below are the two key points we need to consider:

  • How do we determine if a particular window contains T? (ideally in O(1) time)
  • How do we select all windows efficiently? (ideally do not include other windows that wrap about a sub-window)

We definitely need the help from a hash table here. Hash table is able to tell us if a letter exist in T in O(1) time.

An O(N lg M) solution:
When I first approach this problem, I thought of having another table which records the position of last-met character in T. That is, when I first see ‘A‘, I record its position being 0. Each time I see ‘A‘ again, I replace its position with the new position. This approach is simple but flawed. Notice that T does not include duplicate characters? If T includes duplicate characters, such as “AABC“, this approach does not work.

In this case, the remedy is to maintain queues (instead of table) that correspond to each unique character in T. For example, assume that T = “AABC“, when you first see ‘A‘, push its position into the ‘A‘ queue (which originally is empty). When you see ‘A‘ again, push its position to the back of ‘A‘ queue. The third time you see ‘A‘, you would pop the front element, and push its position to the back of ‘A‘ queue. By popping the element, we do not include window that wrap around a sub-window. This approach works but the difficulty is two-fold:

  • There is no way we could determine the starting and ending position of a window directly from the queue itself. A naive way is to scan the entire queue for minimum and maximum value.
  • How do we determine if the window satisfies the constraint? We would have to scan all queues to check if the sum of all queues’ sizes is equal to T‘s length.

The way I solve the above problem is to maintain a sorted map that maps indices to character so that we can grab the minimum and maximum position in O(1) time. But there is an extra cost for doing this. Each time you pop an element from the queue, you would also need to update the map by deleting the corresponding element and inserting a new element. To check if the window satisfies the constraint, we check the map’s size; a valid window is found if the map’s size is equal to T’s length.

The complexity of the method is O(N lg M), where N is the length of S, and M is the length of T. The extra lg M term is due to the extra cost of deleting and inserting an element in the map, where each costs O(lg M) time in the worst case (Note that M is the map’s maximum size. Why?)

Best solution:
The best solution, is in fact simpler. This best approach is suggested by someone who called stormrage .

Notice how complicated the above solution is. It uses a hash table, a queue, and a sorted map. During an interview, the problems tend to be short and the solution usually can be coded in about 50 lines of code. So be sure that you say out loud what you are thinking and keep communication opened with the interviewer. Check if your approach is unnecessary complex, he/she might be able to give you guidance. The last thing you want to do is to get stuck in a corner and keep silent.

To help illustrate this approach, I use a different example: S = “acbbaca” and T = “aba“. The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing S. needToFind stores the total count of a character in T and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in T that’s met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals T‘s length, we know a valid window is found.

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to T‘s size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.

Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

i) S = “acbbaca” and T = “aba“.

ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound[‘a’] == needToFind[‘a’] == 2. Advancing would mean breaking the constraint.

iii) The second window is found. begin pointer still points to the first element ‘a’. hasFound[‘a’] (3) is greater than needToFind[‘a’] (2). We decrement hasFound[‘a’] by one and advance begin pointer to the right.

iv) We skip ‘c’ since it is not found in T. Begin pointer now points to ‘b’. hasFound[‘b’] (2) is greater than needToFind[‘b’] (1). We decrement hasFound[‘b’] by one and advance begin pointer to the right.

v) Begin pointer now points to the next ‘b’. hasFound[‘b’] (1) is equal to needToFind[‘b’] (1). We stop immediately and this is our newly found minimum window.

Both the begin and end pointers can advance at most N steps (where N is S‘s size) in the worst case, adding to a total of 2N times. Therefore, the run time complexity must be in O(N).

Test Cases:
» Download sample test cases with answers

VN:F [1.9.22_1171]
Rating: 4.8/5 (109 votes cast)
Finding the Minimum Window in S which Contains All Elements from T, 4.8 out of 5 based on 109 ratings

74 thoughts on “Finding the Minimum Window in S which Contains All Elements from T

  1. Anonymous

    Very good blog, really loved it 🙂

    Can you please help me in solving a problem. How to find the maximum BINARY SEARCH TREE in a given binary tree. maximum in the sense of number of nodes.

    VA:F [1.9.22_1171]
    +2
    1. Anon

      Just an idea..
      Leaf nodes are always BST’s
      Call interval from root of tree
      for leaf return true and also the maxv=minv=leafvalue;
      for non leaf node check whether maxv of left current value,return left’s minv as min and right’s maxv.
      for keeping track of largest tree you can also return the size of left and right BST’s and maintain a maximum

      VA:F [1.9.22_1171]
      -1
    2. Neha

      Do postorder every node will return { size , is_bst, min_value, max_value} to it parent node , then in parent node check the constraint if root->data > min && root->data < max, is_bst=1, size= leftinfo.size + rightinfo.size +1,
      else if constarint is not satisfy return : size = max ( leftinfo.size , rightinfo.size) , min= INT_MAX , max = IN_MIN. is _bst = false.

      VA:F [1.9.22_1171]
      0
  2. airfang

    Here’s an alternative solution, it might not perform better (and has potential limitations) but it might give the interviewer a “wow” factor:

    Assume we have a Prime table that for each character we have a unique prime number, cross fingers and hope the product won’t overflow an unsigned long long 😀

    VA:F [1.9.22_1171]
    +3
      1. Masuzu

        Huh what the hell? That won’t work at all. This solution is based on the uniqueness of the decomposition of a number into a product of prime numbers.

        VA:F [1.9.22_1171]
        +1
  3. Kshitij

    Seems we also need to decrement the count variable in the if block

    if (hasFound[S[begin]] > needToFind[S[begin]])

    else, we would never be able to trace any window after the first one. right?

    VA:F [1.9.22_1171]
    +1
    1. Dilip

      Not needed.. Because, the value of the count is incremented only when hasFound[x] <= needToFind[x]

      VA:F [1.9.22_1171]
      0
    2. another_coder

      Yes Kshitij, I think you r right, once the count reaches tlen, it will never enter the if block again

      VA:F [1.9.22_1171]
      0
    3. Wonderer

      Not needed. The ‘count’ variable is used only to find the first satisfied window. Once found, its value will be tLen and never change.

      I think the code has a bug which makes you confused. It should be

      VA:F [1.9.22_1171]
      0
      1. Brady Fang

        No. The original code is right. It has to be

        If you change it to

        Any newly found character will not be counted, if T has no duplicate characters.

        VA:F [1.9.22_1171]
        0
        1. manish

          Buddy…if the count is increamented evry time a new char is seen which is not in T the what is the point of having a constraint tlen

          VA:F [1.9.22_1171]
          0
    1. another_coder

      Yes Kshitij, I think you r right, once the count reaches tlen, it will never enter the if block again!

      VA:F [1.9.22_1171]
      0
      1. tipico

        what if the size of T is n….then finding whether a character exists in T would be n time itself…and we need to do it for all characters in S…search alone leads to O(n2) time…can anyone explain if i am wrong….

        VA:F [1.9.22_1171]
        0
  4. Manish Kumar

    Hey guys please check this out:

    //——————————————————————————————————————–
    bool IsInSet(char ch, char* cSet)
    {
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != ”)
    {
    if(ch == *(cSet+ index))
    {
    return true;
    }
    ++index;
    }
    return false;
    }

    void removeChar(char ch, char* cSet)
    {
    bool bShift = false;
    int index = 0;
    while (*(cSet + index) != ”)
    {
    if( (ch == *(cSet + index)) || bShift)
    {
    *(cSet + index) = *(cSet + index + 1);
    bShift = true;
    }
    ++index;
    }
    }
    typedef struct subStr
    {
    short iStart;
    short iEnd;
    short szStr;
    }ss;

    char* subStringSmallest(char* testStr, char* cSet)
    {
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector subStrVec;
    int index = 0;

    while( *(testStr+index) != ” )
    {
    if (IsInSet(*(testStr+index), cSetBackUp))
    {
    removeChar(*(testStr+index), cSetBackUp);

    if(iStartIndx < 0)
    {
    iStartIndx = index;
    }
    else if( iIndexStartNext < 0)
    iIndexStartNext = index;
    else
    ;

    if (strlen(cSetBackUp) == 0 )
    {
    iEndIndx = index;
    if( iIndexStartNext == -1)
    break;
    else
    {
    index = iIndexStartNext;
    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
    subStrVec.push_back(stemp);
    iStartIndx = iEndIndx = iIndexStartNext = -1;
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
    continue;
    }
    }
    }
    else
    {
    if (IsInSet(*(testStr+index), cSet))
    {
    if(iIndexStartNext < 0)
    iIndexStartNext = index;
    }
    }

    ++index;
    }

    int indexSmallest = 0;
    for(int indexVec = 0; indexVec subStrVec[indexVec].szStr)
    indexSmallest = indexVec;
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr + 1);

    delete[] cSetBackUp;
    return subString;
    }

    int main( int argc, char* argv[] )
    {
    char set[] = {‘A’, ‘B’, ‘C’, ”};
    char teststr[] = “ADOBECODEBANC”;

    char* sRet = subStringSmallest(teststr, set);
    cout << sRet << endl;

    delete[] sRet;

    return 0;
    }

    //————————————————————————————————————————-

    VA:F [1.9.22_1171]
    0
  5. Manish Kumar

    Hi Pals,

    I have slightly modified the method “subStringSmallest”: Please plan to use it

    //—————————————————————————————————————
    char* subStringSmallest(char* testStr, char* cSet)
    {
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector subStrVec;
    int index = 0;

    while( *(testStr+index) != ” )
    {
    if (IsInSet(*(testStr+index), cSetBackUp))
    {
    removeChar(*(testStr+index), cSetBackUp);

    if(iStartIndx < 0)
    {
    iStartIndx = index;
    }
    else if( iIndexStartNext < 0)
    iIndexStartNext = index;
    else
    ;

    if (strlen(cSetBackUp) == 0 )
    {
    iEndIndx = index;
    if( iIndexStartNext == -1)
    break;
    else
    {
    index = iIndexStartNext;
    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
    subStrVec.push_back(stemp);
    iStartIndx = iEndIndx = iIndexStartNext = -1;
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
    continue;
    }
    }
    }
    else
    {
    if (IsInSet(*(testStr+index), cSet))
    {
    if(iIndexStartNext < 0)
    iIndexStartNext = index;
    }
    }

    ++index;
    }

    int indexSmallest = 0;
    for(int indexVec = 0; indexVec subStrVec[indexVec].szStr)
    indexSmallest = indexVec;
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
    memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);

    delete[] cSetBackUp;
    return subString;
    }
    //——————————————————————————————————————-

    VA:F [1.9.22_1171]
    0
  6. raullen

    Interesting! Thanks for posting. Actually, the O(N) algorithm could be named as Earthworm-Moving algorithm (if you have ever see how a earthworm moves — stretch and contract). lol. The solution is the minimum size of a earthworm during the contract stage.

    Besides, I was thinking it is possible to use only one table, i.e., hasfound, instead of two. During the initialization stage, put hasfound[‘a’] = 2 and hasfound[‘b’] = 1 (and other entries zero). later, ++ (stretching brings in new ‘a’s) or — (‘a’ is thrown during contract ) these two entries and compare the resulting values with zero to see we could further decrease the window size.

    VA:F [1.9.22_1171]
    +1
  7. beyondfalcon

    in your naiive solution (O(NlogM) one), don’t you think it necessary to get minimum inside the loop?

    I think when the loop is done, the 1st index and last index in map m already holds the beginning and end indices in S to get min window

    VA:F [1.9.22_1171]
    0
  8. KC

    hi 1337,
    I tried with some test cases and It seems to work:
    I tried it in c# ..please have a glance and comment on it:

    public static void MinWindow( string s1,string s2)
    {
    char[] c1 = s1.ToCharArray();
    char[] c2 = s2.ToCharArray();
    int k = 0,st = 0,ed = 0,mSize = c1.Length,posS = 0,posE =0;
    HashSet h = new HashSet(c2);
    for (int i = st; i < c1.Length; i++)
    {
    if ((h.Contains(c1[i])) && (k0) &&(ed – st 0))
    {
    k = 0; ed = 0;
    }
    }
    Console.WriteLine(“The Output:”);
    for (int i = posS; i <= posE; i++)
    {
    Console.Write(s1[i]);
    }

    }

    VA:F [1.9.22_1171]
    0
    1. KC

      The equality and other went off while pasting….am sorry about that….let me try again…
      public static void MinWindow( string s1,string s2)
      {
      char[] c1 = s1.ToCharArray();
      char[] c2 = s2.ToCharArray();
      int k = 0,st = 0,ed = 0,mSize = c1.Length,posS = 0,posE =0;
      HashSet h = new HashSet(c2);
      for (int i = st; i < c1.Length; i++)
      {
      if ((h.Contains(c1[i])) && (k0) &&(ed – st 0))
      {
      k = 0; ed = 0;
      }
      }
      Console.WriteLine(“The Output:”);
      for (int i = posS; i <= posE; i++)
      {
      Console.Write(s1[i]);
      }

      }

      VA:F [1.9.22_1171]
      0
  9. Karthick

    A lucid explanation. 🙂 Thanks.
    Well, in this question, if the constraint that all the characters in the string ‘t’ must be present in the same order in the smallest sub-string of string ‘s’ (ie. finding the smallest substring of ‘s’ which contains ‘t’ as its sub-sequence), then is it possible to find the answer in linear time?

    VN:F [1.9.22_1171]
    +2
  10. Yang

    How about preprocess the string and come up with lists of the positions for the pattern to be matched.

    For the given example, preprocess S = “ADOBECODEBANC” and T = “ABC”
    to get:
    A: 0, 10
    B: 3, 9
    C: 5, 12

    and then simply find the smallest distance that contains all three letters, in this case from 9-12, which is BANC in the original String.

    Both space and time complexity is O(n), as we need to traverse the list twice effectively.

    VA:F [1.9.22_1171]
    0
    1. START_TREK

      @yang…..1nce…….(count==tlength) is satisfied…the minimum window having all the characters of T upto the present end element of String S is found….

      Then we can print the S array from begin to end……

      in such a way all d strings will b printed havng all the charecters f T…..

      VA:F [1.9.22_1171]
      0
  11. Xing Kang

    I just came across this problem and wondered that why we have to worry about duplicates in T since it has mentioned that T is a SET which shouldn’t contain any duplicate elements.

    VA:F [1.9.22_1171]
    0
  12. Pingback: G Phone interview « Shangxiazuoyou's Blog

  13. Joshua

    1. for each char in str2, find its occurrences in str1. for instance, ‘a’ appears in 0, 1, 3; ‘b’ appears in 2, 4, etc

    2. the question than becomes find min distances among sorted arrays

    3. always move the min index forward

    should complete in O(n)

    VN:F [1.9.22_1171]
    0
  14. tipico

    what if the size of T is n….then finding whether a character exists in T would be n time itself…and we need to do it for all characters in S…search alone leads to O(n2) time…can anyone explain if i am wrong….

    VA:F [1.9.22_1171]
    0
  15. Moussa

    very nice solution.
    Two minor issues: 1) using map (or hash table in java) would be more efficient than using “needToFind[256]”; 2) having both “needToFind” and “haveFound” is not necessary. We can create a map and initialize the map with negative values, something like this:

    std::map Count;
    for(int i=0; i<tLen; i++) Count[T[i]]–;

    VA:F [1.9.22_1171]
    0
    1. Moussa

      very nice solution.
      Two minor issues: 1) using map (or hash table in java) would be more efficient than using “needToFind[256]“; 2) having both “needToFind” and “haveFound” is not necessary. We can create a map and initialize the map with negative values, something like this:

      std::map Count;
      for(int i=0; i<tLen; i++) Count[T[i]]–;

      VA:F [1.9.22_1171]
      0
  16. Pingback: (Minimum Window Substring) | Runhe Tian Coding Practice

  17. Pingback: leetcode: Minimum Window Substring solution | 烟客旅人 sigmainfy

  18. saurabh

    what we have to do if we have to find minimum window for substring but we can not change the order…plz mail me asap

    VA:F [1.9.22_1171]
    0
  19. Albert

    similar with 1377’s first solution.


    class Solution {
    public:
    string minWindow(string S, string T) {
    if (S.empty() || T.empty()) {
    return "";
    }

    unordered_set tfs(T.begin(), T.end());

    typedef unordered_map<char, deque > CIQMap;

    CIMap chCntMap;
    for (int i=0; i<T.length(); ++i) {
    if (chCntMap.find(T[i]) == chCntMap.end()) {
    chCntMap.insert(make_pair(T[i], 1));
    }
    else {
    ++chCntMap[T[i]];
    }
    }

    CIQMap ciqm;
    set ids;

    int l=0, r=0, ssLen=INT_MAX;
    for (int i=0; isecond.empty());
    if (it->second.size() == chCntMap[S[i]]) {
    ids.erase(it->second.front());
    it->second.pop_front();
    }
    it->second.push_back(i);
    }
    else {
    ciqm.insert(make_pair(S[i], deque(1, i)));
    }

    ids.insert(i);

    int cl = *ids.begin();
    if (ids.size()==T.length() && (i-cl+1)<ssLen) {
    l = cl;
    r = i;
    ssLen = i-cl+1;
    }
    }
    }

    if (ssLen < INT_MAX) {
    return S.substr(l, ssLen);
    }
    else {
    return "";
    }
    }
    };

    VA:F [1.9.22_1171]
    0
  20. Albert

    similar with 1377′s first solution.

    sorry for the code on last post.

    VA:F [1.9.22_1171]
    0
  21. Pingback: Minimum Window Substring | ZhongYin Zhang

  22. Pingback: (Leetcode) Minimum Window Substring | Changhaz's Codeplay

  23. Gautham Kollu

    The first solution can also be made to run in O(n) by maintaining a extra linked list that keeps track of the window that is being used (instead of the map we use the linked list ). When the list get outdated we remove certain parts by comparing whether they part of the current window.
    This is not better than the second solution but they are some elements being inserting and deleting which is unnecessary, nevertheless this is still an O(n) algorithm.

    VA:F [1.9.22_1171]
    0
    1. Gautham Kollu

      Please check if stated every thing is right because I am getting a “Time limit exceeded” error.

      VA:F [1.9.22_1171]
      0
  24. Pingback: [LeetCode] Minimum Window Substring (Java) | Life In Code

  25. Atanu

    Very good explanation by stormrage to solve the problem in O(n) complexity..Just loved the coding style….:D

    VA:F [1.9.22_1171]
    0
  26. Hongbin

    OMG, your explanation is really awesome. I have seen some blogs about this problem, however, they are shit. I understand it immediately when I read your article. Thanks a lot!

    VA:F [1.9.22_1171]
    -1
  27. Pingback: LeetCode | Technical interview solutions

  28. Navneet kumar

    Java solution:

    VA:F [1.9.22_1171]
    0
  29. kevin

    Wont your solution break down if the input is

    abacbbaca

    and the word your looking for is ‘aba’
    It looks like your solution will return ‘baca’ when the actual solution is ‘aba’. Or am i misunderstanding something here

    VA:F [1.9.22_1171]
    0
  30. Pingback: [算法系列之二十二]包含T全部元素的最小子窗口 – 剑客|关注科技互联网

  31. Pingback: Mininum window substring | QiuQ

  32. Pingback: LeetCode article: Finding the Minimum Window in S which Contains All Elements from T | juliachenonsoftware

  33. Di

    Hint:
    Using the above example S = “ADOBECODEBANC” and T = “ABC“, we can easily find the first window which contains T is “ADOBEC“. Another possible candidate is “ADOBECODEBA“. In fact, we should skip this, because within it exists a sub-window “CODEBA” that is both shorter and satisfy the constraint. The final window to consider is “BANC“, which is also the minimum window.

    In the hint: We should also consider BECODEBA and CODEBA. Just to make the idea more complete.

    VA:F [1.9.22_1171]
    0
  34. J.S

    this wont work for ANBBCABAANC input string
    and search ABC

    wont return CAB,

    first match would be ANBBC
    second match ABAANC
    shortest between both

    one alternative is to instead of resuming from i, resume from begin+1, will increase complexity

    VA:F [1.9.22_1171]
    0
  35. Pingback: [新leetcode]76. Minimum Window Substring – Xiaoyaoworm's Territory

  36. akash kandpal

    Essentially you have a start and end pointer starting from the beginning of the string. You keep moving the end pointer and including more characters till you have all the characters of T included. At this point, you start moving start pointer and popping out characters till the point that you still have all the characters of T included. Update your answer and keep repeating the process.

    VA:F [1.9.22_1171]
    0

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.

*