76. Minimum Window Substring

Description

See https://leetcode.com/problems/minimum-window-substring/

Observations

Examples

  1. Given s = "abcdef" and t = "bd" the minimum substring of s containing t would be "bcd"

  2. Given s = "acbbaca" and t = "aba" the minimum substring of s containing t would be "baca"

  3. Given s = "acbbac" and t = "aba" the minimum substring of s containing t would be "acbba"

    • Note that s is the same as the s in example 2, but is missing the final "a". "b" is included twice in the result, even though we only needed 1. We are allowed to have more of a required letter, just not less.

Notes

  1. The problem is asking for the min substring of a string. This combined with the fact the brute force solution is very complex leads to a sliding window as a solution. In particular it uses the fast/slow sliding window approach. We want to expand the right side of the window until we have a valid substring. Then we contract the left side of the window until we find the minimum valid substring.

  2. The answer's first and last letter will always be a required letter. This makes sense because if it was a non-required letter, we could shorten the substring by eliminating that letter.

  3. We need to keep track of the required letters (t) and the number of times they at least must appear in the substring. This leads to a hash (dict) to hold this data.

  4. We need to find a way to compare the letters in the sliding window against the required letters. We could also keep the sliding window letters in a hash but we cannot compare the two hashes (hash1 == hash2) as this is an O(N) operation. Doing this every time the sliding window hash changes would be very expensive.

  5. There are two ways to approach the solution. But, they both have the general idea of using the length of the required hash (the number of keys) and comparing it against the number of matching characters in the sliding window.

Approach 1

Use a hash, t_counts, to record the required letters and their counts.

Use a found variable to keep track how many required letters have met their required count. The found variable is used to tell whether we have formed a valid substring in the sliding window. When found is equal to the length of t_counts (the number of keys), we have a valid substring. A boolean cannot be used because there is more than one character to keep track of.

Use a hash, window_counts, to keep track of the letters and their counts in the sliding window.

When window_counts[letter] == t_counts[letter] we have found the minimum number of instances of that letter in the window. So the found variable can be incremented.

When found == len(t_counts) we can move the left pointer and check to see if we still have a valid substring. The window_counts and found variables will need to be decremented as the left pointer increments. This is because as a letter leaves the sliding window we need to decrement its count in the window_counts hash. If that letter happens to be a required letter and its count falls below its required count then found needs to be decremented because we no longer have a valid substring.

Example:

Given s = "acbbaca" and t = "aba" .

t_counts = {"a":2, "b":1"}

len(t_cache) = 2

See solution 1 below for the code.

Approach 2

Instead of using the window_counts dict to keep track of the letters in the sliding window, the t_counts dict can be used instead. When a required letter is encountered, the associated count in t_counts is decremented. When the count gets to zero then we have found the correct number of occurrences of that letter in the sliding window.

In this solution, a count of 0 means we have found the required number of instances of the letter in the sliding window. A negative number means we've found more than required. A positive number means we need to find more instances of the letter.

Because additional characters are inserted into the t_counts dict, the length of it must be stored before the sliding window processing starts.

As in the approach above, we move the left pointer to see if we still have a valid substring with a smaller window. The logic is similar to above, but this time we increment the value in t_counts when a letter is removed from the sliding window.

Note, that some of the numbers may go to a negative number, but that is ok, and is needed to keep the counts correct in the sliding window.

See solution 2 below.

Solution

These solutions use the fast/slow sliding window technique.

See Leetcode's solution at: https://leetcode.com/problems/minimum-window-substring/solution/

Solution 1

class Solution:
    def minWindow(self, s: str, t: str) -> str:
        if len(t) > len(s):
            return ""
        # dict with the required letters and their counts
        t_counts = Counter(t)
        # empty dict that will hold the letters in the sliding window. Values default to zero
        window_counts = defaultdict(int)
        # the left pointer
        left = 0
        # The found variable will keep track of how many required letters we have found. When this number matches the 
        # length of t_counts (number of keys) we have found a valid substring.
        found = 0
        # keep track of the minimum substring start and end index range. The start and end indexes will be used at the
        # end to create the substring to return. I'm doing this instead of keeping the actual substring, as I would need
        # to slice "s" each time I find a minimum. The slice operation is an O(K) (where k is the number of elements in
        # the slice). So, it's a bit faster to just store the start and end ranges and do the slice at the end. Infinity
        # is used because we want to find the min value. So, we need an initial value greater than any possible result
        # in order to get the min value.
        min_substring = (float("-inf"), float("inf"))
        for right, char in enumerate(s):
            # just add the char to the sliding window regardless of whether it is a required letter or not. I could
            # check if the letter is not in the t_counts dictionary, but that adds complexity. If this was a real world
            # problem and we found it was taking too much space then we could look at this optimization.
            window_counts[char] += 1
            # we found the number of instances needed for this character, so we can increment the found variable. Note
            # that any additional instances of the required letter do not affect our decision in the found criteria. 
            if t_counts[char] == window_counts[char]:
                found += 1
            # We found all the required letters in their appropriate amounts, so we have a valid substring. Now,
            # increment the left pointer to see if we can remove any leading characters to make a smaller substring.
            while left <= right and found == len(t_counts):
                left_char = s[left]
                if right - left < min_substring[1] - min_substring[0]:
                    min_substring = (left, right)
                # remove the character as it will no longer be in the sliding window
                window_counts[left_char] -= 1
                # if the count for this letter falls below the required number, then we are no longer in a valid 
                # substring scenario.
                if window_counts[left_char] < t_counts[left_char]:
                    found -= 1
                left += 1
        return "" if min_substring[0] == float("-INF") else s[min_substring[0]:min_substring[1]+ 1 ]

Note: s is length m, t is length n

Time complexity: O(2m + n) -> O(m + n). (We might visit every letter in s twice as the worst case scenario).

Space complexity: O(m + n). Although if we did not store the non-required letters this could be reduced to O(n + n). We have two hashes and both would store the required letters. This further reduces to O(n).

Solution 2

class Solution:
    def minWindow(self, s: str, t: str) -> str:
        if len(t) > len(s):
            return ""
        # dict with the required letters and their counts
        t_counts = Counter(t)
        left = 0
        # The number of unique required letters. We are storing this value in this solution, as the t_counts cache will
        # grow as characters are added to it. A possible enhancement would be to NOT store the non-required letters.
        num_needed = len(t_counts)
        found = 0
        # The min_substring works the same as in the above solution
        min_substring = (float("-inf"), float("inf"))
        for right, char in enumerate(s):
            # Non-required letters will have a negative count, they will never be positive. That is good, because we do
            # not want them influencing the found variable.
            t_counts[char] -= 1
            # in this solution, having a count of 0 means we have found the required number of instances of the letter
            # in the sliding window. A negative number means we've found more than required. A positive number means we
            # need to find more instances of it.
            if t_counts[char] == 0:
                found += 1
            while num_needed == found and left <= right:
                left_char = s[left]
                if right - left < min_substring[1] - min_substring[0]:
                    min_substring = (left, right)
                t_counts[left_char] += 1
                # a positive number means we need to find more instances of the letter for the substring. We have an
                # invalid substring, so decrement found.
                if t_counts[left_char] > 0:
                    found -= 1
                left += 1
        return "" if min_substring[0] == float("-inf") else s[min_substring[0]:min_substring[1] + 1]
        

Note: s is length m, t is length n

Time complexity: O(2m + n) -> O(m + n). (We might visit every letter in s twice as the worst case scenario).

Space complexity: O(m + n). Although if we did not store the non-required letters this could be reduced to O(n), as we only store the required letters in one hash.

Resources

Last updated