76. Minimum Window Substring
Description
See https://leetcode.com/problems/minimum-window-substring/
Observations
Examples
Given
s = "abcdef"
andt = "bd"
the minimum substring ofs
containingt
would be "bcd"Given
s = "acbbaca"
andt = "aba"
the minimum substring ofs
containingt
would be "baca"Given
s = "acbbac"
andt = "aba"
the minimum substring of s containing t would be "acbba"Note that
s
is the same as thes
in example 2, but is missing the final "a". "b" is included twice in the result, even though we only needed 1. We are allowed to have more of a required letter, just not less.
Notes
The problem is asking for the min substring of a string. This combined with the fact the brute force solution is very complex leads to a sliding window as a solution. In particular it uses the fast/slow sliding window approach. We want to expand the right side of the window until we have a valid substring. Then we contract the left side of the window until we find the minimum valid substring.
The answer's first and last letter will always be a required letter. This makes sense because if it was a non-required letter, we could shorten the substring by eliminating that letter.
We need to keep track of the required letters (t) and the number of times they at least must appear in the substring. This leads to a hash (dict) to hold this data.
We need to find a way to compare the letters in the sliding window against the required letters. We could also keep the sliding window letters in a hash but we cannot compare the two hashes (hash1 == hash2) as this is an O(N) operation. Doing this every time the sliding window hash changes would be very expensive.
There are two ways to approach the solution. But, they both have the general idea of using the length of the required hash (the number of keys) and comparing it against the number of matching characters in the sliding window.
Approach 1
Use a hash, t_counts
, to record the required letters and their counts.
Use a found
variable to keep track how many required letters have met their required count. The found
variable is used to tell whether we have formed a valid substring in the sliding window. When found
is equal to the length of t_counts
(the number of keys), we have a valid substring. A boolean cannot be used because there is more than one character to keep track of.
Use a hash, window_counts
, to keep track of the letters and their counts in the sliding window.
When window_counts[letter] == t_counts[letter]
we have found the minimum number of instances of that letter in the window. So the found
variable can be incremented.
When found == len(t_counts)
we can move the left pointer and check to see if we still have a valid substring. The window_counts
and found
variables will need to be decremented as the left pointer increments. This is because as a letter leaves the sliding window we need to decrement its count in the window_counts
hash. If that letter happens to be a required letter and its count falls below its required count then found
needs to be decremented because we no longer have a valid substring.
Example:
Given s = "acbbaca"
and t = "aba"
.
t_counts = {"a":2, "b":1"}
len(t_cache) = 2
See solution 1 below for the code.
Approach 2
Instead of using the window_counts
dict to keep track of the letters in the sliding window, the t_counts
dict can be used instead. When a required letter is encountered, the associated count in t_counts
is decremented. When the count gets to zero then we have found the correct number of occurrences of that letter in the sliding window.
In this solution, a count of 0 means we have found the required number of instances of the letter in the sliding window. A negative number means we've found more than required. A positive number means we need to find more instances of the letter.
Because additional characters are inserted into the t_counts
dict, the length of it must be stored before the sliding window processing starts.
As in the approach above, we move the left pointer to see if we still have a valid substring with a smaller window. The logic is similar to above, but this time we increment the value in t_counts
when a letter is removed from the sliding window.
Note, that some of the numbers may go to a negative number, but that is ok, and is needed to keep the counts correct in the sliding window.
See solution 2 below.
Solution
These solutions use the fast/slow sliding window technique.
See Leetcode's solution at: https://leetcode.com/problems/minimum-window-substring/solution/
Solution 1
Note: s is length m, t is length n
Time complexity: O(2m + n) -> O(m + n). (We might visit every letter in s twice as the worst case scenario).
Space complexity: O(m + n). Although if we did not store the non-required letters this could be reduced to O(n + n). We have two hashes and both would store the required letters. This further reduces to O(n).
Solution 2
Note: s is length m, t is length n
Time complexity: O(2m + n) -> O(m + n). (We might visit every letter in s twice as the worst case scenario).
Space complexity: O(m + n). Although if we did not store the non-required letters this could be reduced to O(n), as we only store the required letters in one hash.
Resources
NeetCode youtube video - has a good explanation of approach 1
Last updated