LeetCode: Substring with Concatenation of All Words

Key takeaways:
The problem involves finding all starting indexes of substrings that contain all given words exactly once, in any order.
A sliding window technique traverses the string efficiently while checking for word matches.
The solution leverages two defaultdicts to track word frequencies and monitor the current window’s word occurrences.
The algorithm iterates over the input string in steps of word_length to minimize redundant checks.
When a valid substring is found, its starting index is added to the result list.
The time complexity is $O(n * m)$ , where n is the string length, and m is the total length of the words.
The space complexity is $O(m)$ , as it depends on the size of the word list and the dictionaries used.

Identifying substrings that contain a specific set of words in any order is a complex challenge. The problem of a substring with concatenating all words simulates real-world tasks, such as searching for keyword matches in a document or finding sequences in genetic data. Imagine a scenario where you need to find sections of text containing all given keywords exactly once, regardless of concatenating order. This problem encourages us to explore efficient algorithms for substring searching and is crucial for text processing and data retrieval applications. Let’s dive into the challenge of substring concatenation.

Problem statement

The problem involves finding all starting indexes of substrings in a given string s such that each substring contains all the words from a provided list words exactly once and in any order. All the words in the list words are of the same length. The length of the concatenated substring will be equal to the len(words[0])*len(words).

Algorithm

To solve the problem of finding all starting indices of substrings in a given string s such that each substring contains all the words from a provided list words exactly once and in any order, we employ a sliding window approach combined with efficient data structure usage.

Sliding window technique: We utilize a sliding window approach to iteratively traverse the input string s. This approach involves defining a fixed-size window and moving it over s while maintaining certain conditions. In our case, the window's size is determined by the total length of all words in words, ensuring that each substring we examine contains all words exactly once.
Data structures used: To efficiently track the occurrences of words in the sliding window and ensure each word is used exactly once, we employ two defaultdicts from the collections module: word_freq and window_freq.
1. word_freq stores the frequency of each word in words.
2. window_freq tracks the frequency of words within the current sliding window. Additionally, we use other variables, such as left, count, and indices to keep track of the window's boundaries, the count of words found in the window, and the indices of valid substrings, respectively.
Optimization techniques: In order to find an optimal solution in the least possible time and space complexity we perform the following steps.
1. We initialize the sliding window for each possible starting position of a word in words, which helps minimize unnecessary iterations.
2. By incrementally adjusting the sliding window and efficiently updating window_freq, we ensure that the algorithm's time complexity is optimized.
Validation check: Within the sliding window, we perform checks to ensure that each substring contains all words exactly once. If the substring meets this criterion, we append its starting index to the indices list.

from collections import defaultdict
def findSubstring(s, words):
    if not s or not words or (len(s)<(len(words[0]) * len(words))):
        return []
    word_length = len(words[0])
    total_length = len(words) * word_length
    word_freq = defaultdict(int)
    indices = []
    for word in words:
        word_freq[word] += 1
    for i in range(word_length):
        left = i
        count = 0
        window_freq = defaultdict(int)
        for j in range(i, len(s) - word_length + 1, word_length):
            word = s[j:j+word_length]
            if word in word_freq:
                window_freq[word] += 1
                count += 1
                while window_freq[word] > word_freq[word]:
                    left_word = s[left:left+word_length]
                    window_freq[left_word] -= 1
                    count -= 1
                    left += word_length
                if count == len(words):
                    indices.append(left)
            else:
                window_freq.clear()
                count = 0
                left = j + word_length
    return indices
# Test cases
s1 = "barfoothefoobarman"
words1 = ["foo","bar"]
s2 = "wordgoodgoodgoodbestword"
words2 = ["word","good","best","word"]
s3 = "barfoofoobarthefoobarman"
words3 = ["bar","foo","the"]
print("Indices of substrings with concatenation of all words: ", findSubstring(s1, words1))
print("Indices of substrings with concatenation of all words: ", findSubstring(s2, words2))
print("Indices of substrings with concatenation of all words: ", findSubstring(s3, words3))

Code implementation in six languages

Time complexity

The time complexity of the optimized code can be analyzed based on the nested loops used to iterate through the characters of the input string s. The outer loop iterates through each possible starting position of a word in the words list, which has a length of word_length. The inner loop slides the window to the right with a step size of word_length, iterating through the characters of s. Therefore, the overall time complexity of the code is $O(n * m)$ , where $n$ is the length of the input string s, and $m$ is the length of words in the words list. This complexity arises from the nested loops and the operations performed within them.

Space complexity

The space complexity of the optimized code primarily depends on the additional data structures used to store intermediate results and track word occurrences. These include the word_freq dictionary to store the frequency of words in the words list, the window_freq dictionary to track the frequency of words within the sliding window and the result list to store the starting indices of valid substrings found in s. Additionally, the code utilizes variables such as word_length, and total_length, which occupy constant space. Therefore, the overall space complexity of the code is $O(m)$ , where $m$ is the total number of words in the words list. This complexity arises from the dictionaries and lists whose sizes grow linearly with the size of the words list.

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources

LeetCode: Substring with Concatenation of All Words

Problem statement

Knowledge test

Algorithm

Coding example

Time complexity

Space complexity