String Matching
We now leave quicksort to look at a familiar algorithm: string matching. In other words, given a string $$S$$, we want to find whether a string $$W$$ occurs within that string.
For example, if $$S$$ is ABC ABCDAB ABCDABCDABDE
and $$W$$ is ABCDABD
, then $$W$$ appears in $$S$$ (at position 15).
Naive algorithm
At this point, you're likely familiar with the "naive" string matching algorithm, which checks every possible position in $$S$$ for $$W$$:
def naive_string_match(s, w):
n = len(s)
m = len(w)
# Loop through the text from the start to the point where the pattern can fit
for i in range(n - m + 1):
# Check for a match at this position
match = True
for j in range(m):
if s[i + j] != w[j]:
match = False
break
# If match is still True, we found a match
if match:
return i
return -1
The worst-case running time for this algorithm is $$O(n * m)$$, where $$n$$ is the length of the longer string $$S$$ and $$m$$ is the length of the smaller string $$W$$. This is because the algorithm compares each character of the pattern with the text for each possible starting position.
However, we can do better than this algorithm. One example is the Knuth-Morris-Pratt algorithm.
Knuth-Morris-Pratt algorithm
The Knuth-Morris-Pratt (KMP) algorithm improves on naive string search using the following algorithm:
- Preprocessing phase: The KMP algorithm preprocesses the pattern to create a partial match (or "failure") table, which takes $$O(m)$$ time.
- Matching phase: The actual string matching is done in $$O(n)$$ time. The KMP algorithm avoids redundant comparisons by using the information from the partial match table, ensuring that each character in the text is compared at most once.
Watch the following video to see how the KMP algorithm works:
As described above, this algorithm is significantly faster, and operates in $$O(n + m)$$ time. However, the space complexity of the algorithm increases to $$O(m)$$, which is more space than the naive algorithm.