GFG Link: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
Language: C#
Problem Statement
KMP (Knuth Morris Pratt) Pattern Searching
Matching Overview
txt = "AAAAABAAABA"
pat = "AAAA"
We compare first window of txt with pat
txt = "AAAAABAAABA"
pat = "AAAA" [Initial position]
We find a match. This is same as Naive String Matching.
In the next step, we compare next window of txt with pat.
txt = "AAAAABAAABA"
pat = "AAAA" [Pattern shifted one position]
This is where KMP does optimization over Naive. In this
second window, we only compare fourth A of pattern
with fourth character of current window of text to decide
whether current window matches or not. Since we know
first three characters will anyway match, we skipped
matching first three characters.
Need of Preprocessing?
An important question arises from the above explanation,
how to know how many characters to be skipped. To know this,
we pre-process pattern and prepare an integer array
lps[] that tells us the count of characters to be skipped.
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Examples of lps[] construction:
For the pattern “AAAA”,
lps[] is [0, 1, 2, 3]
For the pattern “ABCDE”,
lps[] is [0, 0, 0, 0, 0]
For the pattern “AABAACAABAA”,
lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern “AAACAAAAAC”,
lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
For the pattern “AAABAAA”,
lps[] is [0, 1, 2, 0, 1, 2, 3]
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
lps[] = {0, 1, 2, 3}
i = 0, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 1, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 2, j = 2
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
pat[i] and pat[j] match, do i++, j++
i = 3, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 4, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3
Here unlike Naive algorithm, we do not match first three
characters of this window. Value of lps[j-1] (in above
step) gave us index of next character to match.
i = 4, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 5, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3
Again unlike Naive algorithm, we do not match first three
characters of this window. Value of lps[j-1] (in above
step) gave us index of next character to match.
i = 5, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[2] = 2
i = 5, j = 2
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[1] = 1
i = 5, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[0] = 0
i = 5, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j is 0, we do i++.
i = 6, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++ and j++
i = 7, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++ and j++
We continue this way...
Solution
void KmpSearch(string pat, string txt)
{
int m = pat.Length;
int n = txt.Length;
var lps = new int[m];
int j = 0;
ComputeLPSArray(pat, lps);
int i = 0;
while (i < n)
{
if(pat[j] == txt[i])
{
i++;
j++;
}
if (j == m)
{
Console.WriteLine($"Pattern found at index {i-j}");
j = lps[j - 1];
}
else if (i < n && pat[j] != txt[i])
{
if(j != 0)
{
j = lps[j-1];
}
else
{
i++;
}
}
}
}
void ComputeLPSArray(string pat, int[] lps)
{
int m = pat.Length;
int len = 0;
int i = 1;
lps[0] = 0; // lps[0] is always [0];
while (i < m)
{
if (pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // pat[i] != pat[len]
{
if (len != 0)
{
len = lps[len - 1];
}
else
{
lps[i] = len;
i++;
}
}
}
}
KmpSearch(pattern, text);
Complexity
Time Complexity: O(N)
where N
is the length of text
Space Complexity: O(M)
where M
is the length of pattern