Knuth–Morris–Pratt Algorithm

@kmp

KMP is a string searching algorithm that finds all occurrences of a pattern in a text efficiently by avoiding re-checking characters using a prefix table.

Pseudo-code

function build_lps(P):
    m <- length of P
    lps[0] <- 0
    len <- 0
    i <- 1

    while i < m:
        if P[i] == P[len] then
            len <- len + 1
            lps[i] <- len
            i <- i + 1
        else
            if len != 0 then
                len <- lps[len - 1]
            else
                lps[i] <- 0
                i <- i + 1

function kmp_search(T, P):
    n <- length of T
    m <- length of P
    lps <- build_lps(P)

    i <- 0
    j <- 0
    k <- 0

    while i < n:
        if T[i] == P[j] then
            i <- i + 1
            j <- j + 1

        if j == m then
            k <- k + 1
            j <- lps[j - 1]

        else if i < n and T[i] != P[j] then
            if j != 0 then
                j <- lps[j - 1]
            else
                i <- i + 1

    return k

P P is the pattern you are looking for, like a word you want to find.
T T is the full text where you search, like a long sentence or book.
i i is the position in the text T.
j j is the position in the pattern P.
lps lps is an array that tells us how much we can skip when a mismatch happens.
m m is the length of the pattern.
n n is the length of the text.
k k counts how many times we found the pattern.

0 0 is used to start indexes and reset positions.
-1 -1 can represent an invalid or non-existing position.
1 1 represents a single step forward or a found match increment.

Recipe Steps

Think of searching for a word in a long text. Normally, if you mismatch, you restart from the next position, which is slow.

KMP first studies the pattern itself and builds a helper table (lps). This is like learning patterns inside the word.

This table tells you how far you can jump when something does not match, instead of starting over.

Now start scanning the text. If characters match, move forward in both text and pattern.

If there is a mismatch, use the lps table to jump to a smarter position in the pattern instead of restarting.

When you reach the end of the pattern, you found a match. Count it and continue searching.

Questions & Answers

Because it does not re-check characters and uses the lps table to skip unnecessary comparisons.

It stores the length of the longest prefix that is also a suffix for each position.

We jump to a smaller valid prefix using the lps table instead of starting over.

O(n + m), because both the text and pattern are processed efficiently once.

- The key idea is to avoid re-checking characters after a mismatch.
- The lps table is built only from the pattern, not the text.
- After finding a match, we do not restart completely but reuse previous information.
- Understanding the lps table is the hardest but most important part.
- KMP is very useful when searching the same pattern many times.
- It works best for string matching problems with repeated patterns.

Related Algorithms

Rabin-Karp Algorithm

Rabin-Karp is a string searching algorithm that uses hashing to quickly compare a pattern with parts of a text using a rolling hash technique.

View

Z-Algorithm

Z-Algorithm is a string searching algorithm that uses a preprocessed array to find all occurrences of a pattern within a main string. It is particularly useful for finding all occurrences of a pattern in a long string, as it allows for efficient searching and multiple match detection.

View

Knuth–Morris–Pratt Algorithm

Pseudo-code

Recipe Steps

Questions & Answers

Why is KMP faster than naive search?

What does the lps array store?

What happens on mismatch?

What is the time complexity?

Related Algorithms

Rabin-Karp Algorithm

Z-Algorithm