Knuth–Morris–Pratt Algorithm

@kmp

KMP is a string searching algorithm that finds all occurrences of a pattern in a text efficiently by avoiding re-checking characters using a prefix table.

Pseudo-code
function build_lps(P):
    m <- length of P
    lps[0] <- 0
    len <- 0
    i <- 1

    while i < m:
        if P[i] == P[len] then
            len <- len + 1
            lps[i] <- len
            i <- i + 1
        else
            if len != 0 then
                len <- lps[len - 1]
            else
                lps[i] <- 0
                i <- i + 1

function kmp_search(T, P):
    n <- length of T
    m <- length of P
    lps <- build_lps(P)

    i <- 0
    j <- 0
    k <- 0

    while i < n:
        if T[i] == P[j] then
            i <- i + 1
            j <- j + 1

        if j == m then
            k <- k + 1
            j <- lps[j - 1]

        else if i < n and T[i] != P[j] then
            if j != 0 then
                j <- lps[j - 1]
            else
                i <- i + 1

    return k
                
  • P P is the pattern you are looking for, like a word you want to find.
  • T T is the full text where you search, like a long sentence or book.
  • i i is the position in the text T.
  • j j is the position in the pattern P.
  • lps lps is an array that tells us how much we can skip when a mismatch happens.
  • m m is the length of the pattern.
  • n n is the length of the text.
  • k k counts how many times we found the pattern.
  • 0 0 is used to start indexes and reset positions.
  • -1 -1 can represent an invalid or non-existing position.
  • 1 1 represents a single step forward or a found match increment.
Recipe Steps
1

Think of searching for a word in a long text. Normally, if you mismatch, you restart from the next position, which is slow.

2

KMP first studies the pattern itself and builds a helper table (lps). This is like learning patterns inside the word.

3

This table tells you how far you can jump when something does not match, instead of starting over.

4

Now start scanning the text. If characters match, move forward in both text and pattern.

5

If there is a mismatch, use the lps table to jump to a smarter position in the pattern instead of restarting.

6

When you reach the end of the pattern, you found a match. Count it and continue searching.

Questions & Answers

Because it does not re-check characters and uses the lps table to skip unnecessary comparisons.

It stores the length of the longest prefix that is also a suffix for each position.

We jump to a smaller valid prefix using the lps table instead of starting over.

O(n + m), because both the text and pattern are processed efficiently once.
- The key idea is to avoid re-checking characters after a mismatch.
- The lps table is built only from the pattern, not the text.
- After finding a match, we do not restart completely but reuse previous information.
- Understanding the lps table is the hardest but most important part.
- KMP is very useful when searching the same pattern many times.
- It works best for string matching problems with repeated patterns.


Related Algorithms
Rabin-Karp Algorithm

Rabin-Karp is a string searching algorithm that uses hashing to quickly compare a pattern with parts of a text using a rolling hash technique.

View
Z-Algorithm

Z-Algorithm is a string searching algorithm that uses a preprocessed array to find all occurrences of a pattern within a main string. It is particularly useful for finding all occurrences of a pattern in a long string, as it allows for efficient searching and multiple match detection.

View