Boyer–Moore string-search algorithm

@boyermoore_stringsearch_algorithm

The Boyer–Moore string-search algorithm is a fast string searching algorithm that is often used in text editors and compilers. It does this by trying to match the pattern from the end of the string, making it efficient for large strings. It's used when you need to find a substring within a large text or string.

Pseudo-code

function BoyerMooreSearch(s, t):
  create a table 'badchar' of size 256 and initialize it with -1
  create a table 'goodchar' of size 256 and initialize it with 0
  for each character 'c' in the pattern 's':
    'badchar'[ord(c)] = i
    if 'c' is a good character:
      'goodchar'[ord(c)] = max(0, j - i)
  i = 0
  j = length of the pattern 's' - 1
  while i < length of the string 't':
    if the current character in the string matches the current character in the pattern:
      i += 1

s String to search for the pattern.
t String to search in for the pattern.
badchar Table used to store the last occurrence of each character in the pattern.
goodchar Table used to store the shift value for each character in the pattern.
i Index of the current position in the string.
j Index of the current position in the pattern.
k Number of characters matching at the current position.

256 Number of possible characters in the alphabet.
0 Initial shift value.

Recipe Steps

Write down the pattern you want to search for and the string you want to search in.

Create two tables, one for bad characters and one for good characters, and initialize them with the correct values.

Compare the current character in the string with the current character in the pattern, and if they match, move the string forward by one position.

If the pattern is found, return the starting position of the pattern in the string.

If the pattern is not found, return -1 to indicate that the pattern is not present in the string.

Questions & Answers

The main idea is to try to match the pattern from the end of the string, making it efficient for large strings.

The two tables are 'badchar' and 'goodchar', which store the last occurrence of each character in the pattern and the shift value for each character in the pattern, respectively.

The algorithm uses the 'badchar' table to determine the shift value for the next iteration, moving the pattern forward by the maximum of the shift value and the current position.

- The Boyer–Moore algorithm is a complex algorithm and requires a good understanding of string searching and pattern matching.
- It's essential to use the correct tables and initialize them with the correct values to ensure the algorithm works correctly.
- The algorithm has a time complexity of O(n/m), where n is the length of the string and m is the length of the pattern.

Related Algorithms

Rabin-Karp Algorithm

Rabin-Karp is a string searching algorithm that uses hashing to quickly compare a pattern with parts of a text using a rolling hash technique.

View

Z-Algorithm

Z-Algorithm is a string searching algorithm that uses a preprocessed array to find all occurrences of a pattern within a main string. It is particularly useful for finding all occurrences of a pattern in a long string, as it allows for efficient searching and multiple match detection.