What is the Boyer-Moore algorithm?

Algorithm

Let's assume the original string is $T$ and its length is $n$ . On the other hand, the string to be searched is $P$ , and its length is $m$ .

The algorithm begins the alignment at $k=m$ . This means that the start of $P$ aligns with the start of $T$ . Characters in $T$ and $P$ are compared starting at the index $m$ in $P$ and index $k$ in $T$ while moving backward.

The strings are compared from the end of $P$ to the start of $P$ in a backward manner. This comparison continues till the beginning of $P$ is reached, indicating whether a match or a mismatch occurs. Upon a mismatch, the alignment is moved forward according to the specified rule. These rules are the bad character rule and the good suffix rule.

Note: To learn more about the rules, visit this Answer.

Comparisons are then performed at the new alignment, which is repeated until the alignment is shifted past the end, indicating no further matches will occur.

Code sample

This code sample uses the bad character rule.

number_of_characters = 256
def bad_character_rule(string, size):
	#set all occurences as -1
	bad_character = [-1]*number_of_characters
	#set the value of the last occurrence
	for i in range(size):
		bad_character[ord(string[i])] = i;
	return bad_character
def search(original_string, search_string):
	n = len(original_string)
	m = len(search_string)
	bad_character = bad_character_rule(search_string, m)
	
	s = 0
	while(s <= n-m):
		j = m-1
		#reduce index of search_string while the serch_string and original_string match
		while j>=0 and search_string[j] == original_string[s+j]:
			j -= 1
		#if search_string is found at s, set j as -1
		if j<0:
			print("search_string present at index = {}".format(s))
			s += (m-bad_character[ord(original_string[s+m])] if s+m<n else 1)
		else:
			s += max(1, j-bad_character[ord(original_string[s+j])])
def main():
	original_string = "ACTGGATGT"
	search_string = "TG"
	search(original_string, search_string)
if __name__ == '__main__':
	main()

Code explanation

In main.py:

Lines 3–11: We use the bad_character_rule function to preprocess the bad character rule.
Lines 13–32: We use the search function for searching the pattern that makes use of the bad character for the given search_string.

Conclusion

The bad character rule has the runtime $O(n/m)$ in the best case and $O(nm)$ in the worst case. The worst case occurs when the character of the original string and the string to be found are the same. The best case occurs when all the characters of the original string and the string to be matched are different.

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources

What is the Boyer-Moore algorithm?

Overview

Example

Algorithm

Code sample

Code explanation

Conclusion