In bioinformatics, pattern searching is an essential problem. We search for strings in different types of sequences using such algorithms.
The Boyer-Moore algorithm is a string-searching algorithm used for finding DNA sequences from a string of DNA. It was developed in 1977 by Robert S. Boyer and J. Strother Moore.
Let's take an example of a sample text and a pattern that is to be detected.
Let's assume the original string is
The algorithm begins the alignment at
The strings are compared from the end of
Note: To learn more about the rules, visit this Answer.
Comparisons are then performed at the new alignment, which is repeated until the alignment is shifted past the end, indicating no further matches will occur.
This code sample uses the bad character rule.
number_of_characters = 256def bad_character_rule(string, size):#set all occurences as -1bad_character = [-1]*number_of_characters#set the value of the last occurrencefor i in range(size):bad_character[ord(string[i])] = i;return bad_characterdef search(original_string, search_string):n = len(original_string)m = len(search_string)bad_character = bad_character_rule(search_string, m)s = 0while(s <= n-m):j = m-1#reduce index of search_string while the serch_string and original_string matchwhile j>=0 and search_string[j] == original_string[s+j]:j -= 1#if search_string is found at s, set j as -1if j<0:print("search_string present at index = {}".format(s))s += (m-bad_character[ord(original_string[s+m])] if s+m<n else 1)else:s += max(1, j-bad_character[ord(original_string[s+j])])def main():original_string = "ACTGGATGT"search_string = "TG"search(original_string, search_string)if __name__ == '__main__':main()
In main.py
:
Lines 3–11: We use the bad_character_rule
function to preprocess the bad character rule.
Lines 13–32: We use the search
function for searching the pattern that makes use of the bad character for the given search_string
.
The bad character rule has the runtime
Free Resources