Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Boyer Moore algorithm

2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)05/31 Report--

How to use the Boyer Moore algorithm, I believe that many inexperienced people do not know what to do, so this paper summarizes the causes of the problem and solutions, through this article I hope you can solve this problem.

Brief introduction

For the string search algorithm, I have discussed the idea and implementation of the KMP algorithm. The implementation idea of KMP algorithm is based on the prefix and suffix matching in the pattern string, and the efficiency of this algorithm is fast enough. Unexpectedly, the Boyer Moore algorithm we are going to discuss here is even more efficient.

Train of thought analysis

In the previous algorithm, we compared the target string one by one from the beginning to the end of the pattern string, in this way, we continued to compare the next element when we encountered a matching element, and when there was no match, we selected the following elements in the schema string by looking up the matching table built by the schema table to continue the comparison. So that the index in the target string does not have to move back. However, the order of comparison here is just the opposite. Suppose the length of the target string is n and the length of the pattern string is m. So, first of all, let's start with the m-1 index position of the two strings, and if it's the same, it goes straight forward, and if it's different, we need to judge according to the situation of this different character.

Depending on the comparison character, we may have two cases, one is that this different character is not in the pattern string. So obviously, no character in the pattern string can match it, we just need to jump to the position after the character to continue the comparison. There is also a situation where the character is in the pattern string, so we need to find the last character that matches this character to appear in the pattern string. In this way, we can continue the comparison in the pattern string from the position of the last match.

The previous part of the description is rather abstract, and we can discuss it with a concrete example. Suppose we have the string s and the value of pdepartment s: FINDINAHAYSTACKNEEDLEINA, and the value of p is: NEEDLE. We use I and j to represent the current position of s and p, respectively. So, the process of their comparison is as follows:

Combined with the above figure, their initial position of comparison is from index 5. At this time, the element of s in this position is N, and the character of p in this position is E. So they don't match. We need to skip to the back to make a comparison. The character N exists in p, and the last position where N appears in p is index 0. So we start from this position of N to the position at the end of p as the comparison point, and compare it from the end of p to the position of the corresponding s. At this time, when the index position of I is 10, it does not match the last character E at the end of p, and the character is S, and the corresponding character cannot be found in the p string. At this point, we need to start with the element after S and align with the first element of p, and then we compare it from the last element of p and the index I + j of the s string. At this time, we compared to the beginning and found that there was a matching string. So return I = 15.

Generalization

After the above discussion, we find that what we need to consider in order to implement this algorithm is how to unify the string matching and mismatch scenarios in a relatively simple way. In this way, you can adjust each time you compare to a match or a mismatch. Let's take a closer look at the previous scene.

When the compared character does not match the character in the current p pattern string, if the mismatched character is not in the pattern string, then the situation is adjusted as shown below:

Because in this case, no matter which element in the pattern string can match this element, it is equivalent to moving the entire pattern string to the position behind the element to align and compare. At this time, it is equivalent to resetting j to the value of m-1, and the value of I is also adjusted to I + j + 1.

Another situation is that the currently mismatched character is in the pattern string, as shown in the following figure:

In this example, because the mismatched character is N, but N exists in the pattern string. So we need to move the position of I to the location of the mismatched element. At this time, the position in the pattern string is j, and if we can find the index position where N is, we can do it through j-right ['N'].

So in general, when we compare the index I of the target string and the index j of the schema string, they compare and adjust in reverse order. I is a gradually increasing adjustment, while j is a decreasing comparison. When s.charAt (I + j) = = p.charAt (j), they match. So at this time, just continue with the jmure-this operation for the next step of comparison. When s.charAt (I + j)! = p.charAt (j), it needs to be adjusted. According to the previous discussion, if s.charAt (I + j) does not exist in the middle of p, we need to move I to the right by j + 1 position. As discussed earlier, this is j-right [s.charAt (I + j)]. This also shows that right[ s.charat (I + j)] is-1.

If s.charAt (I + j) exists in the pattern string, then we need to find the position of the character in the middle of the pattern string, and the distance we need to move is the distance from j to this character. If you use j-right [s.charAt (I + j)], then right [s.charAt (I + j)] should record the position of the character s.charAt (I + j) in the pattern string.

Thus, we unify the values recorded in the right [] array. The core of the problem boils down to how to calculate the right array.

Calculate right

The process of calculating the right array is relatively simple, because all you have to do is record the last position of each character in the pattern string. For elements that are not in the pattern string, just set it to-1. So we can implement it with the following code:

Right = new int [r]

For (int c = 0; c < r; C++)

Right [c] =-1

For (int j = 0; j < m; jacks +)

Right[ pat.charat (j)] = j

Final realization

All that's left is to put the whole implementation together. After setting up the right array, all that's left is to start with the target string I and compare the positions of s [I + j] and p [j] each time. If equal, continue the comparison until j = 0. If not, the distance skip to be moved after I is calculated, where skip = j-right [s [I + j]]. In this way, the detailed implementation is as follows:

Public class BoyerMoore {

Private int [] right

Private String pat

Public BoyerMoore (String pat) {

This.pat = pat

Int m = pat.length ()

Int r = 256

Right = new int [r]

For (int c = 0; c < r; C++)

Right [c] =-1

For (int j = 0; j < m; jacks +)

Right[ pat.charat (j)] = j

}

Public int search (String txt) {

Int n = txt.length ()

Int m = pat.length ()

Int skip

For (int I = 0; I = 0; Jmuri -) {

If (pat.charAt (j)! = txt.charAt (I + j)) {

Skip = j-right [txt.charAt (I + j)]

If (skip < 1) skip = 1

Break

}

}

If (skip = = 0) return I

}

Return-1

}

}

In the above code implementation, there is also a small detail worth paying attention to, that is, after calculating the skip, it is necessary to determine whether the skip is less than 1. Because it is possible to encounter the current mismatch character exists in the pattern string, but its last position is greater than the current value j. At this time, we have to make sure that I will move at least one bit. So set it to 1 here.

Another thing to note is that because we need to take into account the mapping of each character, we need to create a right array as long as the character set. When the character set is too large, it takes up more space. This is just a right table for the ideal 256-character situation.

In terms of the performance of the algorithm, the time complexity of the algorithm under ideal conditions is O (Mhammer N), where M is the length of the target string and N is the length of the pattern string.

Boyer Moore algorithm is a widely used algorithm in practice, and its speed is faster. Compared with other string matching methods, it uses a back-to-front comparison. At the same time, a character mapping table is established to save the position of the characters in the current mode string, so as to facilitate the adjustment of the position of the source string in each comparison. Its biggest feature is that it is no longer moving one character after another, but skipping several characters at a time according to the calculation. In this way, the overall performance is ideal.

After reading the above, have you mastered how to use the Boyer Moore algorithm? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report