2025-01-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article mainly introduces the internal mechanism of regular expressions. It is quite detailed and should serve as a useful reference for interested readers.
9. Word boundary
The metacharacter "\b" is also an "anchor" that matches a position. Such a match is a zero-length match.
There are four locations that are considered "word boundaries":
1) the position in front of the first character of the string (if the first character of the string is a "word character")
2) the position after the last character of the string (if the last character of the string is a "word character")
3) between a "word character" and a "non-word character", where the "non-word character" immediately follows the "word character"
4) between a "non-word character" and a "word character", where the "word character" immediately follows the "non-word character"
"Word characters" are characters that can be matched by "\w", and "non-word characters" are characters that can be matched by "\W". In most regular expression implementations, "word characters" usually include "[a-zA-Z0-9_]".
For example, "\b4\b" matches a single 4 that is not part of a larger number. This regular expression does not match either 4 in "44".
In other words, "\b" matches, roughly speaking, the positions at the beginning and end of an "alphanumeric sequence".
The inverse of "\b" is "\B", which matches any position between two "word characters" or between two "non-word characters".
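The boundary rules above can be checked with a minimal sketch in Python's standard "re" module. The sample strings and variable names are illustrative assumptions, not part of the original article.

```python
import re

text = "ab cd"

# Zero-length matches: collect every position where \b (boundary) or
# \B (non-boundary) succeeds. Position 0 is before "a", position 5 is
# after "d".
boundaries = [m.start() for m in re.finditer(r"\b", text)]
non_boundaries = [m.start() for m in re.finditer(r"\B", text)]

print(boundaries)      # [0, 2, 3, 5]
print(non_boundaries)  # [1, 4]

# \b4\b only matches a 4 that stands alone, not one inside "44" or "a4".
lone_fours = [m.start() for m in re.finditer(r"\b4\b", "4 44 a4 4.")]
print(lone_fours)      # [0, 8]
```

Positions 0 and 5 correspond to cases 1) and 2) in the list, and positions 2 and 3 to cases 3) and 4).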
Go deep inside the regular expression engine
Let's look at what happens when the regular expression "\bis\b" is applied to the string "This island is beautiful". The engine processes the token "\b" first. Because "\b" has zero length, it examines the position in front of the first character, "T". Because "T" is a "word character" and nothing precedes it (the void before the start of the string), "\b" matches this word boundary. The next token, "i", then fails to match the first character "T". The engine keeps retrying from each following position, and "\b" does not match again until the position between the fourth character "s" and the fifth character, a space.
However, the space does not match "i". The engine moves on to the sixth character "i": "\b" matches between it and the fifth character (the space), and "is" then matches the sixth and seventh characters. However, the position in front of the eighth character "l" lies between two "word characters", so the second "\b" does not match and the attempt fails again. At the 13th character "i", "\b" matches because "i" forms a word boundary with the preceding space, and "is" matches the 13th and 14th characters. The engine then tries the second "\b". Because the 14th character "s" and the 15th character, a space, form a word boundary, the match succeeds. Being eager, the engine returns this successful match immediately.
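The outcome of this walkthrough can be reproduced with a short Python sketch. It assumes the pattern under discussion is "\bis\b"; note that Python reports 0-based offsets, so the 13th character sits at index 12.

```python
import re

text = "This island is beautiful"

# With word boundaries, the "is" inside "This" and "island" is rejected
# and only the standalone word is found.
bounded = re.search(r"\bis\b", text)
print(bounded.span())    # (12, 14)

# Without boundaries, the eager engine stops at the first "is" it finds,
# which is the one inside "This".
unbounded = re.search(r"is", text)
print(unbounded.span())  # (2, 4)
```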
10. Selector
A "|" in a regular expression indicates a selection. You can use selectors to match one of several possible regular expressions.
If you want to search for the words "cat" or "dog", you can use "cat|dog". If you want more options, you just need to extend the list, for example "cat|dog|mouse|fish".
The selector has the lowest priority of all operators in a regular expression; that is, it tells the engine to match either everything to the left of the selector or everything to its right. You can use parentheses to limit the scope of a selector. For example, "\b(cat|dog)\b" tells the regex engine to treat "(cat|dog)" as a single regular expression unit.
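A small Python sketch may help show why the parentheses matter. The patterns and the test words "catalog" and "bulldog" are assumptions chosen for illustration.

```python
import re

# Without parentheses the selector splits the whole pattern in two:
# either "\bcat" or "dog\b".
loose = re.compile(r"\bcat|dog\b")

# With parentheses both boundaries apply to whichever word is chosen.
grouped = re.compile(r"\b(cat|dog)\b")

print(loose.search("catalog").group())   # cat
print(loose.search("bulldog").group())   # dog
print(grouped.search("catalog"))         # None
print(grouped.search("bulldog"))         # None
print(grouped.search("my dog").group())  # dog
```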
Pay attention to the "eagerness" of regex engines
The regex engine is eager: as soon as it finds a valid match, it stops searching. Therefore, under certain conditions, the order of the expressions on either side of a selector affects the result. Suppose you want to use a regular expression to search for a list of functions in a programming language: Get, GetValue, Set or SetValue. One obvious solution is "Get|GetValue|Set|SetValue". Let's take a look at the result when searching "SetValue".
Both "Get" and "GetValue" fail, and "Set" matches successfully. Because a regex-directed engine is "eager", it returns the first successful match, namely "Set", instead of continuing to search for other, better matches.
Contrary to what we expected, the regular expression does not match the entire string. There are several possible solutions. First, taking the "eagerness" of the regex engine into account, we can change the order of the options, for example "GetValue|Get|SetValue|Set", so that the longest match is tried first. We can also combine the four options into two: "Get(Value)?|Set(Value)?". Because the question mark repeater is greedy, SetValue is always matched before Set.
A better solution is to use word boundaries: "\b(Get|GetValue|Set|SetValue)\b" or "\b(Get(Value)?|Set(Value)?)\b". Further, since all the choices share the same ending, we can optimize the regular expression to "\b(Get|Set)(Value)?\b".
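The effect of option order can be seen in a short Python sketch. The pattern strings below are this example's own spelling of the alternatives described for Get, GetValue, Set and SetValue, and the dictionary keys are hypothetical labels.

```python
import re

subject = "SetValue"

patterns = {
    "naive":     r"Get|GetValue|Set|SetValue",
    "reordered": r"GetValue|Get|SetValue|Set",
    "optional":  r"Get(Value)?|Set(Value)?",
    "bounded":   r"\b(Get|GetValue|Set|SetValue)\b",
    "factored":  r"\b(Get|Set)(Value)?\b",
}

# group() with no argument returns the whole matched text.
results = {name: re.search(p, subject).group() for name, p in patterns.items()}

for name, matched in results.items():
    print(name, "->", matched)
# naive -> Set
# reordered -> SetValue
# optional -> SetValue
# bounded -> SetValue
# factored -> SetValue
```

In the "bounded" case the engine still tries "Set" first, but the closing "\b" fails between "t" and "V", so it backtracks and tries "SetValue".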
11. Groups and backreferences
By placing part of a regular expression in parentheses, you can turn it into a group. You can then apply regular expression operators, such as repetition operators, to the entire group.
Note that only parentheses "()" can be used to form groups. "[]" is used to define character sets. "{}" is used to define repetitive operations.
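The difference between the three kinds of brackets can be sketched in Python as follows. The patterns are illustrative assumptions, not examples from the original article.

```python
import re

# "()" groups: the repetition applies to the whole sequence "ab".
group_ok = re.fullmatch(r"(ab)+", "ababab") is not None
group_bad = re.fullmatch(r"(ab)+", "abba") is not None

# "[]" defines a character set: any mix of "a" and "b" is accepted.
set_ok = re.fullmatch(r"[ab]+", "abba") is not None

# "{}" defines a repetition count: here it applies to "b" only.
count_ok = re.fullmatch(r"ab{2}", "abb") is not None
count_bad = re.fullmatch(r"ab{2}", "abab") is not None

print(group_ok, group_bad, set_ok, count_ok, count_bad)
# True False True True False
```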
When a group is defined with "()", the regex engine numbers the matched groups sequentially and stores what they matched. A matched group can then be referenced later in the expression as "\number": "\1" references the first matched group, "\2" references the second group, and so on, with "\n" referencing the nth group. "\0", in flavors that support it, instead references the entire matched regular expression itself. Let's look at an example.
Suppose you want to match the opening and closing tags of an HTML element, as well as the text between them. For example, in "<B>This is a test</B>", we need to match "<B>" and "</B>" and the text in the middle. We can use the following regular expression: "<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>"
First of all, the leading "<" and the tokens that follow it match the opening tag, up to and including its ">". Next, the regex engine lazily matches the characters before the closing tag until it encounters a "</".
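As a minimal sketch of a backreference in practice, the Python example below assumes the tag-matching pattern is "<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>" and uses "<B>" as a hypothetical tag name.

```python
import re

# Group 1 captures the tag name; \1 requires the closing tag to repeat it.
tag_pair = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>")

matched = tag_pair.search("<B>This is a test</B>")
print(matched.group(0))  # <B>This is a test</B>
print(matched.group(1))  # B

# A closing tag with a different name cannot satisfy the backreference.
mismatched = tag_pair.search("<B>This is a test</I>")
print(mismatched)        # None
```

In Python's API the whole match is read with group(0), while "\1" inside the pattern refers to the first captured group.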