Example Analysis of VB.NET Regular Expressions

2025-03-30 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article presents an example analysis of VB.NET regular expressions. Many readers may not know much about the topic, so it is shared here for reference. We hope you learn a lot from it.

1. Alternation

The vertical bar "|" in a VB.NET regular expression denotes alternation, that is, a choice. You can use alternation to match one of several possible regular expressions. If you want to search for the word "cat" or "dog", you can use «cat|dog». If you want more options, simply extend the list: «cat|dog|mouse|fish». The alternation operator has the lowest precedence of all regex operators: it tells the engine to match either everything to its left or everything to its right. You can limit the scope of the alternation with parentheses. For example, «\b(cat|dog)\b» tells the engine to treat (cat|dog) as a single unit, so both word boundaries apply to both alternatives.
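A minimal sketch of the precedence point, using Python's `re` module as a stand-in for the .NET engine (alternation and `\b` behave the same way for these patterns). The sample text and the helper `matches` are made up for illustration.

```python
import re

text = "cat, dog, catalog, hotdog"

def matches(pattern, s):
    """Return the full text of every non-overlapping match, left to right."""
    return [m.group() for m in re.finditer(pattern, s)]

# Without parentheses "|" splits the WHOLE pattern into (\bcat) or (dog\b),
# so "cat" inside "catalog" and "dog" inside "hotdog" are found as well.
loose = matches(r"\bcat|dog\b", text)

# Parentheses limit the alternation, so both \b anchors apply to both words.
strict = matches(r"\b(cat|dog)\b", text)

print(loose)   # ['cat', 'dog', 'cat', 'dog']
print(strict)  # ['cat', 'dog']
```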

Note that the regex engine is "eager": it stops searching as soon as it finds a valid match. Under certain conditions, therefore, the order of the alternatives affects the result. Suppose you want a regular expression that matches a list of function names in a programming language: Get, GetValue, Set or SetValue. The obvious solution is «Get|GetValue|Set|SetValue». Let's look at what happens when this regex is applied to the string SetValue. «Get» and «GetValue» both fail, but «Set» succeeds. Because a regex-directed engine is eager, it returns the first successful match, "Set", instead of continuing to look for a better one. Contrary to what we intended, the regex does not match the entire string. There are several possible solutions.
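The eagerness described above can be reproduced with a short sketch in Python's `re` (standing in for .NET, which tries alternatives in the same left-to-right order). The `fullmatch` line is an extra illustration: once the match is forced to cover the whole string, the engine has to backtrack into the later alternatives.

```python
import re

# Alternatives are tried from left to right at each position,
# and the engine stops at the first one that succeeds.
m = re.search(r"Get|GetValue|Set|SetValue", "SetValue")
print(m.group())  # Set  (not "SetValue")

# Requiring the match to span the whole string makes "Set" fail at the end
# check, so the engine backtracks and tries "SetValue" next.
whole = re.fullmatch(r"Get|GetValue|Set|SetValue", "SetValue")
print(whole.group())  # SetValue
```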

One is to reorder the alternatives with the engine's eagerness in mind, for example «GetValue|Get|SetValue|Set», so that the longer alternative is tried first. We can also combine the four alternatives into two: «Get(Value)?|Set(Value)?». Because the question mark is greedy, the engine always tries SetValue before Set. A better solution is to use word boundaries: «\b(Get|GetValue|Set|SetValue)\b» or «\b(Get(Value)?|Set(Value)?)\b». Finally, since all the alternatives share the same optional ending, we can simplify the regex to «\b(Get|Set)(Value)?\b».
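A sketch comparing the fixes, again using Python's `re` as a stand-in for .NET. The dictionary keys and the helper `first_match` are hypothetical names chosen for illustration; the patterns themselves are the ones listed in the text. The second test string, "SetValues", is an added example showing why the word-boundary versions are the better choice.

```python
import re

# The fixes described above, written as Python pattern strings.
patterns = {
    "reordered": r"GetValue|Get|SetValue|Set",
    "optional": r"Get(Value)?|Set(Value)?",
    "bounded": r"\b(Get|GetValue|Set|SetValue)\b",
    "bounded2": r"\b(Get(Value)?|Set(Value)?)\b",
    "optimized": r"\b(Get|Set)(Value)?\b",
}

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Every variant now finds the whole word "SetValue" ...
on_word = {name: first_match(p, "SetValue") for name, p in patterns.items()}
# ... but only the word-boundary variants refuse to match inside "SetValues".
on_longer = {name: first_match(p, "SetValues") for name, p in patterns.items()}

for name in patterns:
    print(f"{name:<10} {on_word[name]!r:<12} {on_longer[name]!r}")
```

Without the word boundaries, "reordered" and "optional" still report "SetValue" as a match inside the longer identifier, which is usually not what a function-name search wants.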

2. Groups and backreferences

By placing part of a regular expression inside parentheses, you can group it. You can then apply regex operators, such as repetition operators, to the entire group. Note that only parentheses "()" form groups; "[]" defines a character class and "{}" defines a repetition count. When a group is defined with "()", the regex engine numbers the groups in order and stores what each one matched. To refer back to a matched group, use "\number": «\1» refers to the first group, «\2» to the second, and so on up to the nth group. «\0» refers to the entire match. Let's look at an example. Suppose you want to match the opening and closing tags of an HTML element together with the text between them. In <B>This is a test</B>, for instance, we want to match <B>, </B> and the text in between. We can use the following regular expression: «<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>». First, «<» matches the first character "<" of "<B>". Next, «[A-Z]» matches "B", «[A-Z0-9]*» matches any further letters or digits of the tag name (none here), and «[^>]*» matches any other characters up to the ">" (again none). Finally, the «>» of the regex matches the ">" of "<B>".

Next, the regex engine lazily matches the characters before the closing tag until it reaches "</". The «\1» in the regex then refers back to the group «([A-Z][A-Z0-9]*)», which in this example captured the tag name "B", so the closing tag that must be matched is "</B>".

You can refer to the same group several times: «([a-c])x\1x\1» matches "axaxa", "bxbxb" and "cxcxc". If the referenced group did not take part in the match, .NET treats the backreference as one that can never match, so that match attempt fails (JavaScript, by contrast, treats it as an empty string). A backreference cannot be used inside the group it refers to, so «([abc]\1)» is wrong. For the same reason you cannot use «\0» inside the regex itself; it can only be used in a replacement. Backreferences cannot be used inside a character class either: the «\1» in «(a)[\1b]» is not a backreference. Inside a character class, «\1» may instead be interpreted as an octal escape. Backreferences slow the engine down because it has to store the matched groups. If you don't need a backreference, you can tell the engine not to store a group, for example «Get(?:Value)». The "?:" right after "(" tells the engine not to store the match of the group (Value) for backreference.
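The backreference rules above can be checked with a small sketch in Python's `re`, whose `\1` and `(?:...)` syntax is the same as .NET's. The octal reading of `\1` inside a character class is Python's documented behavior; it is shown here as one example of how a flavor may interpret it. The sample strings are made up.

```python
import re

# Group 1 captures the tag name; \1 forces the closing tag to repeat it.
tag = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>")
m = tag.search("<B>This is a test</B>")
print(m.group(), m.group(1))                       # <B>This is a test</B> B
mismatched = tag.search("<B>This is a test</I>")   # None: there is no "</B>"

# The same group can be referenced more than once.
triple = re.compile(r"([a-c])x\1x\1")
triple_hits = [s for s in ["axaxa", "bxbxb", "cxcxc", "axbxc"] if triple.fullmatch(s)]
print(triple_hits)                                 # ['axaxa', 'bxbxb', 'cxcxc']

# Inside a character class Python reads \1 as the octal escape for chr(1),
# so [\1b] means "chr(1) or b", not "whatever group 1 matched".
in_class = re.compile(r"(a)[\1b]")
class_aa = bool(in_class.fullmatch("aa"))
class_ab = bool(in_class.fullmatch("ab"))
print(class_aa, class_ab)                          # False True

# (?:...) groups without capturing, so nothing is stored for a backreference.
capturing = re.match(r"Get(Value)", "GetValue").groups()
non_capturing = re.match(r"Get(?:Value)", "GetValue").groups()
print(capturing, non_capturing)                    # ('Value',) ()
```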

Repetition and backreferences: when a repetition operator is applied to a group, the stored backreference content is overwritten on every iteration, so only the last match is kept. For example, «([abc]+)=\1» matches "cab=cab", but «([abc])+=\1» does not. When ([abc]) first matches "c", "\1" holds "c"; ([abc]) then goes on to match "a" and "b", so in the end "\1" holds "b", and the regex matches "cab=b" instead.

Application: checking for repeated words. When editing text, it is easy to type a word twice, as in "the the". Such repeated words can be detected with «\b(\w+)\s+\1\b». To delete the second occurrence, simply replace each match with the first group ("$1" in .NET replacement syntax).
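A sketch of both points using Python's `re` as a stand-in. One difference to keep in mind: Python writes a group reference in the replacement string as `\1`, whereas .NET uses `$1`. The example sentence is made up.

```python
import re

# Quantifier inside the group: group 1 captures the whole run "cab".
inside = re.fullmatch(r"([abc]+)=\1", "cab=cab")

# Quantifier outside the group: each repetition overwrites group 1,
# so after "cab" the group only holds the last letter, "b".
outside = re.fullmatch(r"([abc])+=\1", "cab=cab")
outside_short = re.fullmatch(r"([abc])+=\1", "cab=b")

print(inside is not None, outside is None, outside_short.group(1))  # True True b

# Repeated-word check: replace "word word" with a single "word".
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", "Paris in the the spring")
print(fixed)  # Paris in the spring
```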

Named groups and named references: in PHP and Python, a group can be named with «(?P<name>group)», where name is the name you choose for the group. You can then refer to it with «(?P=name)». The .NET Framework also supports named groups. Unfortunately, Microsoft's programmers decided to invent their own syntax instead of following the conventions of Perl and Python. When this tutorial was first written, no other regex implementation supported Microsoft's syntax; the angle-bracket form has since been adopted by other flavors such as Perl 5.10 and Java 7.

Here is an example in .NET: «(?<first>group)(?'second'group)». As you can see, .NET offers two syntaxes for creating a named group: angle brackets or single quotes. Angle brackets are easier to use in strings, while single quotes are more convenient in ASP code, where "<>" is used for HTML tags. To refer to a named group inside the regex, use «\k<name>» or «\k'name'». In search-and-replace operations, use "${name}" to refer to a named group.
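A sketch of named groups in Python's `re`, which uses the PHP/Python syntax described above. Where .NET writes `(?<word>...)`, `\k<word>` and `${word}`, Python writes `(?P<word>...)`, `(?P=word)` and `\g<word>`. The group name `word` and the sample sentence are made up for illustration.

```python
import re

# Name a group with (?P<name>...) and re-match it with (?P=name).
repeated = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

m = repeated.search("it is is fine")
print(m.group("word"))      # is

# In a replacement string Python refers to the named group as \g<word>.
cleaned = repeated.sub(r"\g<word>", "it is is fine")
print(cleaned)              # it is fine
```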

3. Matching modes of VB.NET regular expressions

The regex engine discussed in this tutorial supports three matching modes: «/i» makes the regex case-insensitive; «/s» turns on single-line mode, in which the dot «.» also matches newline characters; and «/m» turns on multiline mode, in which «^» and «$» also match right after and right before a newline character, that is, at the start and end of every line. In .NET these modes correspond to RegexOptions.IgnoreCase, RegexOptions.Singleline and RegexOptions.Multiline.
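The three modes can be sketched with Python's `re` flags as stand-ins for the .NET options. Python calls single-line mode `re.DOTALL`, which is easy to confuse with `re.MULTILINE`. The two-line sample text is made up.

```python
import re

text = "First line\nsecond LINE"

# /i -> re.IGNORECASE: "line" also matches "LINE".
ci = re.findall(r"line", text, re.IGNORECASE)

# /s -> re.DOTALL: "." also matches "\n".
dot_default = re.search(r"line.second", text)
dot_all = re.search(r"line.second", text, re.DOTALL)

# /m -> re.MULTILINE: ^ and $ also match at line breaks.
starts_default = re.findall(r"^\w+", text)
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(ci)                                              # ['line', 'LINE']
print(dot_default, dot_all.group() == "line\nsecond")  # None True
print(starts_default, starts_multi)                    # ['First'] ['First', 'second']
print(ends_multi)                                      # ['line', 'LINE']
```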

Turning modes on and off inside a regex: if you insert the modifier «(?ism)» inside a regular expression, it applies only to the part of the regex to its right. «(?-i)» turns case insensitivity off. You can test this quickly: «(?i)te(?-i)st» should match "TEst", but not "teST" or "TEST".
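Python's `re` does not accept a bare «(?-i)» in the middle of a pattern, so this sketch expresses the same test with scoped modifier spans `(?i:...)` and `(?-i:...)`, which Python supports from 3.6 onward. The list of candidate strings is made up; the expected behavior matches the test described above.

```python
import re

candidates = ["TEst", "teST", "TEST", "test"]

# Scoped span: only "te" is case-insensitive, "st" stays case-sensitive.
scoped_on = re.compile(r"(?i:te)st")
# Global (?i) at the very start, then switched off again for "st".
scoped_off = re.compile(r"(?i)te(?-i:st)")

hits_on = [s for s in candidates if scoped_on.fullmatch(s)]
hits_off = [s for s in candidates if scoped_off.fullmatch(s)]
print(hits_on)   # ['TEst', 'test']
print(hits_off)  # ['TEst', 'test']
```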

4. Atomic groups and preventing backtracking

In some special cases, backtracking can make the engine extremely inefficient.

Let's look at an example: we want to match a string whose fields are separated by commas and whose 12th field starts with P. The obvious regex is «^(.*?,){11}P». It works well under normal circumstances, but in the extreme case where the 12th field does not start with P, catastrophic backtracking can occur. Suppose the string to search is "1,2,3,4,5,6,7,8,9,10,11,12,13". First, the regex matches successfully up to the 12th field. At this point the regex has consumed "1,2,3,4,5,6,7,8,9,10,11,", and the next token «P» does not match "12". So the engine backtracks, and the regex now has consumed only "1,2,3,4,5,6,7,8,9,10,11". Continuing the match, the next regex token is the dot «.», which matches the following comma ",". However, «,» does not match the "1" in "12". The match fails and the engine backtracks again. As you can imagine, the number of such backtracking combinations grows very large, and on long strings the engine can take so long that it appears to hang.

There are several ways to prevent such massive backtracking. A simple one is to make the regex as specific as possible by using a negated character class instead of the dot, for example «^([^,\r\n]*,){11}P», which reduces the number of failed backtracking steps to 11. Another option is to use an atomic group. An atomic group makes the regex engine fail faster and therefore effectively prevents massive backtracking. Its syntax is «(?>regex)». Everything between "(?>" and ")" is treated as a single token: if the match fails later on, the engine does not backtrack into the group but goes straight back to the part of the regex in front of it. In the previous example, the atomic version is «^(?>(.*?,){11})P». Once the 12th field fails to match, the engine backtracks directly to the «^» in front of the atomic group.
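A sketch checking that the rewritten patterns still accept and reject the same inputs, using Python's `re` as a stand-in for .NET. The two sample strings are short, so this shows correctness rather than the timing difference. Python's `re` only supports atomic groups `(?>...)` from version 3.11, so that part is guarded by a version check; the `results` dictionary and its keys are hypothetical names for illustration.

```python
import re
import sys

good = "1,2,3,4,5,6,7,8,9,10,11,P12,13"   # 12th field starts with P
bad = "1,2,3,4,5,6,7,8,9,10,11,12,13"     # 12th field does not

# Original pattern: the lazy dot may run across commas, which is what
# allows so many ways to split the fields when the match fails.
lazy_dot = re.compile(r"^(.*?,){11}P")

# Negated class: each field stops at the first comma or line break.
negated = re.compile(r"^([^,\r\n]*,){11}P")

results = {
    "lazy_dot": (bool(lazy_dot.match(good)), bool(lazy_dot.match(bad))),
    "negated": (bool(negated.match(good)), bool(negated.match(bad))),
}

# Atomic groups (?>...) are only available in Python's re from 3.11 onward.
if sys.version_info >= (3, 11):
    atomic = re.compile(r"^(?>(.*?,){11})P")
    results["atomic"] = (bool(atomic.match(good)), bool(atomic.match(bad)))

for name, (on_good, on_bad) in results.items():
    print(f"{name:<9} good={on_good} bad={on_bad}")
```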

That is all for this article, "Example Analysis of VB.NET Regular Expressions". Thank you for reading! We hope it has given you a good understanding of the topic. If you want to learn more, please follow the industry information channel.


© 2024 shulou.com SLNews company. All rights reserved.
