Example Analysis of regular expression Group

2025-03-28 Update From: SLTechnology News&Howtos

This article walks through regular expression groups with examples: how to define groups, how to reference them with backreferences, how they interact with repetition, and how to name them.

Understanding of regular expression groups

By placing part of a regular expression in parentheses, you can group it. You can then apply regex operators, such as repetition operators, to the group as a whole.

Note that only parentheses "()" can be used to form groups. "[]" is used to define character sets. "{}" is used to define repetitive operations.
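As a minimal sketch of the grouping described above, the following uses Python's standard `re` module (the choice of Python and the sample string "abab" are assumptions made for illustration) to show how parentheses change what a repetition operator applies to:

```python
import re

# Without parentheses, "+" applies only to the preceding character "b",
# so the pattern means: "a" followed by one or more "b".
no_group = re.fullmatch(r"ab+", "abab")

# With parentheses, "+" repeats the whole group "ab".
with_group = re.fullmatch(r"(ab)+", "abab")

print(no_group is None)        # the ungrouped pattern does not cover "abab"
print(with_group is not None)  # the grouped pattern does
```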

When a group is defined with "()", the regex engine numbers the groups sequentially, by the position of their opening parenthesis, and stores the text each one matched. A matched group can then be referenced with a backslash followed by its number: "\1" references the first group, "\2" references the second group, and so on, with "\n" referencing the nth group. "\0" refers to the entire match. Let's look at an example.

Suppose you want to match the opening and closing tags of an HTML element, as well as the text between them. For example, in "<B>This is a test</B>" we need to match "<B>" and "</B>" as well as the text in the middle. We can use the following regular expression: "<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>"

First, "<" matches the first character "<" of "<B>". Then "[A-Z]" matches "B", "[A-Z0-9]*" matches zero or more alphanumeric characters, and "[^>]*" matches zero or more characters other than ">". The ">" in the regular expression then matches the ">" of "<B>". Next, ".*?" lazily matches the characters before the closing tag until the engine reaches "</". Finally, "\1" refers to the text matched earlier by the group "([A-Z][A-Z0-9]*)", in this case the tag name "B". So the closing tag that has to match is "</B>".
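The walkthrough above can be tried out with a short sketch in Python's `re` module (the variable names and the mismatched sample "<B>This is a test</I>" are assumptions added for illustration):

```python
import re

# Group 1 captures the tag name; \1 requires the closing tag to repeat it.
tag_pattern = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>")

matched = tag_pattern.search("<B>This is a test</B>")
print(matched.group(0))  # the whole match
print(matched.group(1))  # the text stored for group 1

# The closing tag differs from the opening tag, so \1 cannot match.
mismatched = tag_pattern.search("<B>This is a test</I>")
print(mismatched)
```

In Python, `group(0)` plays the role of the whole match and `group(1)` returns the text captured by the first group.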

Notes on backreferences to regular expression groups:

You can reference the same group more than once: "([a-c])x\1x\1" will match "axaxa", "bxbxb" and "cxcxc". If the referenced group did not take part in the match, the behavior depends on the engine: in JavaScript the reference matches the empty string, while in most other engines (Python, .NET and PCRE, for example) the backreference fails to match.
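A small sketch of repeated references to one group, again in Python's `re` module (the extra sample "axbxa" is an assumption, added to show a string that is rejected):

```python
import re

# \1 appears twice, so all three letter positions must hold the same character.
pattern = re.compile(r"([a-c])x\1x\1")

samples = ["axaxa", "bxbxb", "cxcxc", "axbxa"]
accepted = [s for s in samples if pattern.fullmatch(s)]
print(accepted)
```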

A backreference cannot be used inside the group it refers to: "([abc]\1)" is incorrect. Likewise, you cannot use "\0" inside the regular expression to refer to the match itself; it can only be used in a replace operation.
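The sketch below checks both points in Python's `re` module. Note the assumptions: Python is only one engine, and it spells the whole-match reference in a replacement as `\g<0>` rather than "\0"; the sample text "a bc" is made up for illustration.

```python
import re

# A group cannot refer to itself: Python rejects the pattern at compile time.
try:
    re.compile(r"([abc]\1)")
    compiled = True
except re.error:
    compiled = False
print(compiled)

# In a replacement, the whole match is available; Python writes it as \g<0>.
wrapped = re.sub(r"\w+", r"[\g<0>]", "a bc")
print(wrapped)
```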

Backreferences cannot be used inside a character set. The "\1" in "(a)[\1b]" is not a backreference. Inside a character set, "\1" is interpreted as an octal character escape.
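A minimal sketch of this behavior as Python's `re` module implements it (other engines may treat the escape differently; the test strings are assumptions chosen for illustration):

```python
import re

# Inside [...], \1 is the character with octal code 1, not a backreference.
pattern = re.compile(r"(a)[\1b]")

matches_b = pattern.fullmatch("ab") is not None         # "b" is in the set
matches_a = pattern.fullmatch("aa") is not None         # would hold if \1 meant group 1
matches_octal = pattern.fullmatch("a\x01") is not None  # character with code 1

print(matches_b, matches_a, matches_octal)
```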

Capturing groups adds some overhead, because the engine has to store the text each group matched. If you don't need a backreference, you can tell the engine not to store the match for a group. For example: "Get(?:Value)". The "?:" right after the "(" tells the engine that the text matched by the group "(Value)" should not be stored for backreferences.
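To see the difference between a capturing and a non-capturing group, here is a short sketch in Python's `re` module (the input string "GetValue" is an assumption for illustration):

```python
import re

capturing = re.search(r"Get(Value)", "GetValue")
non_capturing = re.search(r"Get(?:Value)", "GetValue")

# Both patterns match the same text, but only the first one stores a group.
print(capturing.group(0), capturing.groups())
print(non_capturing.group(0), non_capturing.groups())
```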

Repetition and backreferences in regular expression groups

When you apply a repetition operator to a group, the stored backreference content is overwritten on each iteration, so only the text from the last iteration is kept. For example, "([abc]+)=\1" will match "cab=cab", but "([abc])+=\1" will not. In the second pattern, when "([abc])" first matches "c", "\1" holds "c"; the group then goes on to match "a" and "b". In the end "\1" holds "b", so the pattern matches "cab=b".
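The two patterns can be compared directly with a sketch in Python's `re` module (the variable names are assumptions for illustration):

```python
import re

repeat_inside = re.compile(r"([abc]+)=\1")   # the group captures the whole run
repeat_outside = re.compile(r"([abc])+=\1")  # the group keeps only the last iteration

inside_match = repeat_inside.fullmatch("cab=cab")
outside_same = repeat_outside.fullmatch("cab=cab")
outside_last = repeat_outside.fullmatch("cab=b")

print(inside_match.group(1))  # text stored when the repetition is inside the group
print(outside_same)           # no match: \1 holds only the last character
print(outside_last.group(1))  # text stored when the repetition is outside the group
```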

Application: checking for duplicate words. When editing text, it is easy to type a word twice, such as "the the". These duplicates can be detected with "\b(\w+)\s+\1\b". To delete the second word, replace the whole match with "\1" in a replace operation.
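A minimal sketch of the duplicate-word check in Python's `re` module, where `re.sub` stands in for the replace operation (the sample sentence is an assumption for illustration):

```python
import re

# A whole word, whitespace, then the same word again.
duplicate_word = re.compile(r"\b(\w+)\s+\1\b")

text = "This is the the test"
found = duplicate_word.search(text)
print(found.group(0))  # the duplicated pair

# Replace the pair with the captured word to drop the second copy.
cleaned = duplicate_word.sub(r"\1", text)
print(cleaned)
```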

Naming and referencing of regular expression groups

In PHP and Python, a group can be named with "(?P<name>group)". Here "?P<name>" is the syntax that names the group, and name is the name you give it. You can reference the group with "(?P=name)".
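As a sketch of named groups in Python's `re` module (the group name `word` and the sample text are assumptions for illustration):

```python
import re

# (?P<word>...) names the group; (?P=word) refers back to what it matched.
pattern = re.compile(r"(?P<word>\w+)\s+(?P=word)")

m = pattern.search("hello hello world")
print(m.group("word"))  # look the group up by name
print(m.group(1))       # named groups keep their number as well
```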

Named groups for .NET

The .NET framework also supports named groups, but Microsoft introduced its own syntax instead of following Python's "(?P<name>group)" form. That syntax has since been adopted by several other engines, including Perl 5.10+, PCRE, Java 7+ and JavaScript (ES2018).

Here are examples from .NET:

(?<first>group)(?'second'group)

As you can see, .NET provides two syntaxes for creating named groups: one uses angle brackets "<>", the other uses single quotes "'". Angle brackets are more convenient inside strings, while single quotes are more useful in ASP code, because "<>" is used for HTML tags there.

To reference a named group, use "\k<name>" or "\k'name'".

When doing search and replace, you can use "${name}" to refer to a named group.
