A detailed introduction to regular expression syntax

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article introduces regular expression syntax and shows how to use it through Python's re module. It covers characters and character classes, quantifiers, grouping and capturing, assertions, and flags, then walks through the methods of regular expression objects and match objects.

Regular expression syntax

Characters and character classes

Special characters: \ . ^ $ ? + * { } [ ] ( ) |

If you want to use the literal value of one of the above special characters, you must escape it with a backslash (\).
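As a minimal sketch of escaping, the snippet below compares an unescaped + with an escaped one. The sample strings are made up for illustration, and re.escape is a standard-library helper that the article itself does not mention.

```python
import re

# Unescaped, + is a quantifier: "1+1" means one or more "1" followed by "1".
unescaped = re.search(r"1+1", "1+1=2")

# Escaped, \+ is a literal plus sign.
escaped = re.search(r"1\+1", "1+1=2")

# re.escape adds the backslashes when the literal text is not known in advance.
pattern = re.escape("a.b")
dot_literal = re.findall(pattern, "a.b axb")

print(unescaped, escaped.group(), dot_literal)
```

Here the unescaped pattern finds nothing in "1+1=2", while the escaped one matches the literal text.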

Character class

1. One or more characters enclosed in [] form a character class. If no quantifier is specified, a character class matches exactly one of its characters.

2. You can specify a range within a character class. For example, [a-zA-Z0-9] matches any single character from a to z, A to Z, or 0 to 9.

3. A ^ immediately after the left square bracket makes a negated character class. For example, [^0-9] matches any non-digit character.

4. Within a character class, special characters other than \ lose their special meaning and stand for their literal values. A ^ in the first position indicates negation, and in any other position it represents itself. A - between two characters represents a range, and as the first character in the class it represents - itself.

5. Shorthand can be used within character classes, such as \d, \s, and \w.
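The character class rules above can be sketched with a few small patterns. The sample strings and variable names are illustrative assumptions, not part of the original article.

```python
import re

# A class without a quantifier consumes exactly one character.
first_vowel = re.search(r"[aeiou]", "rhythm and blues").group()

# Ranges: runs of ASCII letters or digits.
alnum_runs = re.findall(r"[a-zA-Z0-9]+", "id-42, user_7!")

# Negation: ^ in the first position matches anything NOT listed.
non_digits = re.findall(r"[^0-9]+", "2024-07-09")

# A leading - is literal, and ^ outside the first position is literal too.
literal_marks = re.findall(r"[-^]", "a-b^c")

print(first_vowel, alnum_runs, non_digits, literal_marks)
```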

Shorthand

. matches any character except a newline, or any character including a newline if the re.DOTALL flag is set

\d matches a Unicode digit, or [0-9] with re.ASCII

\D matches a Unicode non-digit

\s matches Unicode whitespace, or one of [ \t\n\r\f\v] with re.ASCII

\S matches Unicode non-whitespace

\w matches a Unicode word character, or one of [a-zA-Z0-9_] with re.ASCII

\W matches a Unicode non-word character
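To see how re.ASCII and re.DOTALL change the shorthand, here is a small sketch. The sample text, which mixes an accented letter and an Arabic-Indic digit, is an assumption chosen only to make the difference visible.

```python
import re

# "café", ARABIC-INDIC DIGIT THREE (U+0663), and "7"
text = "caf\u00e9 \u0663 7"

unicode_digits = re.findall(r"\d", text)            # Unicode-aware by default
ascii_digits = re.findall(r"\d", text, re.ASCII)    # only 0-9
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, re.ASCII)    # only [a-zA-Z0-9_]

# . skips newlines unless DOTALL is set.
dot_default = re.findall(r"a.b", "a\nb a-b")
dot_all = re.findall(r"a.b", "a\nb a-b", re.DOTALL)

print(unicode_digits, ascii_digits, unicode_words, ascii_words)
print(dot_default, dot_all)
```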

Quantifier

1. ? matches the preceding expression 0 or 1 times

2. * matches the preceding expression 0 or more times

3. + matches the preceding expression 1 or more times

4. {m} matches the preceding expression exactly m times

5. {m,} matches the preceding expression at least m times

6. {,n} matches the preceding expression at most n times

7. {m,n} matches the preceding expression at least m times and at most n times

Note:

The quantifiers above are all greedy and match as much as possible. To make a quantifier non-greedy, follow it with a ?, as in *?, +?, ?? or {m,n}?.
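The difference between greedy and non-greedy matching is easiest to see side by side. This is a minimal sketch, and the sample strings are illustrative assumptions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > in the string.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first > it can.
lazy = re.findall(r"<.+?>", html)

# Counted quantifiers follow the same rule.
exact = re.fullmatch(r"\d{4}", "2025") is not None
at_least = re.fullmatch(r"\d{2,}", "7") is not None
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_lazy = re.match(r"\d{2,4}?", "123456").group()

print(greedy, lazy, exact, at_least, bounded_greedy, bounded_lazy)
```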

Group and capture

The role of ():

Capture the content matched by the regular expression inside () for further processing. You can turn off capturing for a pair of parentheses by placing ?: right after the left parenthesis.

Combine parts of a regular expression so that a quantifier or | applies to the whole group.

A backreference refers to what an earlier () captured:

Backreference by group number

Each pair of parentheses that does not use ?: is assigned a group number, starting from 1 and incrementing from left to right. You can use \i to refer to the content captured by group i.

Backreference by group name

You can name a group by following the left parenthesis with ?P<name>, and then use (?P=name) to refer to the previously captured content. For example, (?P<word>\w+)\s+(?P=word) matches duplicated words.

Note:

Backreferences cannot be used in the character class [].
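The grouping and backreference rules can be sketched as follows. The \b anchors and the sample sentences are additions for illustration; the named pattern is otherwise the duplicate-word pattern described in this section.

```python
import re

# Backreference by number: \1 must repeat what group 1 captured.
by_number = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

# Backreference by name: (?P=word) repeats the group named "word".
by_name = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "it was the the best")

# (?:...) groups without capturing, so only (c) gets a group number.
non_capturing = re.search(r"(?:ab)+(c)", "ababc")

print(by_number.group(), by_number.group(1))
print(by_name.group("word"))
print(non_capturing.groups())
```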

Assertions and flags

An assertion does not consume any text; it only imposes a constraint on the position where it appears.

Commonly used assertions:

1. \b matches a word boundary. Inside a character class [] it means backspace.

2. \B matches a non-word boundary. Like \b, it is affected by the ASCII flag.

3. \A matches at the beginning of the string

4. ^ matches at the beginning of the string or, if the MULTILINE flag is set, after each newline character

5. \Z matches at the end of the string

6. $ matches at the end of the string or, if the MULTILINE flag is set, before each newline character

7. (?=e) positive lookahead

8. (?!e) negative lookahead

9. (?<=e) positive lookbehind

10. (?<!e) negative lookbehind

Explanation of lookahead and lookbehind

Positive lookahead: in exp1(?=exp2), the content after exp1 must match exp2.

Negative lookahead: in exp1(?!exp2), the content after exp1 must not match exp2.

Positive lookbehind: in (?<=exp2)exp1, the content before exp1 must match exp2.

Negative lookbehind: in (?<!exp2)exp1, the content before exp1 must not match exp2.

For example, suppose we want to find hello, but only when hello is followed by world. The regular expression can be written as "(hello)\s+(?=world)". Applied to "hello wangxing" and "hello world", it matches only the hello of the latter.
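The hello/world example can be run directly, and the other three assertions follow the same shape. The price and quantity strings below are assumptions added for illustration.

```python
import re

# Positive lookahead: the article's own example.
ahead = re.compile(r"(hello)\s+(?=world)")
texts = ["hello wangxing", "hello world"]
hits = [t for t in texts if ahead.search(t)]
m = ahead.search("hello world")

# Negative lookahead: hello NOT followed by world.
not_world = re.findall(r"\bhello\b(?!\s+world)", "hello world, hello there")

# Positive lookbehind: digits preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $40, qty 12")

# Negative lookbehind: whole numbers not preceded by a dollar sign.
quantities = re.findall(r"(?<!\$)\b\d+", "cost $40, qty 12")

print(hits, repr(m.group()), m.group(1), not_world, prices, quantities)
```

Note that the lookahead itself consumes nothing: the overall match ends before world, although it still includes the whitespace matched by \s+.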

Condition matching

(?(id)yes_exp|no_exp): if the group with the given id (a group number or name) matched something, yes_exp is matched here; otherwise no_exp is matched.
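A minimal sketch of conditional matching: an optional opening quote that, when present, requires a closing quote. The pattern and inputs are hypothetical examples, and the no_exp branch is left out, which Python allows.

```python
import re

# Group 1 optionally captures an opening quote.
# (?(1)") then demands a closing quote only if group 1 matched.
quoted = re.compile(r'(")?\w+(?(1)")')

results = {
    s: quoted.fullmatch(s) is not None
    for s in ['"abc"', 'abc', '"abc', 'abc"']
}

print(results)
```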

Flags for regular expressions

There are two ways to use the flags of regular expressions

Pass the flags to the compile method, joining multiple flags with |, such as re.compile(r"#[\da-f]{6}", re.IGNORECASE | re.MULTILINE)

Add the flags inside the regular expression itself by putting (?flags) at the start of the pattern, such as (?ms)#[\da-z]{6}

Commonly used flags

re.A or re.ASCII makes \b \B \s \S \w \W \d \D match ASCII characters only

re.I or re.IGNORECASE makes the regular expression ignore case

re.M or re.MULTILINE makes ^ match after each newline and $ match before each newline, in addition to the start and end of the string

re.S or re.DOTALL makes . match any character, including a newline

re.X or re.VERBOSE lets a regular expression span multiple lines and contain comments. Whitespace in the pattern is ignored in this mode, so a literal space must be written as \s, [ ], or an escaped space.
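Both ways of setting flags, plus the effect of MULTILINE and VERBOSE, can be sketched like this. The color strings and line samples are illustrative assumptions.

```python
import re

# Flags passed to compile, combined with |.
color = re.compile(r"#[\da-f]{6}", re.IGNORECASE | re.MULTILINE)
found = color.findall("fg: #FFaa00\nbg: #12345g")

# Flags written inline at the start of the pattern.
inline = re.findall(r"(?im)^item", "Item one\nitem two")

# MULTILINE changes where ^ and $ can match.
single_line = re.findall(r"^\w+$", "one\ntwo")
multi_line = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)

# VERBOSE ignores pattern whitespace and allows comments.
verbose_color = re.compile(
    r"""
    \#            # a literal hash, escaped because # starts a comment here
    [\da-f]{6}    # six hexadecimal digits
    """,
    re.VERBOSE | re.IGNORECASE,
)
verbose_found = verbose_color.findall("#FFaa00 and #zzz")

print(found, inline, single_line, multi_line, verbose_found)
```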

2. Python regular expression module

Regular expressions have four main functions in dealing with strings.

1. Match: check whether a string conforms to a regular expression, which usually yields true or false

2. Extract: get the text in a string that satisfies a regular expression

3. Replace: find the text in a string that matches a regular expression and replace it with another string

4. Split: use a regular expression to split a string

Two ways of using regular expressions with the re module in Python

1. Use re.compile(r, f) to generate a regular expression object, and then call the corresponding method of that object. The advantage of this approach is that the object can be reused many times after it is created.

2. Each method of a regular expression object has a corresponding module-level function in re; the only difference is that the first argument is the regular expression string. This approach suits regular expressions that are used only once.
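The two styles look like this in practice. This is a small sketch with made-up input lines; the variable names are assumptions.

```python
import re

# Style 1: compile once, reuse the object many times.
word = re.compile(r"\w+")
lines = ["alpha beta", "gamma"]
counts = [len(word.findall(line)) for line in lines]

# Style 2: module-level function, pattern string passed as the first argument.
one_off = re.findall(r"\w+", "delta epsilon")

print(counts, one_off)
```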

Common methods of regular expression objects

1. rx.findall(s, start, end):

Returns a list. If the regular expression has no groups, the list contains all matches.

If the regular expression has groups, each element holds what the groups matched rather than the entire match: a string when there is one group, or a tuple when there are several.
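How the shape of the findall result depends on the number of groups can be sketched as follows, using an illustrative key=value string.

```python
import re

text = "x=1, y=22"

no_group = re.findall(r"\w=\d+", text)         # whole matches
one_group = re.findall(r"\w=(\d+)", text)      # strings for the single group
two_groups = re.findall(r"(\w)=(\d+)", text)   # tuples, one item per group

# On a compiled object, the second argument is the start position.
rx = re.compile(r"\d+")
from_index_5 = rx.findall(text, 5)

print(no_group, one_group, two_groups, from_index_5)
```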

2. rx.finditer(s, start, end):

Returns an iterator

Iterating over it yields one match object at a time. You can call the group() method of a match object to see what a given group matched; group 0 is the text matched by the entire regular expression.
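A short sketch of iterating over matches with finditer. The pattern, group names, and input string are assumptions for illustration.

```python
import re

rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")

pairs = []
for m in rx.finditer("a=1;bb=22"):
    # group(0) is the whole match; groups can be read by name or by number.
    pairs.append((m.group(0), m.group("key"), m.group(2), m.span()))

print(pairs)
```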

3. rx.search(s, start, end):

Returns a match object, or None if nothing matches.

search stops at the first match and does not look for later ones.

4. rx.match(s, start, end):

If the regular expression matches at the beginning of the string (or at start, if given), a match object is returned; otherwise None is returned.
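The difference between search and match comes down to where matching may begin. A minimal sketch with an assumed sample string:

```python
import re

rx = re.compile(r"\d+")
s = "abc 123 456"

at_start = rx.match(s)            # the string does not start with digits
anywhere = rx.search(s)           # first match anywhere, then stops
from_pos = rx.match(s, 4)         # match anchored at index 4 instead of 0

# Neither call returns True or False, so test the result against None.
has_digits = rx.search(s) is not None

print(at_start, anywhere.group(), from_pos.group(), has_digits)
```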

5. rx.sub(x, s, m):

Returns a string in which each match is replaced with x. If m is specified, at most m replacements are made. Within x, you can use \i or \g<id> to refer to captured content, where id is a group name or number.

x can also be a function, both here and in the module function re.sub(r, x, s, m). The function receives each match object and returns the text that replaces the match.
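The replacement forms described above can be sketched as follows. The date pattern, the helper name double, and the sample strings are hypothetical.

```python
import re

date = re.compile(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")

# Refer to captured groups by number or by name in the replacement.
by_number = date.sub(r"\3/\2/\1", "2025-07-09")
by_name = date.sub(r"\g<d>.\g<m>.\g<y>", "2025-07-09")

# Limit the number of replacements.
limited = re.sub(r"\d", "#", "a1b2c3", count=2)

# A function as the replacement: it gets the match object, returns a string.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.compile(r"\d+").sub(double, "3 apples, 10 pears")

print(by_number, by_name, limited, doubled)
```

The last call uses a compiled object, which shows that the function form is not limited to the module-level re.sub.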

6. rx.subn(x, s, m):

The same as sub(), except that it returns a 2-tuple: the result string and the number of replacements made.
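A brief sketch of subn, collapsing runs of whitespace in an assumed sample string and reporting how many runs were replaced.

```python
import re

# subn returns (new_string, number_of_replacements).
result, replaced = re.compile(r"\s+").subn(" ", "a  b \t c")

print(repr(result), replaced)
```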

7. rx.split(s, m): split the string

Returns a list

Splits the string at each place the regular expression matches. If m is specified, at most m splits are made.

If the regular expression has groups, the text each group matched is also placed in the list, between the two pieces it separates. For example, splitting "a1b2c" on (\d) gives ['a', '1', 'b', '2', 'c'].
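The effect of a group on split, and of the split limit, can be sketched like this with an illustrative comma-separated string.

```python
import re

# No group: the separators are dropped.
plain = re.split(r",\s*", "a, b,c")

# With a group: the captured separator is kept between the pieces.
with_group = re.split(r"(,)\s*", "a, b,c")

# On a compiled object, the second argument caps the number of splits.
limited = re.compile(r",\s*").split("a, b,c", 1)

print(plain, with_group, limited)
```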

8. rx.flags: the flags set when the regular expression was compiled (an attribute, not a method)

9. rx.pattern: the string from which the regular expression was compiled (an attribute, not a method)

Properties and methods of match objects

01. m.group(g, ...)

Returns the content matched by the given group number or group name. With no argument or with 0, it returns the content matched by the entire expression. If more than one group is specified, a tuple is returned.

02. m.groupdict(default)

Returns a dictionary. The keys are the names of all named groups, and the values are the content captured by those groups.

If default is given, it is used as the value for groups that did not participate in the match.

03. m.groups(default)

Returns a tuple containing the content captured by every group, starting from group 1. If default is given, it is used as the value for groups that did not capture anything.

04. m.lastgroup

The name of the last matched capture group; None if that group has no name or if no group matched (rarely used). This is an attribute, not a method.

05. m.lastindex

The number of the last matched capture group; None if no group matched. This is an attribute, not a method.

06. m.start(g):

The position in the string where the given group of the current match starts; returns -1 if the group did not participate in the match.

07. m.end(g)

The position in the string where the given group of the current match ends; returns -1 if the group did not participate in the match.

08. m.span(g)

Returns the 2-tuple (m.start(g), m.end(g)).

09. m.re

The regular expression object that produced this match object

10. m.string

The string passed to match or search

11. m.pos

The position where the search started: the beginning of the string, or the position given by start (not commonly used)

12. m.endpos

The position where the search ended: the end of the string, or the position given by end (not commonly used)
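Most of the match object members listed above can be exercised on one match. This is a sketch under assumed inputs: the pattern, group names, and sample string are made up, and the optional tld group is deliberately left unmatched.

```python
import re

rx = re.compile(r"(?P<user>\w+)@(?P<host>\w+)(?P<tld>\.\w+)?")
s = "mail bob@example now"
m = rx.search(s)

whole = m.group()                 # same as m.group(0)
user_host = m.group(1, 2)         # several groups give a tuple
by_name = m.group("user")
groups_default = m.groups("")     # unmatched groups become ""
named = m.groupdict()             # unmatched groups become None

host_span = m.span("host")
tld_start = m.start("tld")        # -1: the group did not participate

print(whole, user_host, by_name, groups_default, named)
print(host_span, tld_start, m.lastindex, m.lastgroup, m.pos, m.endpos)
```

Here re, string, pos, endpos, lastindex, and lastgroup are read as plain attributes, without parentheses.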

Summary

1. Matching: Python has no method that returns True or False directly, but you can test whether the return value of match or search is None.

2. Searching: for a single result, use the match object returned by search or match; for multiple results, iterate over the iterator returned by finditer.

3. Replacing: use the sub or subn method of a regular expression object, or the module functions re.sub or re.subn. In both forms the replacement can be a string or a function that generates the replacement text.

4. Splitting: use the split method of a regular expression object. Note that if the regular expression has groups, the content captured by the groups is also placed in the returned list.
