2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article gives a detailed introduction to regular expression syntax and to working with regular expressions through Python's re module. The material below was collected from various sources and arranged as a simple, easy-to-use reference.
1. Regular expression syntax
Characters and character classes
Special characters: \ . ^ $ ? + * { } [ ] ( ) |
To use the literal value of any of the special characters above, you must escape it with a backslash (\).
Character class
1. One or more characters enclosed in [] form a character class. Without a quantifier, a character class matches exactly one of the characters it contains.
2. You can specify ranges within a character class. For example, [a-zA-Z0-9] matches any character from a to z, A to Z, or 0 to 9.
3. A ^ immediately after the left square bracket negates the character class. For example, [^0-9] matches any non-digit character.
4. Within a character class, special characters other than \ lose their special meaning and stand for their literal values. ^ negates the class when placed first and stands for itself anywhere else; - denotes a range when placed between two characters and stands for itself when it is the first character in the class.
5. Shorthands such as \d, \s and \w can be used within character classes.
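The character class rules above can be checked with a few short calls. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# [a-zA-Z0-9] matches exactly one letter or digit per position.
alnum = re.findall(r"[a-zA-Z0-9]", "a-B_9!")

# A leading ^ negates the class: [^0-9] matches any single non-digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# '-' placed first is literal; '^' placed anywhere but first is literal.
literal_chars = re.findall(r"[-^]", "2^3-1")

# Shorthands such as \d can be mixed with ranges inside a class.
hex_like = re.findall(r"[\da-f]", "0x1fz")

print(alnum)          # ['a', 'B', '9']
print(non_digits)     # ['a', 'b']
print(literal_chars)  # ['^', '-']
print(hex_like)       # ['0', '1', 'f']
```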
Shorthand
. matches any character except a newline, or any character including a newline if the re.DOTALL flag is set
\d matches a Unicode digit, or [0-9] with re.ASCII
\D matches a Unicode non-digit
\s matches Unicode whitespace, or one of [ \t\n\r\f\v] with re.ASCII
\S matches Unicode non-whitespace
\w matches a Unicode word character, or one of [a-zA-Z0-9_] with re.ASCII
\W matches a Unicode non-word character
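The effect of re.DOTALL and re.ASCII on the shorthands can be seen in a small sketch like the following. The sample strings are illustrative assumptions; U+0667 is used only as an example of a non-ASCII digit.

```python
import re

# '.' stops at a newline unless re.DOTALL is set.
sample = "a\nb a-b"
dot_default = re.findall(r"a.b", sample)
dot_all = re.findall(r"a.b", sample, re.DOTALL)

# \d matches any Unicode decimal digit by default, only 0-9 with re.ASCII.
text = "7 and \u0667"  # U+0667 is ARABIC-INDIC DIGIT SEVEN
digits_unicode = re.findall(r"\d", text)
digits_ascii = re.findall(r"\d", text, re.ASCII)

# \w follows the same rule: U+00E9 (e with acute accent) is a word
# character only in Unicode mode.
words_unicode = re.findall(r"\w+", "caf\u00e9")
words_ascii = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(dot_default, dot_all)
print(digits_unicode, digits_ascii)
print(words_unicode, words_ascii)
```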
Quantifiers
1. ? matches the preceding expression 0 or 1 times
2. * matches the preceding expression 0 or more times
3. + matches the preceding expression 1 or more times
4. {m} matches the preceding expression exactly m times
5. {m,} matches the preceding expression at least m times
6. {,n} matches the preceding expression at most n times
7. {m,n} matches the preceding expression at least m times and at most n times
Note:
All of the quantifiers above are greedy and match as much as possible. To make a quantifier non-greedy, put a ? right after it.
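The difference between greedy and non-greedy quantifiers is easiest to see side by side. A minimal sketch, with made-up input strings:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last possible </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first possible </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# {m,n}: between 2 and 3 digits; greedy, so it takes 3 when it can.
counted = re.findall(r"\d{2,3}", "1 22 333 4444")

# ?: the 'u' is optional.
optional = re.findall(r"colou?r", "color colour")

print(greedy)    # ['<b>one</b><b>two</b>']
print(lazy)      # ['<b>one</b>', '<b>two</b>']
print(counted)   # ['22', '333', '444']
print(optional)  # ['color', 'colour']
```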
Grouping and capturing
The role of ():
Capture the text matched by the part of the regular expression inside () for further processing. Capturing can be turned off for a pair of parentheses by putting ?: right after the left parenthesis.
Group part of a regular expression so that a quantifier or | can be applied to it.
A backreference refers to what was captured by a preceding ():
Backreference by group number
Each pair of parentheses that does not use ?: is assigned a group number, starting from 1 and incrementing from left to right. \i refers to the content captured by group number i.
Backreference by group name
You can name a group by writing ?P<name> right after the left parenthesis (the name goes inside the angle brackets), and then use (?P=name) to refer to what that group captured. For example, (?P<word>\w+)\s+(?P=word) matches duplicated words.
Note:
Backreferences cannot be used inside a character class [].
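As a rough illustration of numbered groups, named groups and non-capturing groups, consider the sketch below. The input strings are invented for the example.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
by_number = re.search(r"(\w+)\s+\1", "it is is fine")

# Named group plus named backreference, as in the duplicate-word pattern.
by_name = re.search(r"(?P<word>\w+)\s+(?P=word)", "the the end")

# (?:...) groups for the quantifier but captures nothing,
# so the only captured group is (c).
non_capturing = re.search(r"(?:ab)+(c)", "ababc")

print(by_number.group(0), "|", by_number.group(1))  # is is | is
print(by_name.group("word"))                        # the
print(non_capturing.group(0), non_capturing.groups())  # ababc ('c',)
```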
Assertions and flags
An assertion does not match any text itself; it only places a constraint on the text at the position where the assertion appears.
Commonly used assertions:
1. \b matches a word boundary; inside a character class [] it means backspace.
2. \B matches a non-word boundary; it is affected by the ASCII flag
3. \A matches at the start of the string
4. ^ matches at the start of the string and, if the MULTILINE flag is set, after each newline
5. \Z matches at the end of the string
6. $ matches at the end of the string and, if the MULTILINE flag is set, before each newline
7. (?=e) positive lookahead
8. (?!e) negative lookahead
9. (?<=e) positive lookbehind
10. (?<!e) negative lookbehind
Explanation of lookahead and lookbehind
Lookahead: exp1(?=exp2) the text following exp1 must match exp2.
Negative lookahead: exp1(?!exp2) the text following exp1 must not match exp2.
Lookbehind: (?<=exp2)exp1 the text preceding exp1 must match exp2.
Negative lookbehind: (?<!exp2)exp1 the text preceding exp1 must not match exp2.
For example, suppose we want to find hello, but only when hello is followed by world. The regular expression can be written as "(hello)\s+(?=world)". Applied to "hello wangxing" and "hello world", it matches only the hello in the latter.
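The four lookaround forms can be tried out as follows. This is a sketch only; the sentence with "wangxing" follows the example in the text, and the price string is an invented sample.

```python
import re

text = "hello wangxing, hello world"

# Positive lookahead: 'hello' only when whitespace and 'world' follow.
# The lookahead itself consumes nothing, so the match is just 'hello'.
ahead_starts = [m.start() for m in re.finditer(r"hello(?=\s+world)", text)]

# Negative lookahead: 'hello' only when 'world' does NOT follow.
not_ahead_starts = [m.start() for m in re.finditer(r"hello(?!\s+world)", text)]

prices = "cost $42, qty 7"

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(ahead_starts)      # [16]
print(not_ahead_starts)  # [0]
print(after_dollar)      # ['42']
print(not_after_dollar)  # ['7']
```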
Conditional matching
(?(id)yes_exp|no_exp): if the group with the given id matched something, yes_exp must match at this point; otherwise no_exp must match. The |no_exp part is optional.
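A common use of conditional matching is requiring a closing delimiter only when the opening one was present. The sketch below uses a deliberately simplified e-mail-like pattern chosen for illustration; it is not a real address validator.

```python
import re

# Group 1 optionally captures '<'. (?(1)>) requires a closing '>'
# only if group 1 took part in the match; there is no no_exp branch.
pattern = re.compile(r"^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)$")

results = {
    s: bool(pattern.match(s))
    for s in ["<user@host.com>", "user@host.com",
              "<user@host.com", "user@host.com>"]
}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Only the first two strings match: the brackets must either both be present or both be absent.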
Flags for regular expressions
There are two ways to set flags for a regular expression:
Pass a flags argument to the compile method, combining multiple flags with |, for example re.compile(r"#[\da-f]{6}", re.IGNORECASE | re.MULTILINE)
Put (?flags) at the start of the regular expression, for example (?ms)#[\da-z]{6}
Commonly used flags
re.A or re.ASCII makes \b, \B, \s, \S, \w, \W, \d and \D assume that the string is ASCII
re.I or re.IGNORECASE makes the regular expression ignore case
re.M or re.MULTILINE enables multi-line matching, so that ^ also matches after each newline and $ also matches before each newline
re.S or re.DOTALL makes . match any character, including a newline
re.X or re.VERBOSE lets the regular expression span multiple lines and contain comments, but whitespace must then be written as \s or [ ], because literal whitespace in the pattern is ignored.
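Both ways of setting flags, plus a verbose pattern, might look like this. The color strings are sample input invented for the example.

```python
import re

# Flags passed to compile and combined with |.
color = re.compile(r"#[\da-f]{6}", re.IGNORECASE | re.MULTILINE)
compiled_hits = color.findall("#FFaa00 #12345g")

# The same idea with inline flags at the very start of the pattern.
inline_hits = re.findall(r"(?im)^#[\da-f]{6}$", "#ABCDEF\n#00ff00\nnope")

# re.MULTILINE changes where ^ can match.
single_line = re.findall(r"^\w+", "one\ntwo")
multi_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

# re.VERBOSE: whitespace and comments inside the pattern are ignored,
# so a literal '#' has to be escaped.
verbose = re.compile(r"""
    \#            # a literal hash sign
    [\da-f]{6}    # six hexadecimal digits
    """, re.VERBOSE | re.IGNORECASE)
verbose_hits = verbose.findall("#FFaa00")

print(compiled_hits)            # ['#FFaa00']
print(inline_hits)              # ['#ABCDEF', '#00ff00']
print(single_line, multi_line)  # ['one'] ['one', 'two']
print(verbose_hits)             # ['#FFaa00']
```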
2. Python regular expression module
Regular expressions serve four main purposes when processing strings.
1. Matching: check whether a string conforms to a regular expression, which usually yields true or false
2. Extracting: pull out of a string the text that meets the requirements of a regular expression
3. Replacing: find the text in a string that matches a regular expression and replace it with another string
4. Splitting: use a regular expression to split a string.
Two ways of using regular expressions with the re module in Python
1. Use re.compile(r, f) to create a regular expression object, then call the methods of that object. The advantage of this approach is that the compiled object can be reused many times.
2. Every method of a regular expression object has a corresponding module-level function in re; the only difference is that the first argument is the regular expression string. This approach suits regular expressions that are used only once.
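The two styles give the same result; the sketch below simply runs one pattern both ways. The pattern and the input lines are arbitrary examples.

```python
import re

lines = ["a1 b22", "no digits", "c333"]

# Style 1: compile once, then reuse the object for every line.
rx = re.compile(r"\d+")
with_object = [rx.findall(line) for line in lines]

# Style 2: module-level function, pattern string passed as first argument.
with_module = [re.findall(r"\d+", line) for line in lines]

print(with_object)  # [['1', '22'], [], ['333']]
print(with_object == with_module)  # True
```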
Common methods of regular expression objects
1. rx.findall(s, start, end):
Returns a list. If the regular expression contains no groups, the list contains all matches.
If the regular expression contains groups, the list holds what the groups matched rather than the whole match: one string per match when there is a single group, one tuple per match when there are several groups.
2. rx.finditer(s, start, end):
Returns an iterator.
Each iteration yields one match object. Call the match object's group() method to see what a given group matched; group 0 is the text matched by the whole regular expression.
3. rx.search(s, start, end):
Returns a match object, or None if nothing matches.
search stops at the first match and does not look for later ones.
4. rx.match(s, start, end):
Returns a match object if the regular expression matches at the beginning of the string; otherwise returns None.
5. rx.sub(x, s, m):
Returns the string obtained by replacing every match with x. If m is specified, at most m replacements are made. Within x, \i or \g<id> refers to captured content, where id is a group number or group name.
x can also be a function, both here and in the module function re.sub(r, x, s, m). The function is called with the match object for each match, and its return value is used as the replacement text.
6. rx.subn(x, s, m):
Same as sub(), except that it returns a 2-tuple: the result string and the number of replacements made.
7. rx.split(s, m): split the string
Returns a list.
The string is split at each place the regular expression matches; if m is specified, at most m splits are made.
If the regular expression contains groups, the text matched by the groups is also placed in the list, between the pieces it separated. For example, splitting "a1b2c" on (\d) gives ['a', '1', 'b', '2', 'c'].
8. rx.flags: the flags in effect when the regular expression was compiled (an attribute, not a method)
9. rx.pattern: the pattern string the regular expression was compiled from (an attribute, not a method)
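The methods listed above can be exercised on one small string. This is an illustrative sketch; the key=value input and all variable names are assumptions made for the example.

```python
import re

s = "k1=v1, k2=v2"
rx = re.compile(r"(\w+)=(\w+)")

# findall: whole matches without groups, tuples with several groups.
no_groups = re.findall(r"\w+=\w+", s)
pairs = rx.findall(s)

# finditer: one match object per match; group(0) is the whole match.
whole = [m.group(0) for m in rx.finditer(s)]

# search accepts a start position; match only succeeds at the beginning.
later = rx.search(s, 5).group(0)
at_start = rx.match(s).group(0)
not_at_start = rx.match(" " + s)

# sub: \i and \g<id> refer to groups; the third argument caps the count.
swapped = rx.sub(r"\2=\1", s)
swapped_once = rx.sub(r"\g<2>=\g<1>", s, 1)

# sub with a function: called with each match object.
upper = rx.sub(lambda m: m.group(0).upper(), s)

# subn: (new string, number of replacements).
replaced = rx.subn("X", s)

# split: a capturing group keeps the separators in the result.
plain_split = re.split(r",\s*", s)
grouped_split = re.split(r"(,)\s*", s)

print(no_groups, pairs, whole)
print(later, at_start, not_at_start)
print(swapped, "|", swapped_once, "|", upper)
print(replaced, plain_split, grouped_split)
print(rx.pattern)  # pattern and flags are plain attributes
```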
Properties and methods of match objects
01. m.group(g, …)
Returns the content matched by the given group number or group name. The default, 0, means the content matched by the whole expression. If more than one group is specified, a tuple is returned.
02. m.groupdict(default)
Returns a dictionary. The keys are the names of all named groups, and the values are the content captured by those groups.
If the default parameter is given, it is used as the value for groups that did not take part in the match.
03. m.groups(default)
Returns a tuple containing the content captured by every group, starting from group 1. If default is given, it is used as the value for groups that did not capture anything.
04. m.lastgroup
The name of the last capturing group that matched. It is None if that group has no name or if no group matched (rarely used). This is an attribute, not a method.
05. m.lastindex
The number of the last capturing group that matched, or None if no group matched. This is an attribute, not a method.
06. m.start(g):
The position in the string where the given group of the current match object starts. Returns -1 if the group did not take part in the match.
07. m.end(g)
The position in the string where the given group of the current match object ends. Returns -1 if the group did not take part in the match.
08. m.span(g)
Returns a 2-tuple whose items are the return values of m.start(g) and m.end(g).
09. m.re
The regular expression object that produced this match object (an attribute)
10. m.string
The string that was passed to match or search (an attribute)
11. m.pos
The position where the search started: the beginning of the string, or the position given by start (not commonly used)
12. m.endpos
The position where the search ended: the end of the string, or the position given by end (not commonly used)
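The match object members described above can be inspected on a single match. A minimal sketch, using an invented date-like pattern in which the day group is optional and does not take part in the match:

```python
import re

s = "date: 2025-01"
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?", s)

print(m.group())                 # 2025-01
print(m.group("year", "month"))  # ('2025', '01')
print(m.groups())                # ('2025', '01', None)
print(m.groups("??"))            # ('2025', '01', '??')
print(m.groupdict())             # {'year': '2025', 'month': '01', 'day': None}

# Positions: the unmatched 'day' group reports -1.
print(m.start("year"), m.end("year"))  # 6 10
print(m.span("month"))                 # (11, 13)
print(m.start("day"))                  # -1

# Attributes (no parentheses).
print(m.lastindex, m.lastgroup)  # 2 month
print(m.pos, m.endpos)           # 0 13
print(m.string is s, m.re.pattern.startswith("(?P<year>"))  # True True
```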
Summary
1. Matching: Python has no method that returns true or false directly, but you can test whether the return value of match or search is None.
2. Extracting: to find a single occurrence, use the match object returned by search or match; to find multiple occurrences, iterate over the iterator returned by finditer.
3. Replacing: use the sub or subn method of a regular expression object, or the module functions re.sub or re.subn. In both cases the replacement can be a string or a function that generates the replacement text.
4. Splitting: use the split method of a regular expression object. Note that if the regular expression contains groups, the content captured by the groups is also placed in the returned list.
This concludes the detailed introduction to regular expression syntax. Combining theory with practice is the best way to learn, so go and try the examples yourself.
© 2024 shulou.com SLNews company. All rights reserved.