Updated 2025-04-04. From: SLTechnology News&Howtos (shulou), Development
This article explains how to use regular expressions in Python. The content is simple and clear, and easy to follow.
Brief introduction
A regular expression is a pattern that matches text snippets. The simplest regular expression is an ordinary string, which matches itself. For example, the regular expression 'hello' matches the string 'hello'.
The re module
In Python, regular expressions are available through the built-in re module. The general steps are:
Use the compile function to compile the string form of a regular expression into a Pattern object.
Match the text with the methods provided by the Pattern object to get the matching result (a Match object).
Finally, use the properties and methods of the Match object to obtain information and perform other operations as needed.
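The three steps above can be sketched as follows. The pattern and the input string are illustrative choices, not taken from the original article.

```python
import re

# Step 1: compile the string form of the regular expression into a Pattern object.
pattern = re.compile(r'[a-z]+')

# Step 2: match text with a Pattern method; this returns a Match object or None.
m = pattern.search('123 hello 456')

# Step 3: read information from the Match object.
matched_text = m.group() if m else None
print(matched_text)  # hello
```

Checking for None before calling group() matters, because every matching method returns None when nothing matches.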
Compile function
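The compile function compiles a regular expression into a Pattern object. It is generally used in the following form:
re.compile(pattern[, flags])
Here pattern is the regular expression as a string, and flags is an optional parameter that sets the matching mode, such as ignoring case (re.I) or multi-line mode (re.M).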
Match method
The match method matches at the beginning of a string (you can also specify a starting position). It matches once: it returns as soon as one match is found, rather than finding all matches. Its general form is:
match(string[, pos[, endpos]])
Here string is the string to be matched, and pos and endpos are optional parameters that specify the start and end positions within the string. The defaults are 0 and len(string), respectively. Therefore, when pos and endpos are not specified, match matches at the beginning of the string.
When the match succeeds, a Match object is returned (otherwise None), where:
The group([group1, ...]) method returns the string matched by one or more groups. To get the entire matched substring, use group() or group(0).
The start([group]) method returns the starting position of the group's matched substring in the whole string (the index of the substring's first character). The argument defaults to 0.
The end([group]) method returns the end position of the group's matched substring in the whole string (the index of the substring's last character + 1). The argument defaults to 0.
The span([group]) method returns (start(group), end(group)).
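A short sketch of match and the Match methods listed above. The two-group pattern and the sample strings are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # two groups, case-insensitive

# match anchors at the start of the string (position 0 by default).
m = pattern.match('Hello World Wide Web')

whole = m.group(0)         # 'Hello World'
first = m.group(1)         # 'Hello'
both = m.group(1, 2)       # ('Hello', 'World')
second_span = m.span(2)    # (6, 11)
start0, end0 = m.start(), m.end()  # 0 and 11

# With pos and endpos, matching starts at index pos instead of 0.
digits = re.compile(r'\d+')
no_match = digits.match('one12twothree34four')         # None: starts with a letter
with_pos = digits.match('one12twothree34four', 3, 10)  # matches '12' at index 3
```

Note that match only succeeds if the pattern fits exactly at the starting position; it never scans forward.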
Search method
The search method looks for a match at any position in a string. It also matches once: it returns as soon as one match is found, rather than finding all matches. Its general form is:
search(string[, pos[, endpos]])
Here string is the string to be matched, and pos and endpos are optional parameters that specify the start and end positions within the string. The defaults are 0 and len(string), respectively.
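A minimal sketch of search with and without a search interval. The sample string is an assumption for illustration.

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'

m = pattern.search(text)                # first match anywhere in the string
m_range = pattern.search(text, 10, 30)  # search only from index 10 onward

first = (m.group(), m.span())
later = (m_range.group(), m_range.span())
print(first)  # ('12', (3, 5))
print(later)  # ('34', (13, 15))
```

An endpos larger than the string length is simply treated as the end of the string.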
Findall method
Both match and search above match only once and return as soon as one match is found. Most of the time, however, we need to search the entire string and get all the matches.
The findall method is used as follows:
findall(string[, pos[, endpos]])
Here string is the string to be matched, and pos and endpos are optional parameters that specify the start and end positions within the string. The defaults are 0 and len(string), respectively.
findall returns all matching substrings as a list, or an empty list if there is no match.
Look at the example:
import re
pattern = re.compile(r'\d+')  # find numbers
result1 = pattern.findall('hello 123456 789')
result2 = pattern.findall('one1two2three3four4', 0, 10)
print(result1)
print(result2)
Execution result:
['123456', '789']
['1', '2']
Finditer method
The finditer method behaves like findall: it also searches the entire string and finds all matches. However, it returns an iterator that yields each match (a Match object) in order.
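A minimal sketch of finditer, assuming an illustrative input string. Because each item is a Match object, positions are available as well as the matched text.

```python
import re

pattern = re.compile(r'\d+')

# finditer yields one Match object per non-overlapping match, in order.
found = [(m.group(), m.span()) for m in pattern.finditer('hello 123456 789')]
print(found)  # [('123456', (6, 12)), ('789', (13, 16))]
```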
Split method
The split method splits the string at every place the pattern matches and returns the pieces as a list. It is used in the following form:
split(string[, maxsplit])
maxsplit specifies the maximum number of splits; if it is not specified, the string is split at every match.
Look at the example:
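A minimal sketch of split, assuming an illustrative separator pattern and input string that are not from the original article.

```python
import re

pattern = re.compile(r'[\s,;]+')  # runs of whitespace, commas or semicolons

parts = pattern.split('a,b;; c   d')
limited = pattern.split('a,b;; c   d', maxsplit=2)  # stop after two splits

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b', 'c   d']
```

With maxsplit set, the remainder of the string is returned unsplit as the last element.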
Sub method
The sub method performs substitution. It is used in the following form:
sub(repl, string[, count])
Here repl can be a string or a function:
If repl is a string, each matching substring is replaced with repl and the resulting string is returned. repl can also refer to a group in the form \id (for example \1), but the number 0 cannot be used this way (use \g<0> for the entire match).
If repl is a function, it should take a single parameter (the Match object) and return the replacement string (group references can no longer be used in the returned string).
count specifies the maximum number of replacements; if it is not specified, all matches are replaced.
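A short sketch of the two kinds of repl and of count. The pattern, the sample string and the helper name shout are hypothetical, chosen only for illustration.

```python
import re

pattern = re.compile(r'(\w+) (\w+)')
s = 'hello 123, hello 456'

# repl as a string: \2 and \1 refer to the second and first groups.
swapped = pattern.sub(r'\2 \1', s)

# repl as a function: it receives the Match object and returns the replacement.
def shout(m):
    return m.group(1).upper() + ' ' + m.group(2)

upper = pattern.sub(shout, s)
once = pattern.sub(shout, s, count=1)  # replace only the first match

print(swapped)  # 123 hello, 456 hello
print(upper)    # HELLO 123, HELLO 456
print(once)     # HELLO 123, hello 456
```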
Subn method
The subn method behaves like sub and also performs substitution, but it returns a tuple:
(sub(repl, string[, count]), number of substitutions)
The tuple has two elements: the first is the result of the sub method, and the second is the number of substitutions made.
Look at the example:
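A minimal sketch of subn, assuming an illustrative pattern and input string.

```python
import re

pattern = re.compile(r'\d+')

result = pattern.subn('#', 'a1b22c333')
limited = pattern.subn('#', 'a1b22c333', count=2)

print(result)   # ('a#b#c#', 3)
print(limited)  # ('a#b#c333', 2)
```

The count in the returned tuple is handy when you need to know whether anything was replaced at all.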
In fact, most of the methods of a Pattern object created with compile have a corresponding function in the re module, with slight differences in use.
Match function
The match function is used in the following form:
re.match(pattern, string[, flags])
Search function
The search function is used in the following form:
re.search(pattern, string[, flags])
The search function cannot specify a search interval within the string; otherwise its usage is similar to the search method of the Pattern object.
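A short sketch of the module-level match and search functions side by side. The sample text is an assumption for illustration.

```python
import re

text = 'one12twothree34four'

m1 = re.match(r'\d+', text)     # None: the text does not start with a digit
m2 = re.match(r'[a-z]+', text)  # matches 'one' at the start
m3 = re.search(r'\d+', text)    # first run of digits anywhere: '12'
m4 = re.search(r'[A-Z]+', text, flags=re.I)  # flags change the matching mode

print(m1)          # None
print(m2.group())  # one
print(m3.group())  # 12
print(m4.group())  # one
```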
Findall function
The findall function is used in the following form:
re.findall(pattern, string[, flags])
The findall function cannot specify a search interval within the string; otherwise its usage is similar to the findall method of the Pattern object.
Look at the example:
import re
print(re.findall(r'\d+', 'hello 12345 789'))
# output
['12345', '789']
Finditer function
The finditer function is used in a way similar to the finditer method of the Pattern object, in the following form:
re.finditer(pattern, string[, flags])
Split function
The split function is used in the following form:
re.split(pattern, string[, maxsplit])
Sub function
The sub function is used in the following form:
re.sub(pattern, repl, string[, count])
Subn function
The subn function is used in the following form:
re.subn(pattern, repl, string[, count])
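The four functions above can be sketched together as follows. The sample text is an assumption, and maxsplit and count are passed as keyword arguments, which recent Python versions prefer for these module-level functions.

```python
import re

text = 'a1b22c333'

spans = [m.span() for m in re.finditer(r'\d+', text)]
pieces = re.split(r'\d+', text)
first_two = re.split(r'\d+', text, maxsplit=2)
replaced = re.sub(r'\d+', '#', text)
counted = re.subn(r'\d+', '#', text, count=1)

print(spans)      # [(1, 2), (3, 5), (6, 9)]
print(pieces)     # ['a', 'b', 'c', '']
print(first_two)  # ['a', 'b', 'c333']
print(replaced)   # a#b#c#
print(counted)    # ('a#b22c333', 1)
```

The trailing empty string in pieces appears because the text ends with a match.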
Which approach to use?
As you can see from the above, there are two ways to use the re module:
Use the re.compile function to generate a Pattern object, and then use a series of methods of the Pattern object to match the text
Directly use functions such as re.match, re.search and re.findall to search for text matches.
Next, let's use an example to show these two methods.
Let's first look at the first usage:
import re
# compile the regular expression into a Pattern object first
pattern = re.compile(r'\d+')
print(pattern.match('123,123'))
print(pattern.search('234,234'))
print(pattern.findall('345,345'))
Take a look at the second usage:
import re
print(re.match(r'\d+', '123,123'))
print(re.search(r'\d+', '234,234'))
print(re.findall(r'\d+', '345,345'))
If a regular expression is used many times (such as \d+ above), it is better, for the sake of efficiency, to precompile it into a Pattern object and then use that object's methods to match the text. With functions such as re.match and re.search, the pattern has to be looked up on every call, and compiled again if it is not already in the re module's internal cache, which adds overhead.
Therefore, we recommend the first usage.
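One way to check the difference on your own machine is a rough timing sketch like the one below. The function names and the repeat count are hypothetical, and the measured times depend on the interpreter and hardware, so no particular ratio is claimed here.

```python
import re
import timeit

text = 'hello 12345 789'
pattern = re.compile(r'\d+')  # compiled once, reused on every call

def with_compiled():
    return pattern.findall(text)

def with_module_function():
    # The pattern string is resolved (via the re cache) on every call.
    return re.findall(r'\d+', text)

same_result = with_compiled() == with_module_function()
t_compiled = timeit.timeit(with_compiled, number=10000)
t_module = timeit.timeit(with_module_function, number=10000)

print(same_result)
print(t_compiled, t_module)
```

Both forms return the same result; only the per-call overhead differs.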
Match Chinese
In some cases we want to match the Chinese characters in a text. Note that the Unicode range for Chinese characters is mainly [\u4e00-\u9fa5]. We say "mainly" because this range is incomplete; for example, it does not include full-width (Chinese) punctuation. In most cases, however, it should be sufficient.
Suppose you want to extract the Chinese from the string title = u'你好,hello,世界'. In Python 2, you can do this:
# -*- coding: utf-8 -*-
import re
title = u'你好,hello,世界'
pattern = re.compile(ur'[\u4e00-\u9fa5]+')
result = pattern.findall(title)
print(result)
Notice that the regular expression carries the two-letter prefix ur, where r indicates a raw string and u indicates a unicode string. This prefix exists only in Python 2; in Python 3 every str is Unicode, so r'[\u4e00-\u9fa5]+' is enough.
Execution result:
[u'\u4f60\u597d', u'\u4e16\u754c']
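For readers on Python 3, a minimal equivalent sketch follows. It assumes the same sample string as above and drops the u prefix, since Python 3 strings are already Unicode.

```python
import re

title = '你好,hello,世界'
pattern = re.compile(r'[\u4e00-\u9fa5]+')  # the re module understands \uXXXX escapes
result = pattern.findall(title)
print(result)  # ['你好', '世界']
```

Python 3 prints the characters themselves rather than the escaped form shown in the Python 2 output.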
Greedy matching
In Python, regular-expression matching is greedy by default (in a few languages it may be non-greedy by default), that is, it matches as many characters as possible.
For example, we want to find all the div blocks in a string:
import re
content = 'aa<div>test1</div>bb<div>test2</div>cc'
pattern = re.compile(r'<div>.*</div>')
result = pattern.findall(content)
print(result)
Execution result:
['<div>test1</div>bb<div>test2</div>']
Because matching is greedy, that is, it matches as much as possible, after successfully matching the first </div> it keeps trying to match to the right to see whether a longer substring can still match.
If we want a non-greedy match, we can add a ? after the quantifier, as follows:
import re
content = 'aa<div>test1</div>bb<div>test2</div>cc'
pattern = re.compile(r'<div>.*?</div>')  # add ?
result = pattern.findall(content)
print(result)
Results:
['<div>test1</div>', '<div>test2</div>']
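The same greedy and non-greedy behavior applies to any quantifier. A short sketch, with sample strings chosen only for illustration:

```python
import re

text = '"one" and "two"'

greedy = re.findall(r'".*"', text)  # runs to the last quote in the string
lazy = re.findall(r'".*?"', text)   # stops at the nearest closing quote

digits_greedy = re.match(r'\d+', '12345').group()  # as many digits as possible
digits_lazy = re.match(r'\d+?', '12345').group()   # as few digits as possible

print(greedy)         # ['"one" and "two"']
print(lazy)           # ['"one"', '"two"']
print(digits_greedy)  # 12345
print(digits_lazy)    # 1
```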