Shulou
2025-04-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article summarizes the regular expression symbols and methods most commonly used in Python crawlers. Many people are unsure how these symbols and methods behave in day-to-day use, so the editor consulted various materials and put together simple, easy-to-use examples, in the hope of answering your questions. Next, please follow along with the editor!
Regular expressions are not specific to Python. They are a powerful tool for processing strings, with their own syntax and an independent matching engine. For simple jobs they may be less efficient than the built-in str methods, but they are far more flexible. Because of this, regular expression syntax is largely the same in every language that supports it; implementations differ only in which features they support, and the unsupported features are usually the less commonly used ones.
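To make the trade-off between str methods and regular expressions concrete, here is a minimal sketch using a made-up sample string (`page` is hypothetical). For a fixed substring, the plain `in` operator answers the question directly; `re` earns its place once the rule is a pattern, such as "any run of digits".

```python
import re

# Hypothetical sample text, standing in for a scraped page fragment.
page = 'price: 128 yuan, stock: 7'

# Fixed-substring check: the built-in str operation is enough.
found_with_str = 'stock' in page
# The same check through the regex engine.
found_with_re = re.search('stock', page) is not None

# A pattern-based rule, which str methods cannot express directly:
# every run of one or more digits.
numbers = re.findall(r'\d+', page)

print(found_with_str, found_with_re, numbers)  # True True ['128', '7']
```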
1. Common symbols
. : matches any character except the newline character \n
* : matches the previous character 0 or more times
? : matches the previous character 0 or 1 times
.* : greedy matching, matches as many characters as possible
.*? : non-greedy matching, matches as few characters as possible
() : the text matched inside the parentheses is returned as the result
2. Common methods
findall: finds every match of the pattern and returns the results as a list
search: finds the first match of the pattern and returns a match object (or None if nothing matches)
sub: replaces every match of the pattern and returns the resulting string
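The three methods return different kinds of values, which matters when their results are used further in a crawler. The sketch below runs each one on a small made-up string (`text` is hypothetical). Note that search returns None when there is no match, so calling `.group()` on it without a check raises AttributeError.

```python
import re

text = 'xxIxx and xxlovexx'  # hypothetical sample string

all_matches = re.findall('xx(.*?)xx', text)       # list of captured strings
first = re.search('xx(.*?)xx', text)              # Match object for the first hit
missing = re.search('yy(.*?)yy', text)            # no hit, so this is None
replaced = re.sub('xx(.*?)xx', '<word>', text)    # a new string; text is unchanged

print(all_matches)     # ['I', 'love']
print(first.group(1))  # I
print(replaced)        # <word> and <word>

# Guard before calling .group() on a search result.
if missing is None:
    print('no match')
```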
3. Use examples
(1) . matches any character except the newline character \n
import re  # import the re module
a = 'xy123'
b = re.findall('x..', a)
print(b)
The printed result is ['xy1']; each . stands for exactly one character.
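(2) * matches the previous character 0 or more times
a = 'xyxy123'
b = re.findall('x*', a)
print(b)
The printed result (Python 3.7 and later) is ['x', '', 'x', '', '', '', '', '']. Because x* may also match zero characters, an empty string is reported at every position where no x begins, including the end of the string.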
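The empty strings above are usually noise in a crawler. A minimal sketch of two ways to keep only real runs of x: filter the findall result, or switch to + (one or more), a quantifier not listed in the symbol table above but standard in Python's re module.

```python
import re

a = 'xyxy123'

# x* can match zero characters, so findall also reports empty strings.
with_empties = re.findall('x*', a)

# Option 1: drop the empty matches afterwards.
non_empty = [m for m in with_empties if m]

# Option 2: x+ requires at least one x, so it never matches the empty string.
plus = re.findall('x+', a)

print(non_empty, plus)  # ['x', 'x'] ['x', 'x']
```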
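(3) ? matches the previous character 0 or 1 times
a = 'xy123'
b = re.findall('x?', a)
print(b)
The printed result (Python 3.7 and later) is ['x', '', '', '', '', '']; as with *, x? also matches the empty string where there is no x.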
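A common crawler use of ? is making a single character optional. The sketch below, on a hypothetical HTML fragment, uses s? so that one pattern accepts both http:// and https:// links, combined with the () and .*? symbols from the table above.

```python
import re

# Hypothetical HTML fragment with one http and one https link.
html = '<a href="http://example.com/1">One</a> <a href="https://example.com/2">Two</a>'

# s? : the s is optional, so both schemes match.
# .*? stops at the first closing quote after the URL.
urls = re.findall(r'"(https?://.*?)"', html)

print(urls)  # ['http://example.com/1', 'https://example.com/2']
```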
(4) Example of .* (greedy matching)
secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
b = re.findall('xx.*xx', secret_code)
print(b)
The printed result is ['xxIxxfasdjifja134xxlovexx23345sdfxxyouxx']; .* takes as much as it can, so the single match runs from the first xx to the last xx.
(5) Example of .*? (non-greedy matching)
secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
c = re.findall('xx.*?xx', secret_code)
print(c)
The printed result is ['xxIxx', 'xxlovexx', 'xxyouxx']; .*? stops at the nearest xx, so three separate matches are found.
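Greedy versus non-greedy matching shows up constantly when scraping HTML, where many tags sit on one line. A minimal sketch on a hypothetical fragment: the greedy pattern swallows everything up to the last `">`, while the non-greedy one stops at each link's own closing quote.

```python
import re

# Hypothetical HTML fragment with two links on one line.
html = '<a href="/page1">One</a><a href="/page2">Two</a>'

greedy = re.findall('<a href="(.*)">', html)   # runs to the LAST '">'
lazy = re.findall('<a href="(.*?)">', html)    # stops at the FIRST '">'

print(greedy)  # ['/page1">One</a><a href="/page2']
print(lazy)    # ['/page1', '/page2']
```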
(6) Example of ()
secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
d = re.findall('xx(.*?)xx', secret_code)
print(d)
The printed result is ['I', 'love', 'you']; only the text matched inside the parentheses is returned.
(7) Example of re.S
s = '''sdfxxhello
xxfsdfxxworldxxasdf'''
d = re.findall('xx(.*?)xx', s, re.S)
print(d)
The printed result is ['hello\n', 'world']; re.S makes . match \n as well.
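To see what re.S actually changes, it helps to run the same pattern with and without the flag. In this sketch, without re.S the lazy group cannot cross the newline, so the first xx...xx pair is skipped entirely. re.S is the short alias of re.DOTALL.

```python
import re

s = '''sdfxxhello
xxfsdfxxworldxxasdf'''

without_s = re.findall('xx(.*?)xx', s)       # '.' cannot match '\n'
with_s = re.findall('xx(.*?)xx', s, re.S)    # '.' now matches '\n' too

print(without_s)  # ['fsdf']
print(with_s)     # ['hello\n', 'world']
```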
(8) Example of findall with several groups
s2 = 'asdfxxIxx123xxlovexxdfd'
f2 = re.findall('xx(.*?)xx123xx(.*?)xx', s2)
print(f2[0][1])
The printed result is love.
Here f2 is [('I', 'love')]: a list containing one tuple with two elements, one for each pair of parentheses. If s2 contained several substrings of the form xx...xx123xx...xx, f2 would contain one tuple per substring.
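The multiple-tuple case can be checked directly. A minimal sketch with a hypothetical string that contains the xx...xx123xx...xx shape twice; each tuple can then be unpacked into its two groups.

```python
import re

# Hypothetical string containing the xx...xx123xx...xx shape twice.
s3 = 'axxIxx123xxlovexxbxxyouxx123xxtooxxc'
pairs = re.findall('xx(.*?)xx123xx(.*?)xx', s3)

print(pairs)  # [('I', 'love'), ('you', 'too')]

# Each tuple unpacks into (group 1, group 2).
for first, second in pairs:
    print(first, second)
```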
(9) Example of search
s2 = 'asdfxxIxx123xxlovexxdfd'
f = re.search('xx(.*?)xx123xx(.*?)xx', s2).group(2)
print(f)
The printed result is love.
.group(2) returns the text matched by the second pair of parentheses; with .group(1), the output would be I.
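Besides numbered groups, the match object returned by search also offers group(0) for the whole match and groups() for all captured groups at once. A short sketch on the same s2; the None check is there because search returns None when nothing matches.

```python
import re

s2 = 'asdfxxIxx123xxlovexxdfd'
m = re.search('xx(.*?)xx123xx(.*?)xx', s2)

if m is not None:          # search returns None when there is no match
    print(m.group(0))      # whole match: xxIxx123xxlovexx
    print(m.group(1))      # I
    print(m.groups())      # ('I', 'love')
```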
(10) Example of sub
s = '123rrrrrr123'
output = re.sub('123(.*?)123', '123%d123' % 789, s)
print(output)
The printed result is 123789123.
The %d here works like %d in C: '123%d123' % 789 is formatted into '123789123' before re.sub is called. Writing output = re.sub('123(.*?)123', '123789123', s) therefore gives the same output, 123789123.
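The replacement argument of sub can do more than hold a fixed string. This sketch shows three forms on the same s: a string pre-built with % formatting (as above), a string that reuses the captured text through the backreference \1, and a function that receives the match object and returns the replacement.

```python
import re

s = '123rrrrrr123'

# '%d' is ordinary Python string formatting, applied before re.sub runs.
template = '123%d123' % 789
formatted = re.sub('123(.*?)123', template, s)

# \1 in the replacement inserts the text captured by the first group.
backref = re.sub('123(.*?)123', r'<\1>', s)

# A function receives the Match object and returns the replacement string.
computed = re.sub('123(.*?)123', lambda m: str(len(m.group(1))), s)

print(formatted)  # 123789123
print(backref)    # <rrrrrr>
print(computed)   # 6
```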
(11) Example of \d, which matches digits
a = 'asdfasf1234567fasd555fas'
b = re.findall(r'(\d+)', a)
print(b)
The printed result is ['1234567', '555']; \d+ matches a run of one or more digits.
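findall always returns strings, so scraped numbers such as prices or counts need converting before any arithmetic. A minimal sketch on the same a; with a single whole-pattern match wanted, the parentheses can be dropped and findall returns the full match.

```python
import re

a = 'asdfasf1234567fasd555fas'

numbers = re.findall(r'\d+', a)          # ['1234567', '555'], still strings
values = [int(n) for n in numbers]       # convert for arithmetic

print(values, sum(values))               # [1234567, 555] 1235122
```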
This concludes the summary of common regular expression symbols and methods for Python crawlers; I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it! To learn more, please keep following the website; the editor will keep working to bring you more practical articles!
© 2024 shulou.com SLNews company. All rights reserved.