Shulou (Shulou.com), 06/02 report. Updated 2025-03-30, from SLTechnology News&Howtos.
This article introduces how to understand regular expressions for Python crawlers. Many people get stuck on them when working through real cases, so the walkthrough below covers how to handle the common situations. Read it carefully and you should come away with something useful.
Metacharacters
. ^ $ * + ? { } [ ] \ | ( ) are the metacharacters. Learning these should be enough for most crawler work.
In Python, regular expressions are used through the re module (import re).
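As a minimal sketch of what using the re module looks like, the snippet below calls two standard functions, re.search and re.findall. The sample string is made up for illustration.

```python
import re

# Sample text invented for this illustration.
text = "order 66 shipped, order 1024 pending"

# re.search returns a match object for the first match, or None.
first = re.search(r"[0-9]+", text)
first_number = first.group() if first else None
print(first_number)   # 66

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"[0-9]+", text)
print(all_numbers)    # ['66', '1024']
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting backslashes before the regex engine sees them.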
1. [] specifies a character set, such as [abc] or [a-z]. Any single character in the set produces a match.
Note: as the first character inside [], ^ negates the set, so [^abc] matches any character other than a, b, or c.
[a-z] stands for all lowercase letters from a to z.
[0-9] is equivalent to [0123456789], and for ASCII digits it can also be written as \d. Most other metacharacters lose their special meaning inside []. The exceptions are ^ as the first character, which negates the set, - between two characters, which forms a range, and \, which still escapes.
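The character-set rules in item 1 can be tried out with a short sketch like the following. The sample string is an assumption chosen so that each rule produces a visible difference.

```python
import re

# Sample text invented for this illustration.
text = "abc-xyz 2025 ^_^"

single = re.findall(r"[abc]", text)        # one character from the set at a time
print(single)                              # ['a', 'b', 'c']

words = re.findall(r"[a-z]+", text)        # a range: any lowercase letter
print(words)                               # ['abc', 'xyz']

digits_set = re.findall(r"[0-9]+", text)   # explicit digit range
digits_d = re.findall(r"\d+", text)        # \d gives the same result on ASCII text
print(digits_set, digits_d)                # ['2025'] ['2025']

# ^ as the FIRST character inside [] negates the set.
others = re.findall(r"[^a-z0-9 ]", text)
print(others)                              # ['-', '^', '_', '^']

# ^ anywhere else inside [] is just a literal caret.
carets = re.findall(r"[_^]", text)
print(carets)                              # ['^', '_', '^']
```

Note that in Python 3 string patterns \d also matches non-ASCII decimal digits unless the re.ASCII flag is given, so [0-9] is the stricter choice.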
2. ^ matches the start of the string. In multiline mode (re.MULTILINE) it also matches at the start of each line.
Note: ^ is usually placed at the beginning of the pattern.
3. $ matches the end of the string. In multiline mode (re.MULTILINE) it also matches at the end of each line.
Note: $ is usually placed at the end of the pattern.
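To see the difference multiline mode makes for ^ and $, here is a small sketch. The three-line sample string is an assumption for illustration.

```python
import re

# Sample text invented for this illustration: three lines, no trailing newline.
text = "first line\nsecond line\nthird line"

# Default mode: ^ matches only at the very start of the string.
start_default = re.findall(r"^[a-z]+", text)
print(start_default)   # ['first']

# re.MULTILINE: ^ also matches right after each newline.
start_multi = re.findall(r"^[a-z]+", text, re.MULTILINE)
print(start_multi)     # ['first', 'second', 'third']

# Default mode: $ matches only at the end of the string.
end_default = re.findall(r"[a-z]+$", text)
print(end_default)     # ['line']

# re.MULTILINE: $ also matches just before each newline.
end_multi = re.findall(r"[a-z]+$", text, re.MULTILINE)
print(end_multi)       # ['line', 'line', 'line']
```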
The three items above form one small unit. Have you memorized and understood them? And did you type the code yourself? Be sure to type the code yourself!
Next, since metacharacters are special characters, what if we want to match a metacharacter itself? To turn a metacharacter into an ordinary symbol, escape it with \ (backslash).
4. \ (backslash) followed by certain characters forms a special sequence with its own meaning, such as \d for a digit. Placed before a metacharacter, it cancels the special meaning so that the character matches literally.
Once you have memorized the most commonly used special sequences, the rest will come easily, so type the example code out yourself.
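The two roles of the backslash described in item 4 can be sketched as follows. The sample strings are assumptions for illustration, and re.escape is a standard helper that the article itself does not mention.

```python
import re

# Sample text invented for this illustration.
text = "price: $9.99 (approx.) for 3+2 items"

# An unescaped . matches any character, an escaped \. only a literal dot.
loose = re.findall(r"9.99", "9.99 9x99")
strict = re.findall(r"9\.99", "9.99 9x99")
print(loose)    # ['9.99', '9x99']
print(strict)   # ['9.99']

# \$ is a literal dollar sign, \d is the special sequence for a digit.
prices = re.findall(r"\$\d+\.\d+", text)
print(prices)   # ['$9.99']

# \( and \) are literal parentheses, \+ is a literal plus sign.
parens = re.findall(r"\(.*\)", text)
sums = re.findall(r"\d\+\d", text)
print(parens)   # ['(approx.)']
print(sums)     # ['3+2']

# re.escape adds the backslashes for you when the text comes from elsewhere.
escaped = re.escape("3+2")
print(escaped)  # 3\+2
```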
Regular expressions can match text of variable length, and you can specify how many times part of a pattern repeats.
* (asterisk) matches the previous character 0 or more times rather than exactly once. Matching is greedy: it repeats as many times as possible, up to an implementation limit on the order of billions of repetitions. (Adding a question mark afterwards switches to non-greedy mode, which matches as few times as possible, here 0 times: ab*? against abbb gives a.)
+ (plus sign) matches the previous character 1 or more times. (Adding a question mark afterwards switches to non-greedy mode, which matches only once: ab+? gives ab.)
? (question mark) matches the previous character 0 or 1 times. (Adding a question mark afterwards switches to non-greedy mode, which matches 0 times: ab?? gives a.) In short, a trailing ? changes Python's default greedy matching into non-greedy matching.
{m} (curly braces), where m is a number, repeats the previous character exactly m times.
{m,n} repeats the previous character m to n times. Omitting m means 0 to n times, and omitting n means m or more times with no upper bound. (Adding a question mark afterwards switches to non-greedy mode, which matches only m times: ab{2,100}? gives abb.)
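The quantifiers above can be compared side by side with a sketch like this one. The helper first_match and the sample string are hypothetical names introduced only for this illustration.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

s = "abbbbb"  # sample string invented for this illustration

patterns = [
    r"ab*",         # greedy, 0 or more      -> abbbbb
    r"ab*?",        # non-greedy, 0 or more  -> a
    r"ab+",         # greedy, 1 or more      -> abbbbb
    r"ab+?",        # non-greedy, 1 or more  -> ab
    r"ab?",         # greedy, 0 or 1         -> ab
    r"ab??",        # non-greedy, 0 or 1     -> a
    r"ab{3}",       # exactly 3              -> abbb
    r"ab{2,4}",     # greedy, 2 to 4         -> abbbb
    r"ab{2,100}?",  # non-greedy, 2 to 100   -> abb
    r"ab{,2}",      # m omitted: 0 to 2      -> abb
    r"ab{2,}",      # n omitted: 2 or more   -> abbbbb
]

results = {p: first_match(p, s) for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:12} -> {matched}")
```

Non-greedy mode only changes how much a quantifier consumes. The rest of the pattern must still match, so a non-greedy quantifier followed by more pattern will expand as far as needed.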
The remaining metacharacters are . | and ( ).
. (dot) matches any character except a newline. With the re.DOTALL flag it matches newlines as well.
| matches either the expression on its left or the expression on its right: a|b matches a or b. If | is not enclosed in (...), its scope is the entire regular expression.
(...) groups part of a regular expression so that it is treated as a single unit. When a pattern contains groups, functions such as re.findall return the contents of the groups in preference to the whole match.
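A short sketch of the dot, alternation, and grouping behavior described above. The sample strings are assumptions for illustration, and the non-capturing form (?:...) is standard re syntax added here for contrast; the article itself does not cover it.

```python
import re

# . does not match a newline unless re.DOTALL is given.
dot_default = re.findall(r"a.c", "abc a\nc")
dot_all = re.findall(r"a.c", "abc a\nc", re.DOTALL)
print(dot_default)   # ['abc']
print(dot_all)       # ['abc', 'a\nc']

text = "cat food, dog food"  # sample text invented for this illustration

# Without parentheses, | splits the WHOLE pattern: "cat" OR "dog food".
whole_scope = re.findall(r"cat|dog food", text)
print(whole_scope)   # ['cat', 'dog food']

# With a group, | applies only inside it, and findall returns the group contents.
grouped = re.findall(r"(cat|dog) food", text)
print(grouped)       # ['cat', 'dog']

# A match object still gives access to both the whole match and the group.
m = re.search(r"(cat|dog) food", text)
print(m.group(0), "/", m.group(1))   # cat food / cat

# (?:...) groups without capturing, so findall returns whole matches again.
non_capturing = re.findall(r"(?:cat|dog) food", text)
print(non_capturing)  # ['cat food', 'dog food']
```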
That's all for "how to understand regular expressions for Python crawlers". Thank you for reading.