
What are the basics of Python regular expressions

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article covers the basics of Python regular expressions: what a regular expression is, the core pattern syntax, and how to use patterns through Python's re module. The methods shown are simple and practical.

1. What is a regular expression

A regular expression (also called a "regex" or "regexp") is a single string that describes a whole set of strings following some syntactic rule, so that a program can test any piece of text against that pattern.

Using regular expressions, you can specify rules for a set of possible strings to match; this set may contain English sentences, email addresses, TeX commands, or anything you like.

Regular expression engine

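A regular expression engine is the software module that processes regular expressions and checks text against them; different engines use different algorithms. PCRE (Perl Compatible Regular Expressions) is one widely used engine.

Regular expression metacharacters fall into four groups: character matching, number of matches, position anchoring, and grouping.

Python's re module uses Perl-style syntax that is close to PCRE, but it is a separate implementation and is not fully PCRE-compatible.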

2. Basics of regular expressions

# character matching

. matches any single character (except a newline)

[] matches any single character in the specified set or range: [0-9] [a-z]

^[xxx] matches text that begins with any one of the characters in []

[^xxx] matches any single character except those in []; the leading ^ inverts the set
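The character-matching rules above can be tried out with a short sketch. The sample text and variable names are made up for illustration.

```python
import re

text = "cat bat rat 9at"  # made-up sample text

# . matches any single character except a newline
dot_matches = re.findall(r".at", text)

# [a-z] matches exactly one character from the range a to z
range_matches = re.findall(r"[a-z]at", text)

# [^a-z] matches exactly one character that is NOT in the range
negated_matches = re.findall(r"[^a-z]at", text)

# ^[cb] requires the text to begin with c or b
anchored_matches = re.findall(r"^[cb]at", text)

print(dot_matches)       # ['cat', 'bat', 'rat', '9at']
print(range_matches)     # ['cat', 'bat', 'rat']
print(negated_matches)   # ['9at']
print(anchored_matches)  # ['cat']
```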

# number of matches

Quantifiers specify how many times the preceding character (or group) should appear.

* matches the preceding character any number of times, including 0; matching is greedy, so it matches as much as possible

.* matches any characters, of any length

? matches the preceding character 0 or 1 times

+ matches the preceding character at least once

{n} matches the preceding character exactly n times

{m,n} matches the preceding character at least m times and at most n times, for example {1,3} means 1 to 3 times

{n,} matches the preceding character at least n times
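A minimal sketch of the quantifiers, using made-up sample strings. It also contrasts greedy `.*` with the lazy form `.*?`, which this list does not describe but which the scraping example in section 3.2 relies on.

```python
import re

html = "<b>one</b><b>two</b>"  # made-up sample text

# Greedy: .* grabs as much as possible, so both tags end up in one match
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first place the rest of the pattern can match
lazy = re.findall(r"<b>.*?</b>", html)

star = re.findall(r"ab*", "a ab abbb")      # b zero or more times
plus = re.findall(r"ab+", "a ab abbb")      # b one or more times
optional = re.findall(r"ab?", "a ab abbb")  # b zero or one time

exactly_three = re.findall(r"\d{3}", "12 345 6789")
one_to_three = re.findall(r"\d{1,3}", "12 345 6789")
three_or_more = re.findall(r"\d{3,}", "12 345 6789")

print(greedy)         # ['<b>one</b><b>two</b>']
print(lazy)           # ['<b>one</b>', '<b>two</b>']
print(star)           # ['a', 'ab', 'abbb']
print(plus)           # ['ab', 'abbb']
print(optional)       # ['a', 'ab', 'ab']
print(exactly_three)  # ['345', '678']
print(one_to_three)   # ['12', '345', '678', '9']
print(three_or_more)  # ['345', '6789']
```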

# position anchoring

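# anchors constrain where in the text a match may occur

^ the beginning of the line (in Python, the beginning of the whole string unless re.MULTILINE is set)

$ the end of the line (in Python, the end of the whole string unless re.MULTILINE is set)

^$ a blank line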
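A short sketch of how the anchors behave in Python, with and without the re.MULTILINE flag. The three-line sample text is made up for illustration.

```python
import re

text = "first line\n\nthird line"  # made-up sample: the second line is blank

# Without re.MULTILINE, ^ and $ anchor to the whole string
first_word = re.findall(r"^\w+", text)
last_word = re.findall(r"\w+$", text)

# With re.MULTILINE, ^ and $ anchor to every line
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# ^$ matches a line with nothing on it; collect the indexes of blank lines
blank_line_numbers = [
    index for index, line in enumerate(text.split("\n"))
    if re.match(r"^$", line)
]

print(first_word)          # ['first']
print(last_word)           # ['line']
print(line_starts)         # ['first', 'third']
print(line_ends)           # ['line', 'line']
print(blank_line_numbers)  # [1]
```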

# grouping

() grouping: parentheses bundle several characters together so they are treated as a single unit

Backreferences: \1, \2 refer back to the text matched by the first and second group

# | or

a|b matches a or b

(A|a)bc matches Abc or abc
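Grouping, backreferences, and alternation can be seen together in a small sketch. The sample strings and variable names are made up for illustration.

```python
import re

# Alternation inside a group: (A|a)bc matches Abc or abc
alternation = [m.group() for m in re.finditer(r"(A|a)bc", "Abc abc xbc")]

# A quantifier after a group applies to the whole group, not just one character
repeated_group = re.search(r"(ab)+", "xababab").group()

# \1 must match the same text that group 1 matched: here, a doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

print(alternation)       # ['Abc', 'abc']
print(repeated_group)    # ababab
print(doubled.group())   # is is
print(doubled.group(1))  # is
```

finditer is used for the alternation case because findall, when the pattern contains a group, returns the group contents rather than the whole match.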

#\ escape

The backslash can be followed by various characters to form special sequences. It is also used to escape metacharacters so that you can match them literally: to match [ or \, put a backslash in front to remove the special meaning, giving \[ or \\.

In Python it is recommended to write patterns as raw strings (with the r prefix) so that Python itself does not process the backslashes. In an ordinary string '\n' has a special meaning: newline. To match the literal two-character text \n, the pattern must be \\n, which is written r'\\n' as a raw string or '\\\\n' as an ordinary string.

\d matches any decimal digit; for ASCII text this is equivalent to the class [0-9].

\w matches any alphanumeric character or underscore; for ASCII text this is equivalent to the class [a-zA-Z0-9_].

\s matches any whitespace character; for ASCII text this is equivalent to the class [ \t\n\r\f\v].

\D matches any non-digit character; this is equivalent to the class [^0-9].

\W matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].

\S matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
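The shorthand classes and the escaping rules can be checked with a brief sketch. All sample strings are made up, and the Windows-style path exists only to provide a literal backslash.

```python
import re

digits = re.findall(r"\d+", "Order 66 shipped in 3 days")
words = re.findall(r"\w+", "user_1 logged-in")
fields = re.split(r"\s+", "a  b\tc\nd")
non_digits = re.findall(r"\D+", "a1b22c")
non_space = re.findall(r"\S+", "a  b\tc")

# Escaping metacharacters: \[ and \] match literal square brackets
bracketed = re.findall(r"\[\d+\]", "see [1] and [22]")

# Matching a literal backslash: the regex needs \\, written here as a raw string
path = "C:\\new\\table"              # the text C:\new\table
found = re.search(r"\\new", path).group()

# The raw string and the doubled ordinary string are the same pattern
same_pattern = (r"\\n" == "\\\\n")

print(digits)        # ['66', '3']
print(words)         # ['user_1', 'logged', 'in']
print(fields)        # ['a', 'b', 'c', 'd']
print(non_digits)    # ['a', 'b', 'c']
print(non_space)     # ['a', 'b', 'c']
print(bracketed)     # ['[1]', '[22]']
print(found)         # \new
print(same_pattern)  # True
```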

3. Using regular expressions in Python

The standard module for working with regular expressions in Python is re.

3.1 Commonly used re functions

# re.findall('regular expression', 'text to be matched') finds every match of the pattern and returns the matches as a list

    >>> re.findall('[0-9]', "Hello world 123")
    ['1', '2', '3']

If nothing matches, findall returns an empty list.

# re.finditer('regular expression', 'text to be matched') finds every match of the pattern and returns an iterator object

    >>> re.finditer('[0-9]', "Hello world 123")

    # To get the values:
    >>> res = re.finditer('[0-9]', "Hello world 123")
    >>> for i in res:
    ...     print(i.group())
    ...
    1
    2
    3

If nothing matches, finditer returns an empty iterator. findall and finditer do the same job, but finditer uses less memory.

# re.search('regular expression', 'text to be matched') scans the text and stops at the first match

    >>> res = re.search('l', "Hello world 123")
    >>> res
    <re.Match object; span=(2, 3), match='l'>
    >>> res.group()
    'l'

search returns a match object, and group() returns the matched text. If search returns None, calling group() raises an error: AttributeError: 'NoneType' object has no attribute 'group'

# re.match('regular expression', 'text to be matched') matches only from the beginning of the text (the pattern must match at the very start)

    >>> res = re.match('e', "Hello world 123")   # e is not at the beginning, so nothing matches
    >>> print(res)
    None
    >>> res = re.match('H', "Hello world 123")   # H is at the beginning, so it matches
    >>> print(res)
    <re.Match object; span=(0, 1), match='H'>
    >>> print(res.group())
    H

If nothing matches, match returns None, and calling group() raises the same error: 'NoneType' object has no attribute 'group'

Because both search and match return None when nothing matches, check the result before using it:

    if res:
        print(res.group())
    else:
        print('not matched')

# re.split splits the text at every match and returns a list

    >>> re.split('[0-9]', "Hello 123 world")
    ['Hello ', '', '', ' world']

Each digit is a separate split point, so the adjacent digits produce empty strings. Splitting on '[0-9]+' instead gives ['Hello ', ' world'].

# re.sub('pattern', 'replacement', 'text') replaces matches

    >>> re.sub('[0-9]', 'A', "Hello 123 world")            # replaces all matches by default
    'Hello AAA world'
    >>> re.sub('[0-9]', 'A', "Hello 123 world", count=1)   # replace once
    'Hello A23 world'
    >>> re.sub('[0-9]', 'A', "Hello 123 world", count=2)   # replace twice
    'Hello AA3 world'

# re.subn() works like re.sub, but returns a tuple of the new string and the number of replacements made

# Grouping: parentheses capture part of the match

    >>> res = re.search('[1-9](\d{16})([0-9x])', '37152119841105155x')
    >>> res.group()
    '37152119841105155x'
    >>> res.group(1)   # (\d{16}) is the first group
    '7152119841105155'
    >>> res.group(2)   # ([0-9x]) is the second group
    'x'

The number passed to group() is the index of the group.

# findall gives priority to the group contents: when the pattern contains a group, only the group is returned

    >>> re.findall('[1-9]\d{16}([0-9x])', '37152119841105155x')
    ['x']

# Non-capturing group: (?:...)

    >>> res = re.findall('[1-9]\d{16}(?:[0-9x])', '37152119841105155x')
    >>> res   # (?:[0-9x]) does not capture, so the whole match is returned
    ['37152119841105155x']

The groups above are anonymous. Groups can also have names.

# Named group: (?P<name>...)

    >>> res = re.search('[1-9](?P<first>\d{16})(?P<second>[0-9x])', '37152119841105155x')
    >>> res.group()
    '37152119841105155x'
    >>> res.group('first')
    '7152119841105155'
    >>> res.group('second')
    'x'

    # A named group can still be read by index:
    >>> res.group(2)
    'x'

3.2 re module practice

Scrape the first ten pages of Lianjia's second-hand housing listings

    import re
    import requests

    # The HTML tags that originally closed each pattern (after the lazy group)
    # are missing from this copy of the article; restore them from the live
    # page markup before running.
    for page in range(1, 11):
        url = 'https://bj.lianjia.com/ershoufang/pg{}srs%E6%97%A7%E5%AE%AB/'.format(page)
        r = requests.get(url)
        title = re.findall(r'data-is_focus="" data-sl="">(.*?)', r.text)
        price = re.findall(r'([0-9]{3})', r.text)
        address = re.findall(r'data-log_index="\d" data-el="region">(.*?)', r.text)
        houseIcon = re.findall(r'(.*?)', r.text)
        res = zip(title, price, address, houseIcon)
        for i in res:
            # print("%s\t%s (ten thousand)\t%s\t%s" % (i[0], i[1], i[2], i[3]))
            print("Community name: %s, total price: %s, address: %s, room type: %s" % (i[0], i[1], i[2], i[3]))

# Execution result:

    [root@hans_tencent_centos82 module]# python3 houses.py
    Community name: Hongxinglou District north-south transparent one-bedroom full five unique commercial housing, total price: 1.85 million, address: Hongxing Lou, room type: 1 room and 1 living room | 46.46 square meters | South-North | simple installation | Top floor (6 floors) | built in 1989 | Banlou
    Community name: old Palace North Lane, 2 rooms 2 Hall South-North, total price: 4.69 million, address: old Palace North Lane, room type: 2 rooms 2 living rooms | 96.55 square meters | hardcover | hardcover | High floor (a total of 9 floors) | built in 2004 | Banlou
    Community name: 3 bedrooms and 1 living room on the north bank of Yizhuang, total price: 5.58 million, address: North bank of Yizhuang, room type: 3 bedrooms and 1 living room | 109.58 square meters | South-North | hardcover | Middle floor (total 15 floors) | built in 2008 | combined with plates and towers
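Running several independent findall calls and zipping the results can pair the wrong fields if one listing lacks a field. One possible alternative, sketched below, is to capture every field of a listing in a single pattern with named groups. The HTML fragment, its class names, and the names LISTING and parse_listings are all hypothetical and do not reflect Lianjia's real markup; the sketch runs offline on the sample text.

```python
import re

# Hypothetical markup for illustration only; the real page structure differs.
SAMPLE_HTML = """
<div class="item"><a class="title">Sunny one-bedroom</a><span class="price">185</span></div>
<div class="item"><a class="title">Two-bedroom near park</a><span class="price">469</span></div>
"""

# One pattern per listing: both fields are captured together, so they
# cannot drift out of step the way separately zipped lists can.
LISTING = re.compile(
    r'<a class="title">(?P<title>.*?)</a>'
    r'<span class="price">(?P<price>[0-9]{3})</span>'
)


def parse_listings(html):
    """Return a list of (title, price) tuples found in the HTML text."""
    return [
        (m.group("title"), int(m.group("price")))
        for m in LISTING.finditer(html)
    ]


listings = parse_listings(SAMPLE_HTML)
for title, price in listings:
    print("Community name: %s, total price: %s" % (title, price))
```

For real pages a dedicated HTML parser is usually more robust than regular expressions; the sketch only illustrates the named-group technique from section 3.1.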

Reference content:

Python official documentation

Content extension:

Python's support for regular expressions

Python provides the re module to support regular expression-related operations, and here are the core functions in the re module.

Function: Description

compile(pattern, flags=0): compiles a regular expression and returns a regular expression object

match(pattern, string, flags=0): matches the string from its beginning; returns a match object on success, otherwise None

search(pattern, string, flags=0): searches the string for the first occurrence of the pattern; returns a match object on success, otherwise None

split(pattern, string, maxsplit=0, flags=0): splits the string at the delimiters described by the pattern and returns a list

sub(pattern, repl, string, count=0, flags=0): replaces the parts of the string that match the pattern with the specified string; count limits the number of replacements

fullmatch(pattern, string, flags=0): the exact-match version of match (the pattern must match from the beginning to the end of the string)

findall(pattern, string, flags=0): finds all parts of the string that match the pattern and returns a list of strings

finditer(pattern, string, flags=0): finds all parts of the string that match the pattern and returns an iterator

purge(): clears the cache of implicitly compiled regular expressions

re.I / re.IGNORECASE: flag for case-insensitive matching

re.M / re.MULTILINE: flag for multiline matching
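A few of the listed functions and flags in action, as a minimal sketch. The sample strings and variable names are made up for illustration.

```python
import re

# match only needs the pattern to fit at the start; fullmatch needs the whole string
prefix_match = re.match(r"\d+", "123abc")
full_fail = re.fullmatch(r"\d+", "123abc")
full_ok = re.fullmatch(r"\d+", "123")

# maxsplit limits how many splits are made; the rest of the text stays intact
limited_split = re.split(r",\s*", "a, b,c, d", maxsplit=2)

# re.I ignores case
any_case = re.findall(r"hello", "Hello HELLO hello", flags=re.I)

# re.M lets ^ match at the start of every line
quoted = re.sub(r"^", "> ", "a\nb", flags=re.M)

print(prefix_match.group())  # 123
print(full_fail)             # None
print(full_ok.group())       # 123
print(limited_split)         # ['a', 'b', 'c, d']
print(any_case)              # ['Hello', 'HELLO', 'hello']
print(repr(quoted))          # '> a\n> b'
```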

Note: in real development, the methods of a regular expression object can be used in place of the module-level functions listed above. If a regular expression is going to be used repeatedly, compiling it once with compile and reusing the resulting regular expression object is the more sensible choice.
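A minimal sketch of that advice: compile once, then reuse the pattern object's own methods. The pattern is the 18-character ID pattern from section 3.1; the names ID_PATTERN and find_ids and the candidate list are hypothetical.

```python
import re

# Compiled once at import time and reused for every call
ID_PATTERN = re.compile(r"[1-9]\d{16}[0-9x]")


def find_ids(lines):
    """Return only the lines that are, in full, a valid-looking ID."""
    return [line for line in lines if ID_PATTERN.fullmatch(line)]


candidates = ["37152119841105155x", "12345", "07152119841105155x"]
valid = find_ids(candidates)
print(valid)  # ['37152119841105155x']

# Pattern objects offer the same operations as the module-level functions
print(ID_PATTERN.search("id: 37152119841105155x").group())  # 37152119841105155x
print(ID_PATTERN.sub("<hidden>", "id: 37152119841105155x"))  # id: <hidden>
```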

