
Python pattern matching and the use of regular expressions

2025-07-01 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article introduces pattern matching with regular expressions in Python. Many people have questions about how to use Python's regular expressions in day-to-day work, so the editor has consulted a range of materials and collected simple, easy-to-use methods. Hopefully it answers your questions about Python pattern matching and regular expressions. Let's get started!

All the regular expression functions in Python are in the re module.

Passing a string value that represents your regular expression to re.compile() returns a Regex pattern object.

The search() method of a Regex object searches the string it is passed for the first match of the regular expression. If the pattern is not found in the string, search() returns None. If the pattern is found, search() returns a Match object. The Match object has a group() method that returns the actual matched text from the searched string.

* regular expression matching review *

1. Import the regular expression module with import re.
2. Create a Regex object with the re.compile() function (remember to use a raw string).
3. Pass the string you want to search to the search() method of the Regex object. It returns a Match object.
4. Call the group() method of the Match object to return the string of actual matched text.
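The four steps above can be sketched as follows. This is only a minimal illustration: the phone-number pattern, the sample sentences, and the variable names are assumptions chosen for the example.

```python
import re

# Step 2: compile the pattern. The raw string (r'...') keeps backslashes literal.
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Step 3: search() returns a Match object, or None if nothing matches.
mo = phone_regex.search('My number is 415-555-4242.')

# Step 4: group() returns the text that actually matched.
if mo is not None:
    print('Phone number found: ' + mo.group())

# With no match, search() returns None, so check before calling group().
no_match = phone_regex.search('There are no digits here.')
print(no_match)
```

Checking for None first matters because calling group() on None raises an AttributeError.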

* grouping with parentheses *

Suppose you want to separate the area code from the rest of the phone number. Adding parentheses creates "groups" in the regular expression: (\d\d\d)-(\d\d\d-\d\d\d\d). You can then use the group() method of the Match object to get the matching text from just one group. The first pair of parentheses in a regular expression string is group 1. The second pair is group 2. Passing the integer 1 or 2 to group() returns the corresponding part of the matched text. Passing 0 or no argument to group() returns the entire matched text. If you want to get all the groups at once, use the groups() method.

    >>> import re
    >>> phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
    >>> mo = phoneNumberRegex.search('my number is 415-555-4242.')
    >>> mo.group(1)
    '415'
    >>> mo.groups()
    ('415', '555-4242')

* match one of several patterns with the pipe *

The | character is called a pipe. You can use it anywhere you want to match one of many expressions. If more than one of the strings you are looking for occurs in the searched text, the first occurrence of matching text is returned as the Match object.

    >>> heroRegex = re.compile(r'batman|tina fey')
    >>> mo1 = heroRegex.search('batman and tina fey')
    >>> mo1.group()
    'batman'
    >>> mo1 = heroRegex.search('tina fey and batman')
    >>> mo1.group()
    'tina fey'

You can also use the pipe to match one of several patterns as part of a regular expression.
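As a sketch of a pipe used inside part of a pattern, the alternatives can be wrapped in parentheses so that they share a common prefix. The prefix 'Bat', the suffix list, and the sample sentence are assumptions made up for this illustration.

```python
import re

# The shared prefix 'Bat' is written once; the group lists the alternatives.
bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')

mo = bat_regex.search('Batmobile lost a wheel')

whole = mo.group()    # the full matched text
suffix = mo.group(1)  # only the alternative that matched inside the group
print(whole, suffix)
```

Here group() returns 'Batmobile' and group(1) returns just 'mobile', the part matched inside the parentheses.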

* use question marks to achieve optional matching *

Sometimes the pattern you want to match is optional. That is, the regular expression should find a match whether or not that piece of text is there. The ? character flags the group that precedes it as an optional part of the pattern.

    >>> phoneNumberRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
    >>> mo = phoneNumberRegex.search('my number is 415-555-4242.')
    >>> mo.group()
    '415-555-4242'
    >>> mo = phoneNumberRegex.search('my number is 555-4242.')
    >>> mo.group()
    '555-4242'

The (\d\d\d-)? part of the regular expression means that the pattern (\d\d\d-) is an optional group. In other words, the ? says "match zero or one of the group preceding this question mark".

* match zero or more times with asterisks *

The * (asterisk) means "match zero or more": the group that precedes the asterisk can occur any number of times in the text, including not at all.

* use the plus sign to match one or more times *

The + (plus sign) means "match one or more": the group that precedes the plus sign must appear at least once in the text.
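The difference between the question mark, the asterisk, and the plus sign is easiest to see side by side. The sketch below applies each to the same group; the 'Bat(wo)man' pattern and the sample words are assumptions invented for the comparison.

```python
import re

# The same group 'wo' under three different quantifiers.
quantified = {
    '?': re.compile(r'Bat(wo)?man'),  # zero or one 'wo'
    '*': re.compile(r'Bat(wo)*man'),  # zero or more 'wo'
    '+': re.compile(r'Bat(wo)+man'),  # one or more 'wo'
}
samples = ['Batman', 'Batwoman', 'Batwowowoman']

results = {}
for symbol, regex in quantified.items():
    # Keep only the sample words in which this pattern finds a match.
    results[symbol] = [s for s in samples if regex.search(s) is not None]
    print(symbol, results[symbol])
```

Only the asterisk accepts all three words; the question mark rejects the repeated 'wo', and the plus sign rejects 'Batman'.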

* match a specific number of times with curly braces *

If you want a group to repeat a specific number of times, follow the group in the regular expression with a number in curly braces. For example, the regular expression (Ha){3} will match the string 'HaHaHa'. Instead of a single number, you can specify a range by writing a minimum, a comma, and a maximum between the curly braces. For example, the regular expression (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'. You can also leave out the first or second number in the curly braces to leave the minimum or maximum unbounded. For example, (Ha){3,} will match three or more instances of the (Ha) group, and (Ha){,5} will match zero to five instances.
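A short sketch of the curly-brace forms described above, all applied to one sample string. The variable names and the six-fold 'Ha' string are assumptions for illustration.

```python
import re

laugh = 'HaHaHaHaHaHa'  # six repetitions of 'Ha'

# Each pattern is searched against the same text; group() shows what matched.
exactly_three = re.compile(r'(Ha){3}').search(laugh).group()
three_to_five = re.compile(r'(Ha){3,5}').search(laugh).group()
three_or_more = re.compile(r'(Ha){3,}').search(laugh).group()
up_to_five = re.compile(r'(Ha){,5}').search(laugh).group()

# Two repetitions are too few for {3}, so search() returns None.
too_short = re.compile(r'(Ha){3}').search('HaHa')

print(exactly_three, three_to_five, three_or_more, up_to_five, too_short)
```

Because matching is greedy by default, the ranged forms take as many repetitions as their upper bound allows.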

* greedy matching and non-greedy matching *

Python's regular expressions are "greedy" by default, which means that in ambiguous situations they match the longest string possible. The "non-greedy" version of the curly braces, which matches the shortest string possible, has the closing curly brace followed by a question mark.

The question mark therefore has two meanings in regular expressions: declaring a non-greedy match or flagging an optional group. The two meanings are unrelated.
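To illustrate the difference, the sketch below runs a greedy and a non-greedy version of the same curly-brace pattern against one string. The names and the sample text are assumptions for the example.

```python
import re

text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

greedy_regex = re.compile(r'(Ha){3,5}')      # takes as many repetitions as allowed
nongreedy_regex = re.compile(r'(Ha){3,5}?')  # stops at the minimum that still matches

greedy_match = greedy_regex.search(text).group()
nongreedy_match = nongreedy_regex.search(text).group()

print(greedy_match)
print(nongreedy_match)
```

The greedy pattern returns all five repetitions, while the non-greedy one stops after three.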

* findall () method *

In addition to the search() method, Regex objects also have a findall() method. While search() returns a Match object for the first matched text in the searched string, findall() returns the strings of every match in the searched string.

If findall() is called on a regular expression with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, it returns a list of the matched strings. If it is called on a regular expression that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), it returns a list of tuples of strings (one string for each group), such as [('415', '555', '1121'), ('212', '555', '0000')].
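The two return shapes of findall() can be compared directly. In this sketch the sample text and the variable names are assumptions; only the return types follow from the behavior described above.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: findall() returns a list of the matched strings.
no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d').findall(text)

# With groups: findall() returns a list of tuples, one string per group.
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)').findall(text)

print(no_groups)
print(with_groups)
```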

* character classes *

The shorthand character classes represent the following:

    \d    Any digit from 0 to 9.
    \D    Any character that is not a digit from 0 to 9.
    \w    Any letter, digit, or underscore character (think of this as matching "word" characters).
    \W    Any character that is not a letter, digit, or underscore.
    \s    Any space, tab, or newline character (think of this as matching "whitespace" characters).
    \S    Any character that is not a space, tab, or newline.
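As a small sketch of the shorthand classes in combination, the pattern below looks for one or more digits, a whitespace character, and one or more word characters. The gift list and the names are assumptions made up for the example.

```python
import re

# \d+ : one or more digits, \s : one whitespace character, \w+ : one or more word characters
count_regex = re.compile(r'\d+\s\w+')

gifts = count_regex.findall('12 drummers, 11 pipers, 10 lords')
print(gifts)
```

The commas are not word characters, so each match stops just before them.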

* create your own character classes *

You can define your own character class using square brackets. For example, the character class [aeiouAEIOU] will match any vowel, both lowercase and uppercase. You can also use a hyphen to indicate a range of letters or numbers. For example, [0-5] matches only the digits 0 to 5. Inside the square brackets, the normal regular expression symbols are not interpreted as such. By placing a caret character (^) just after the opening bracket of the character class, you get a "negative character class". A negative character class matches all the characters that are not in the character class.

    # matches every non-vowel character
    >>> consonantRegex = re.compile(r'[^aeiouAEIOU]')
    >>> consonantRegex.findall('RoboCop eats baby food. BABY FOOD.')
    ['R', 'b', 'C', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.', ' ', 'B', 'B', 'Y', ' ', 'F', 'D', '.']
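The positive forms of a custom character class can be sketched the same way. The sample strings and the variable names below are assumptions chosen for illustration.

```python
import re

# A class listing every vowel in both cases.
vowels = re.compile(r'[aeiouAEIOU]').findall('RoboCop eats baby food. BABY FOOD.')

# A range: only the digits 0 through 5 match.
low_digits = re.compile(r'[0-5]').findall('Extension 4096')

# Inside brackets the period is a literal dot, not a wildcard.
literal_dots = re.compile(r'[.]').findall('a.b.c')

print(vowels)
print(low_digits)
print(literal_dots)
```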

* caret and dollar sign characters *

You can use the caret symbol (^) at the start of a regular expression to indicate that a match must occur at the beginning of the searched text. Likewise, you can put a dollar sign ($) at the end of a regular expression to indicate that the string must end with the regular expression's pattern. You can use ^ and $ together to indicate that the entire string must match the pattern, that is, it is not enough for a match to be made on only part of the string.

    >>> beginsWithHello = re.compile(r'^Hello')
    >>> hh = beginsWithHello.search('Hello world!')
    >>> hh.group()
    'Hello'
    >>> endsWithNumber = re.compile(r'\d$')
    >>> ss = endsWithNumber.search('Your number is 42')
    >>> ss.group()
    '2'
    >>> wholeStringIsNum = re.compile(r'^\d+$')
    >>> rr = wholeStringIsNum.search('1234567890')
    >>> rr.group()
    '1234567890'
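It is also worth seeing when anchored patterns do not match. The sketch below reuses the three patterns with inputs that break each anchor; the failing sample strings and the snake_case names are assumptions added for illustration.

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_number = re.compile(r'\d$')
whole_string_is_num = re.compile(r'^\d+$')

# 'Hello' is present but not at the start of the string.
not_at_start = begins_with_hello.search('He said Hello.')

# The string contains no digit in its final position.
not_at_end = ends_with_number.search('Your number is forty two.')

# Digits surround other characters, so the whole string is not numeric.
not_whole = whole_string_is_num.search('12345xyz67890')

print(not_at_start, not_at_end, not_whole)
```

All three searches return None, because each anchor ties the pattern to a position that these strings do not satisfy.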

* wildcard characters *

In regular expressions, the . (period) character is called a wildcard. It matches any character except a newline.

    >>> atRegex = re.compile(r'.at')
    >>> atRegex.findall('The cat in the hat sat on the flat mat.')
    ['cat', 'hat', 'sat', 'lat', 'mat']

The period matches just one character, which is why the match for the text flat in the previous example is only lat.
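Because the period is a wildcard, matching an actual dot requires escaping it with a backslash. The sketch below contrasts the two; the '3.14' pattern and the sample strings are assumptions for illustration.

```python
import re

wildcard_regex = re.compile(r'3.14')   # the period matches any one character
literal_regex = re.compile(r'3\.14')   # the escaped period matches only a dot

wildcard_hit = wildcard_regex.search('3x14')
literal_miss = literal_regex.search('3x14')
literal_hit = literal_regex.search('pi is roughly 3.14')

print(wildcard_hit.group(), literal_miss, literal_hit.group())
```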

* match all characters with dot-star *

Sometimes you will want to match everything and anything. For example, say you want to match the string 'First Name:', followed by any and all text, followed by 'Last Name:', and then followed by any text again. You can use the dot-star (.*) to stand in for that "anything". Remember that the period character means "any single character except the newline" and the asterisk means "zero or more of the preceding character".

    >>> nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
    >>> mo = nameRegex.search('First Name: Al Last Name: Sweigart')
    >>> mo.group(1)
    'Al'
    >>> mo.group(2)
    'Sweigart'
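The dot-star is greedy, so it always tries to match as much text as possible. Adding a question mark (.*?) makes it non-greedy. The sketch below compares the two on a string with two closing angle brackets; the sample string and the names are assumptions for illustration.

```python
import re

text = '<To serve man> for dinner.>'

greedy_regex = re.compile(r'<.*>')      # runs to the last '>' in the string
nongreedy_regex = re.compile(r'<.*?>')  # stops at the first '>' it can

greedy_match = greedy_regex.search(text).group()
nongreedy_match = nongreedy_regex.search(text).group()

print(greedy_match)
print(nongreedy_match)
```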

* match line breaks with period characters *

The dot-star will match everything except a newline. By passing re.DOTALL as the second argument to re.compile(), you can make the period character match all characters, including the newline character.

    >>> noNewlineRegex = re.compile('.*')
    >>> noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
    'Serve the public trust.'
    >>> newlineRegex = re.compile('.*', re.DOTALL)
    >>> newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
    'Serve the public trust.\nProtect the innocent.\nUphold the law.'

* regular expression symbol review *

    ?                   matches zero or one of the preceding group.
    *                   matches zero or more of the preceding group.
    +                   matches one or more of the preceding group.
    {n}                 matches exactly n of the preceding group.
    {n,}                matches n or more of the preceding group.
    {,m}                matches zero to m of the preceding group.
    {n,m}               matches at least n and at most m of the preceding group.
    {n,m}? or *? or +?  performs a non-greedy match of the preceding group.
    ^spam               means the string must begin with spam.
    spam$               means the string must end with spam.
    .                   matches any character except newline characters.
    \d, \w, and \s      match a digit, word, or space character, respectively.
    \D, \W, and \S      match anything except a digit, word, or space character, respectively.
    [abc]               matches any character between the square brackets (such as a, b, or c).
    [^abc]              matches any character that is not between the square brackets.

* case-insensitive matching *

To make a regular expression case-insensitive, pass re.IGNORECASE or re.I as the second argument to re.compile().
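A minimal sketch of a case-insensitive search follows. The 'robocop' pattern and the three sample sentences are assumptions for illustration.

```python
import re

# re.I is shorthand for re.IGNORECASE.
robocop = re.compile(r'robocop', re.I)

sentences = [
    'RoboCop is part man, part machine, all cop.',
    'ROBOCOP protects the innocent.',
    'Have you seen robocop lately?',
]

# group() returns the text exactly as it was written in the searched string.
found = [robocop.search(s).group() for s in sentences]
print(found)
```

The match preserves the original capitalization of the searched text; only the comparison ignores case.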

* replace strings with the sub() method *

The sub() method of a Regex object takes two arguments. The first argument is a string that replaces any matches found. The second argument is the string to search with the regular expression. The sub() method returns the string with the substitutions applied.

    >>> namesRegex = re.compile(r'Agent \w+')
    >>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
    'CENSORED gave the secret documents to CENSORED.'

Sometimes you may need to use the matched text itself as part of the substitution. In the first argument to sub(), you can type \1, \2, \3, and so on, to mean "insert the text of group 1, 2, 3, and so on, in the substitution".
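As a sketch of several group references in one substitution, the example below reorders the parts of a date. The date format, the sample sentence, and the names are assumptions invented for this illustration.

```python
import re

# Group 1 is the year, group 2 the month, group 3 the day.
date_regex = re.compile(r'(\d{4})-(\d\d)-(\d\d)')

# The raw string keeps \3, \2, and \1 as group references instead of escape codes.
reordered = date_regex.sub(r'\3/\2/\1', 'Released on 2025-07-01 and patched on 2025-08-15.')
print(reordered)
```

Every match in the string is replaced, and each reference is filled with that match's own group text.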

For example, suppose you want to hide the names of the secret agents and show only the first letter of each name. To do this, you can use the regular expression Agent (\w)\w* and pass r'\1****' as the first argument to sub(). The \1 in that string will be replaced by whatever text was matched by group 1, that is, the (\w) group of the regular expression.

    >>> agentNamesRegex = re.compile(r'Agent (\w)\w*')
    >>> agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
    'A**** told C**** that E**** knew B**** was a double agent.'

If the text pattern you need to match is simple, an ordinary regular expression is fine. But matching complicated text patterns may require long, convoluted regular expressions. You can mitigate this by telling re.compile() to ignore whitespace and comments inside the regular expression string. To enable this "verbose mode", pass re.VERBOSE as the second argument to re.compile().
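A minimal sketch of verbose mode follows. The phone pattern, the sample strings, and the names are assumptions for illustration; the second pattern also shows that flags can be combined with the bitwise OR operator (|).

```python
import re

# In verbose mode, whitespace and '#' comments inside the pattern are ignored.
phone_regex = re.compile(r'''
    (\d{3})     # area code
    -           # separator
    (\d{3})     # first three digits
    -           # separator
    (\d{4})     # last four digits
    ''', re.VERBOSE)

parts = phone_regex.search('Call 415-555-4242 today.').groups()

# Literal spaces are ignored in verbose mode, so whitespace is written as \s.
name_regex = re.compile(r'''
    first \s+ name   # matched case-insensitively because of re.IGNORECASE
    ''', re.VERBOSE | re.IGNORECASE)

found = name_regex.search('FIRST   NAME: Al').group()
print(parts, found)
```

A triple-quoted raw string is a convenient way to spread the pattern over several lines.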

* project: phone number and email address extractor *

Suppose you have the boring task of finding every phone number and email address in a long web page or document. If you scroll through the page manually, you might end up searching for a long time. But if you had a program that could search the text in your clipboard for phone numbers and email addresses, you could simply press Ctrl-A to select all the text, press Ctrl-C to copy it to the clipboard, and then run your program. It would replace the text on the clipboard with just the phone numbers and email addresses it finds.

    import re
    import pyperclip  # third-party module: pip install pyperclip

    phoneRegex = re.compile(r'''(
        (\d{3}|\(\d{3}\))?              # area code
        (\s|-|\.)?                      # separator
        (\d{3})                         # first 3 digits
        (\s|-|\.)                       # separator
        (\d{4})                         # last 4 digits
        (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
        )''', re.VERBOSE)

    emailRegex = re.compile(r'''(
        [a-zA-Z0-9._%+-]+               # username
        @                               # @ symbol
        [a-zA-Z0-9.-]+                  # domain name
        (\.[a-zA-Z]{2,4})               # dot-something
        )''', re.VERBOSE)

    # Find matches in the clipboard text.
    text = str(pyperclip.paste())
    matches = []
    for groups in phoneRegex.findall(text):
        phoneNum = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phoneNum += ' x' + groups[8]
        matches.append(phoneNum)
    for groups in emailRegex.findall(text):
        matches.append(groups[0])

    # Copy the results to the clipboard.
    if len(matches) > 0:
        pyperclip.copy('\n'.join(matches))
        print('Copied to clipboard:')
        print('\n'.join(matches))
    else:
        print('No phone numbers or email addresses found.')

That concludes this study of Python pattern matching and regular expressions. I hope it has answered your questions. Pairing theory with practice is the best way to learn, so go and try it!
