
How to use regular expressions in Python

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

How do you use regular expressions in Python? Many newcomers are unsure where to start, so this article reviews the common regular-expression symbols, introduces the three re-module functions used to search, replace, and split strings, and then works through an example of each.

Commonly used regular symbols

Before turning to string matching, let's review the common regular-expression symbols, as shown in the following table:

Readers who master the contents of the table above will be well equipped for string processing. As mentioned earlier, this section queries, replaces, and splits strings with regular expressions; all three tasks require importing the re module and using the functions described below.
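As a quick illustration of a few widely used symbols, here is a minimal sketch. The sample text and variable names are made up for this example and do not come from the article's table; only standard re-module syntax is used.

```python
import re

# Hypothetical sample text, invented for this illustration
text = "Order 66 shipped on 2018-01-04 to user_42@example.com"

# \d matches one digit; + means "one or more of the preceding item"
digits = re.findall(r"\d+", text)

# {n} repeats the preceding item exactly n times
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# [ ] defines a character class: here letters or underscore, followed by digits
names_with_digits = re.findall(r"[A-Za-z_]+\d+", text)

# ^ anchors at the start of the string, . matches any character except a
# newline, and *? is a non-greedy "zero or more"
first_word = re.findall(r"^(.*?) ", text)

print(digits)             # ['66', '2018', '01', '04', '42']
print(dates)              # ['2018-01-04']
print(names_with_digits)  # ['user_42']
print(first_word)         # ['Order']
```

Writing patterns as raw strings (the r prefix) keeps backslashes such as \d from being interpreted by Python before the re module sees them.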

String matching query

The findall function in the re module scans the specified string, collects every non-overlapping substring that matches the pattern, and returns the results as a list. Its parameters are as follows:

findall(pattern, string, flags=0)

pattern: the regular expression to match.

string: the string to be processed.

flags: the matching mode. Common values are re.I, re.M, re.S, and re.X. re.I makes the regular expression case-insensitive; re.M enables multi-line matching, so that ^ and $ match at the start and end of every line; re.S makes the "." symbol match any character, including the newline; re.X allows the regular expression to be written more verbosely, spread over several lines, with whitespace ignored and comments added.
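The effect of each flag is easier to see side by side. The following is a small sketch with an invented three-line string; the variable names are illustrative only.

```python
import re

# Hypothetical sample text with three lines
text = "Python is fun.\npython is popular.\nPYTHON is everywhere."

# No flags: matching is case-sensitive
plain = re.findall(r"python", text)

# re.I: ignore case
ignore_case = re.findall(r"python", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only the start of the string
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: "." also matches the newline, so ".*" can run across lines
across_lines = re.findall(r"fun.*popular", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored
date_pattern = r"""
    \d{4}   # year
    -\d{2}  # month
    -\d{2}  # day
"""
verbose = re.findall(date_pattern, "released 2018-01-04", flags=re.X)

print(plain)         # ['python']
print(ignore_case)   # ['Python', 'python', 'PYTHON']
print(line_starts)   # ['Python', 'python', 'PYTHON']
print(across_lines)  # ['fun.\npython is popular']
print(verbose)       # ['2018-01-04']
```

Flags can be combined with the | operator, for example flags=re.I | re.M.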

Matching substitution of strings

The sub function in the re module performs replacement. It is similar to the replace method of a string, except that it replaces every substring that matches the regular expression with repl. Its parameters are as follows:

sub(pattern, repl, string, count=0, flags=0)

pattern: the same as pattern in the findall function.

repl: the new value to substitute for each match.

string: the same as string in the findall function.

count: the maximum number of replacements. The default, 0, replaces every match.

flags: the same as flags in the findall function.
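A short sketch of how count and repl behave, using an invented date string. The last call relies on the fact that repl may refer back to groups captured by the pattern.

```python
import re

# Hypothetical sample text
text = "2018-01-01, 2018-01-02, 2018-01-03"

# count=0 (the default) replaces every run of digits
all_masked = re.sub(r"\d+", "#", text)

# count=2 replaces only the first two matches
first_two = re.sub(r"\d+", "#", text, count=2)

# \1, \2 and \3 in repl stand for the captured year, month and day
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(all_masked)  # #-#-#, #-#-#, #-#-#
print(first_two)   # #-#-01, 2018-01-02, 2018-01-03
print(reordered)   # 01/01/2018, 02/01/2018, 03/01/2018
```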

Matching and splitting of strings

The split function in the re module splits a string wherever the specified regular expression matches, similar to the split method of a string. Its parameters are as follows:

split(pattern, string, maxsplit=0, flags=0)

pattern: the same as pattern in the findall function.

string: the same as string in the findall function.

maxsplit: the maximum number of splits. The default, 0, splits at every match.

flags: the same as flags in the findall function.
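The difference from the plain string split method, and the effect of maxsplit, can be sketched as follows. The sample string is invented for illustration.

```python
import re

# Hypothetical sample text with three different separators
text = "apple, banana;cherry  grape"

# Split on a comma, a semicolon or whitespace, treating runs of separators as one
parts = re.split(r"[,;\s]+", text)

# maxsplit=1 splits at most once; the remainder stays in the last element
head_tail = re.split(r"[,;\s]+", text, maxsplit=1)

# str.split accepts only one fixed separator
plain = text.split(", ")

print(parts)      # ['apple', 'banana', 'cherry', 'grape']
print(head_tail)  # ['apple', 'banana;cherry  grape']
print(plain)      # ['apple', 'banana;cherry  grape']
```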

A practical case

Once you understand these functions and their parameters, a worked case helps reinforce them. The examples below illustrate all three functions:

# Import the re module for regular expressions
import re

# Extract every weather condition from string8
string8 = "{ymd:'2018-01-01',tianqi:'sunny',aqiInfo:'mildly polluted'},{ymd:'2018-01-02',tianqi:'overcast ~ light rain',aqiInfo:'excellent'},{ymd:'2018-01-03',tianqi:'light rain ~ moderate rain',aqiInfo:'excellent'},{ymd:'2018-01-04',tianqi:'moderate rain ~ light rain',aqiInfo:'excellent'}"
# Use the findall function with a regular expression
print(re.findall(r"tianqi:'(.*?)'", string8))

# Extract every word that contains the letter o
string9 = 'Together, we discovered that a free market only thrives when there are rules to ensure competition and fair play, Our celebration of initiative and enterprise'
# Use the findall function with a regular expression
print(re.findall(r'\w*o\w*', string9, flags=re.I))

# Remove punctuation, digits and parentheses from string10
string10 = 'It is reported that the 4 steam condensation tanks shipped this time are second-class nuclear pressure equipment for the International Thermonuclear Experimental Reactor (ITER) project, and have completed the pressure test, vacuum test, helium leak detection test, jack test, lifting lug load test, stacking test and other acceptance tests.'
# Use the sub function with a regular expression
print(re.sub(r'[,.()0-9]', '', string10))

# Split string11 into its sub-parts
string11 = '2 rooms 2 halls | 101.62 square meters | low area/7 floors | facing south \n Shanghai Future - Pudong - Jin Yang - built in 2005'
# Use the split function with a regular expression
split = re.split(r'[-|\n]', string11)
print(split)
# Clean up the split result
split_strip = [i.strip() for i in split]
print(split_strip)

out:
['sunny', 'overcast ~ light rain', 'light rain ~ moderate rain', 'moderate rain ~ light rain']
['Together', 'discovered', 'only', 'to', 'competition', 'Our', 'celebration', 'of']
It is reported that the  steam condensation tanks shipped this time are second-class nuclear pressure equipment for the International Thermonuclear Experimental Reactor ITER project and have completed the pressure test vacuum test helium leak detection test jack test lifting lug load test stacking test and other acceptance tests
['2 rooms 2 halls ', ' 101.62 square meters ', ' low area/7 floors ', ' facing south ', ' Shanghai Future ', ' Pudong ', ' Jin Yang ', ' built in 2005']
['2 rooms 2 halls', '101.62 square meters', 'low area/7 floors', 'facing south', 'Shanghai Future', 'Pudong', 'Jin Yang', 'built in 2005']

As the results show, the first example uses the regular expression "tianqi:'(.*?)'" to extract the target data. Without the parentheses, findall would return whole matches such as "tianqi:'sunny'" and "tianqi:'overcast ~ light rain'"; the parentheses define a group, and only the content of the group is returned.

In the second example, the regular expression contains no parentheses; wrapping the whole pattern in parentheses would return the same result. In general, findall returns a list of the full matches when the pattern has no group, and only the content matched inside the parentheses when it has one.
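The grouping behavior of findall can be checked with a small sketch. The shortened sample string and the variable names are assumptions made for this illustration.

```python
import re

# Shortened, hypothetical version of the weather string
text = "tianqi:'sunny',tianqi:'light rain'"

# No group: the whole match is returned
no_group = re.findall(r"tianqi:'.*?'", text)

# One group: only the group content is returned
one_group = re.findall(r"tianqi:'(.*?)'", text)

# Two groups: each result is a tuple with one item per group
two_groups = re.findall(r"(\w+):'(.*?)'", text)

# (?:...) groups without capturing, so the whole match is returned again
non_capturing = re.findall(r"(?:tianqi):'.*?'", text)

print(no_group)       # ["tianqi:'sunny'", "tianqi:'light rain'"]
print(one_group)      # ['sunny', 'light rain']
print(two_groups)     # [('tianqi', 'sunny'), ('tianqi', 'light rain')]
print(non_capturing)  # ["tianqi:'sunny'", "tianqi:'light rain'"]
```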

The third example uses sub to replace every character matched by the character class with an empty string, which has the effect of deleting those characters.

The fourth example splits a string. Splitting directly on the separator pattern leaves whitespace in the results; for example, the first element ends with a space. To remove the leading and trailing whitespace from each element of the list, a list comprehension applies the strip method of the string to every item.
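As an alternative to the separate strip step, the whitespace can be absorbed by the split pattern itself. This is a sketch of a variation, not the article's method; the string is a stand-in for the housing-listing text used in the fourth example.

```python
import re

# Stand-in for the housing-listing string from the fourth example
listing = "2 rooms 2 halls | 101.62 square meters | low area/7 floors | facing south \n Shanghai Future - Pudong - Jin Yang - built in 2005"

# \s* on both sides of the separator consumes the surrounding whitespace,
# so each field comes back already trimmed
fields = re.split(r"\s*[-|\n]\s*", listing)

print(fields)
```

This only trims whitespace next to a separator; whitespace at the very start or end of the whole string would still need a call to strip.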

