What regular expression functions are commonly used in python

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

In this issue, we look at the regular expression functions most commonly used in Python, explaining what each one does with short examples. I hope you find it useful.

01 Overview of re

The re module is a built-in (standard library) module that provides Python's regular expression support. Its source file is installed in the Lib folder under the Python installation directory (for example, ..\Python\Python37\Lib). It mainly provides three kinds of string operations:

String search / matching

String substitution

String splitting

Since re is a string-oriented module, string types deserve a mention. In the re module, the pattern and the text being searched can be either Unicode strings (the ordinary str type) or 8-bit byte strings (the bytes type, where a byte can be written as a two-digit hexadecimal escape such as \xe5), but both must be of the same type: mixing a str pattern with bytes text, or vice versa, raises a TypeError.
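
To make the str/bytes rule concrete, here is a small sketch using a made-up sample string. It also shows one practical difference: in a str pattern, \w is Unicode-aware by default, while in a bytes pattern it only matches ASCII word characters.

```python
import re

text_str = 'café au lait'               # made-up sample text
text_bytes = text_str.encode('utf-8')   # b'caf\xc3\xa9 au lait'

# str pattern + str text: \w matches Unicode letters such as 'é'
str_words = re.findall(r'\w+', text_str)
print(str_words)    # ['café', 'au', 'lait']

# bytes pattern + bytes text: \w only matches ASCII word characters
bytes_words = re.findall(rb'\w+', text_bytes)
print(bytes_words)  # [b'caf', b'au', b'lait']

# Mixing a str pattern with bytes text is rejected with TypeError
try:
    re.findall(r'\w+', text_bytes)
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
    print('TypeError:', exc)
```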

02 String search / matching

Precompiling: compile

Before introducing the search and match functions, you need to know re's compile function, which compiles a pattern string into a regular expression object that can then be reused for fast matching.

    import re
    pattern = re.compile(r'[a-z]{2,5}')
    print(type(pattern))  # <class 're.Pattern'>

This example creates a regular expression object (of type re.Pattern) named pattern that matches a run of 2 to 5 lowercase letters. Other regular expression functions can then be called as methods of pattern.
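
compile also accepts an optional flags argument, and the resulting object remembers both the pattern and the flags. The sketch below uses re.IGNORECASE purely as an illustrative choice and reuses one compiled object on two sample texts; without that flag, the second text would not match at its start.

```python
import re

# Compile once, with a flag chosen for illustration
pattern = re.compile(r'[a-z]{2,5}', re.IGNORECASE)

print(pattern.pattern)                      # [a-z]{2,5}
print(bool(pattern.flags & re.IGNORECASE))  # True

# Reuse the same compiled object on several texts
results = []
for text in ['this is a re test', 'Yes, this is a re test']:
    m = pattern.match(text)
    results.append(m.group() if m else None)

print(results)  # ['this', 'Yes']
```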

Match: match

The match function tries to match at the very beginning of the text string. If it succeeds, it returns the corresponding match object: call its group() method to get the matched text, or span() to get the (start, end) indices of the match. Otherwise, it returns None.

    import re
    pattern = re.compile(r'[a-z]{2,5}')
    text1 = 'this is a re test'
    res = pattern.match(text1)
    print(res)  # <re.Match object; span=(0, 4), match='this'>
    if res:
        print(res.group())  # this
        print(res.span())   # (0, 4)
    text2 = 'Yes, this is a re test'
    print(pattern.match(text2))  # None: 'Y' is not a lowercase letter

match also has a variant, fullmatch, which returns a match object if and only if the pattern matches the entire text string; otherwise it returns None.
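
The difference between match and fullmatch can be seen by running both on the same inputs. This is a minimal sketch reusing the 2-5 lowercase letter pattern from above; the input 'toolong' is an arbitrary example chosen to exceed the 5-letter limit.

```python
import re

pattern = re.compile(r'[a-z]{2,5}')

at_start = pattern.match('this is a re test')        # only the prefix must match
whole_text = pattern.fullmatch('this is a re test')  # the whole string must match
whole_word = pattern.fullmatch('this')
too_long = pattern.fullmatch('toolong')              # 7 letters, more than 5

print(at_start.group())    # this
print(whole_text)          # None
print(whole_word.group())  # this
print(too_long)            # None
```

This makes fullmatch a natural fit for validating that an entire input has a given format.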

Search: search

match only reports a match that starts at the beginning of the text string. To match at any position, call search instead, which works like match: it returns a match object as soon as it finds the first match, and you can again call span() to get the start and end indices and group() to get the matched text.

    import re
    pattern = re.compile(r'\s[a-z]{2}')
    text1 = 'this is a re test'
    res = pattern.search(text1)
    print(res)  # <re.Match object; span=(4, 7), match=' is'>
    if res:
        print(res.group())  # ' is' (the match includes the leading space)
        print(res.span())   # (4, 7)
    pattern2 = re.compile(r'\s[a-z]{5}')
    text2 = 'Yes, this is a re test'
    print(pattern2.search(text2))  # None: no word here has 5 lowercase letters

Both match and search return at most one result. The only difference is that match only matches at the start of the string, while search looks for the first match anywhere in it; both return a match object on success and None otherwise.
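
Running both functions with the same pattern on the same text shows the difference directly. The sample text is made up; the last call shows that anchoring a search with ^ makes it behave like match here.

```python
import re

text = 'order id: 12345'  # made-up sample text

m = re.match(r'\d+', text)          # only tries at index 0, where 'o' is not a digit
s = re.search(r'\d+', text)         # scans forward to the first run of digits
anchored = re.search(r'^\d+', text) # ^ pins the search to the start again

print(m)                    # None
print(s.group(), s.span())  # 12345 (10, 15)
print(anchored)             # None
```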

Full search: findall/finditer

findall is probably the most commonly used regular expression function: it finds all matching results. In crawler information extraction, for example, it makes it easy to pull out every matching field at once.

    import re
    pattern = re.compile(r'\s[a-z]{2,5}')
    text1 = 'this is a re test'
    res = pattern.findall(text1)
    print(res)  # [' is', ' re', ' test']

findall returns a list, which is empty when nothing matches. To avoid holding a large number of results in memory at the same time, you can call finditer instead, which returns an iterator whose elements are match objects; on each one you can call group and span as before.

    import re
    pattern = re.compile(r'\s[a-z]{2,5}')
    text1 = 'this is a re test'
    res = pattern.finditer(text1)
    for r in res:
        print(r.group())
    # prints ' is', ' re' and ' test' (each with its leading space), one per line
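
One detail matters for the extraction use case mentioned above: when the pattern contains capturing groups, findall returns the captured groups instead of the whole match (a list of tuples when there are several groups), while finditer still yields full match objects. A sketch on a made-up string of key=value pairs:

```python
import re

text = 'name=alice; age=30; city=paris'  # hypothetical sample data

pair = re.compile(r'(\w+)=(\w+)')

# With two groups, findall returns a list of 2-tuples
pairs = pair.findall(text)
print(pairs)  # [('name', 'alice'), ('age', '30'), ('city', 'paris')]

# finditer yields match objects, so individual groups and span() stay available
for m in pair.finditer(text):
    print(m.group(1), m.group(2), m.span())

# The tuples convert directly into a dict for lookups
info = dict(pairs)
print(info['age'])  # 30
```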

When the pattern is fairly simple or only needs to be used once, all of the methods above can also be called directly as module-level functions of re, without compiling first. In that case, the first argument of each function is the pattern string.

    import re
    text = 'this is re test'
    print(re.findall(r'[a-z]+', text))  # ['this', 'is', 're', 'test']

03 String replacement / splitting

Replace: sub/subn

When you need to replace parts of a text string conditionally, you can call re.sub (or compile the pattern first and call the instance method). Its arguments are the pattern string, the replacement, and the text string, in that order; the optional count and flags arguments limit the number of replacements and set the matching flags. By using groups in the pattern and referring to them in the replacement, you can also perform formatted substitution, similar to the string format method, to accomplish specific tasks.

    import re
    text = 'today is 2020-03-05'
    print(re.sub('-', '', text))           # today is 20200305
    print(re.sub('-', '', text, count=1))  # today is 202003-05
    # Reorder year-month-day into month/day/year using group references
    print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text))  # today is 03/05/2020
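
Besides a template string, the replacement can also use named groups (\g<name>) or be a function that receives each match object and returns the replacement text. The sketch below redoes the date example both ways; swap_to_us is a hypothetical helper name, and the 'a1 b22 c333' input is made up.

```python
import re

text = 'today is 2020-03-05'

# Named groups referenced with \g<name> in the template
named_date = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})', r'\g<m>/\g<d>/\g<y>', text)
print(named_date)  # today is 03/05/2020

def swap_to_us(m):
    # m is a match object; groups 1-3 hold year, month, day
    year, month, day = m.groups()
    return f'{month}/{day}/{year}'

us_date = re.sub(r'(\d{4})-(\d{2})-(\d{2})', swap_to_us, text)
print(us_date)  # today is 03/05/2020

# A function can also compute the replacement, e.g. doubling every number
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c333')
print(doubled)  # a2 b44 c666
```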

A variant of re.sub is re.subn. The difference is that it returns a 2-tuple whose first element is the resulting string and whose second element is the number of replacements made.

    import re
    text = 'today is 2020-03-05'
    print(re.subn('-', '', text))  # ('today is 20200305', 2)

Split: split

You can also use a regular expression to split a string. This works like an enhanced version of the str.split() method: the string is split wherever the pattern matches, and a list of the resulting pieces is returned.
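
Because the separator is a pattern rather than a fixed string, a single call can handle several different delimiters at once, which plain str.split() cannot. A sketch on a made-up input string, also showing that a capturing group keeps the separators and that maxsplit limits the number of splits:

```python
import re

text = 'apple, banana;cherry  durian'  # made-up sample text

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r'[,;\s]+', text)
print(parts)  # ['apple', 'banana', 'cherry', 'durian']

# A capturing group in the pattern keeps the separators in the result
with_seps = re.split(r'([,;])', 'a,b;c')
print(with_seps)  # ['a', ',', 'b', ';', 'c']

# maxsplit (passed by keyword) limits the number of splits
limited = re.split(r'[,;\s]+', text, maxsplit=1)
print(limited)  # ['apple', 'banana;cherry  durian']
```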

    import re
    text = 'today is a re test, what do you mind?'
    print(re.split(',', text))  # ['today is a re test', ' what do you mind?']

04 Summary

The re module in Python provides the common regular expression operations. Each can be called in two forms: as a module-level function (such as re.match) or as a method of a compiled pattern object (such as pattern.match).

Common matching functions: match/fullmatch

Common search functions: search/findall/finditer

Common replacement functions: sub/subn

Common splitting function: split
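
The two calling forms give the same results, as the sketch below checks on a sample string. One difference worth knowing: the methods of a compiled pattern accept optional pos and endpos arguments to restrict where matching happens, while the module-level functions take the pattern and flags instead.

```python
import re

text = 'this is a re test'
pattern = re.compile(r'[a-z]+')

# Module-level function vs. compiled-pattern method: same results
same_findall = re.findall(r'[a-z]+', text) == pattern.findall(text)
same_sub = re.sub(r'[a-z]+', 'X', text) == pattern.sub('X', text)
same_split = re.split(r'\s+', text) == re.compile(r'\s+').split(text)
print(same_findall, same_sub, same_split)  # True True True

# Pattern methods accept a start position (pos); here matching starts at index 5
offset_words = pattern.findall(text, 5)
print(offset_words)  # ['is', 'a', 're', 'test']
```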

There are other functions as well, but they are less commonly used; see the official documentation for details.

Beyond the standard library, Python also has a third-party regular expression library, regex, to choose from.

Those are the regular expression functions commonly used in Python. If you have similar questions, the analysis above should help. To learn more, feel free to follow the industry information channel.

© 2024 shulou.com SLNews company. All rights reserved.

12
Report