How to reference the regular expression re package in Python

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how to reference the regular expression re package in Python, working through practical examples. The approach is simple, quick, and practical, and I hope it helps you solve your problem.

Simple usage of the regular expression re package

I had always ignored regular expressions because I rarely needed them in my earlier study and development work. They also confused me when I first tried to learn them, so I gave up (QAQ), but a debt like that has to be repaid eventually. Recently I had to use regular expressions for log processing, which forced me to pick them up again. Here I record some notes and examples from my own study.

Importing the re package in Python

    import re

1. re.match(pattern, string, flags=0)

Tries to match the pattern at the start of the string (look carefully: the start position!). On success it returns a match object; on failure it returns None.

Parameter description:

pattern: the regular expression

string: the string to match against

flags: optional flags

Note: the optional flags are briefly described below.
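As a rough illustration of the flags parameter, the sketch below uses two flags from the standard re module, re.IGNORECASE and re.MULTILINE. The sample strings are made up for this example and do not come from the original notes.

```python
import re

# Hypothetical log line, used only for illustration
line = 'ERROR disk full'

# Without flags the pattern is case-sensitive, so nothing matches
no_flag = re.match(r'error', line)
# re.IGNORECASE makes the match case-insensitive
with_flag = re.match(r'error', line, flags=re.IGNORECASE)

print(no_flag)            # None
print(with_flag.group())  # ERROR

# Hypothetical two-line text: by default ^ only matches at the very start
log = 'first line\nerror on second line'
single = re.search(r'^error', log)
# re.MULTILINE lets ^ match at the start of every line
multi = re.search(r'^error', log, flags=re.MULTILINE)

print(single)         # None
print(multi.group())  # error
```

Several flags can be combined with the bitwise OR operator, for example re.IGNORECASE | re.MULTILINE.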

Getting values from the match object:

Use group(num) to get the content of a group in the match object.

For example:

    # -*- coding: utf-8 -*-
    import re

    str1 = '010-011-110'
    # Each () creates a group that can be read back with group(num)
    pattern = r'(\d{3})-(\d{3})-(\d{3})'
    match = re.match(pattern, str1)
    print(match.group())   # 010-011-110
    print(match.group(0))  # 010-011-110
    print(match.group(1))  # 010
    print(match.group(2))  # 011
    print(match.group(3))  # 110

The most important thing about the match() method is that it matches from the start of the string. Keep this in mind; I have made a lot of mistakes on this point.
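One common form of this mistake is calling group() on the result without checking it: when the pattern is not found at the start, re.match returns None and group() raises AttributeError. A minimal sketch of the check follows; the helper name first_three_digits is hypothetical and only serves the example.

```python
import re

def first_three_digits(text):
    """Return the three digits at the start of text, or None if there are none.

    Hypothetical helper, written only to illustrate the None check.
    """
    match = re.match(r'(\d{3})', text)
    if match is None:  # nothing matched at the start of the string
        return None
    return match.group(1)

print(first_three_digits('010-011-110'))   # 010
print(first_three_digits('tel: 010-011'))  # None
```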

When writing simple regular expressions, we can group parts of the pattern with () so that the values are easy to extract in later processing. Extracting values with named capture is covered later.

2. re.search(pattern, string, flags=0)

re.search takes the same arguments as re.match and is also used to match strings. The biggest difference is that it can match anywhere in the string, whereas match is limited to the beginning. Since the parameters are the same as for match, we skip the explanation and go straight to the example.

    # Unlike the match example, str1 has several spaces in front
    str1 = '    001-010-110'
    # Same pattern as in the match example
    pattern = r'(\d{3})-(\d{3})-(\d{3})'
    # re.match() would not match here
    search = re.search(pattern, str1)
    print(search.group())   # 001-010-110
    print(search.group(0))  # 001-010-110
    print(search.group(1))  # 001
    print(search.group(2))  # 010
    print(search.group(3))  # 110

Once again, for match and search: one must match at the beginning of the string, while the other can match anywhere.
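To see the difference side by side, here is a small sketch that runs both functions on the same padded string. The variable names are chosen for this example only.

```python
import re

pattern = r'(\d{3})-(\d{3})-(\d{3})'
padded = '   001-010-110'  # three leading spaces

match_result = re.match(pattern, padded)    # only tries position 0
search_result = re.search(pattern, padded)  # scans forward through the string

print(match_result)           # None
print(search_result.group())  # 001-010-110
print(search_result.start())  # 3
```

The start() method reports where the match begins, which makes it easy to check that search skipped the leading spaces.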

3. Retrieve and replace: re.sub()

Used to replace matches in a string

re.sub(pattern, repl, string, count=0, flags=0)

Parameter description:

pattern: the regular expression

repl: the replacement string, which can also be a function

string: the original string to search

count: the maximum number of replacements. By default all matches are replaced.

flags: optional flags
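The effect of count is easiest to see with a small sketch. The phone number below is just sample data.

```python
import re

phone = '888-7777-6666'

# count defaults to 0, which means every match is replaced
all_replaced = re.sub(r'-', '', phone)
# count=1 replaces only the first match
first_only = re.sub(r'-', '', phone, count=1)

print(all_replaced)  # 88877776666
print(first_only)    # 8887777-6666
```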

    # -*- coding: utf-8 -*-
    import re

    phone = "888-7777-6666 # awesome number"
    # Delete the comment in the string
    num = re.sub(r'#.*$', '', phone)
    print(num)        # 888-7777-6666
    # Delete the comment and the "-"
    realphone = re.sub(r'\D', '', phone)
    print(realphone)  # 88877776666

The sub function is not hard to understand; the main subtlety is the repl parameter, which can be a function. For example:

Multiply each number in a string by two:

    def double(match):
        # The match object gives access to the named group 'value'
        value = int(match.group('value'))
        return str(value * 2)

    s = 'APPLE23EFG567'
    print(re.sub(r'(?P<value>\d+)', double, s))  # APPLE46EFG1134

Because repl is a function, each match is replaced with the function's return value.

Note: (?P<value>...) is a named capture in the regular expression, which is briefly described below.
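For comparison, repl can also stay a plain string that refers back to a group with \g<name>. The sketch below wraps each number in brackets both ways; the function name bracket is made up for this example.

```python
import re

s = 'APPLE23EFG567'

# repl as a function: receives the match object and returns the replacement text
def bracket(match):
    return '[' + match.group('value') + ']'

by_function = re.sub(r'(?P<value>\d+)', bracket, s)

# repl as a string: \g<value> refers to the named group 'value'
by_string = re.sub(r'(?P<value>\d+)', r'[\g<value>]', s)

print(by_function)  # APPLE[23]EFG[567]
print(by_string)    # APPLE[23]EFG[567]
```

A string is enough when the replacement only rearranges matched text; a function is needed when the replacement has to be computed, as in the doubling example.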

4. Named capture in regular expressions

The format is: (?P<name>...)

It is often used when extracting values from strings.

Example:

    num = '001-010-110'
    pattern = r'(\d{3})-(\d{3})-(\d{3})'
    match = re.match(pattern, num)
    print(match.group())   # 001-010-110
    print(match.group(1))  # 001
    print(match.group(2))  # 010
    print(match.group(3))  # 110

In the above example, each item is retrieved with group(num). When the regular expression becomes complex, taking values by number makes it easy to get the wrong one, so named capture is recommended. Here is a simple example:

    pattern = r'(?P<Area>\d{3})-(?P<zhong>\d{3})-(?P<wei>\d{3})'
    match = re.match(pattern, num)
    print(match.group('Area'))   # 001
    print(match.group('zhong'))  # 010
    print(match.group('wei'))    # 110

Although named capture makes the regular expression in the above example less readable, in complex patterns it retrieves exactly the value you want (provided, of course, that the pattern is written correctly).
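When several named values are needed at once, the match object's groupdict() method returns them as a dictionary. A short sketch, reusing the group names from the example above:

```python
import re

num = '001-010-110'
pattern = r'(?P<Area>\d{3})-(?P<zhong>\d{3})-(?P<wei>\d{3})'
match = re.match(pattern, num)

# groupdict() maps every group name to the text it captured
fields = match.groupdict()
print(fields['Area'])   # 001
print(fields['zhong'])  # 010
print(fields['wei'])    # 110

# Named groups are still numbered, so both access styles agree
print(match.group(1) == match.group('Area'))  # True
```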

The right way to use the re library

Prerequisites:

You have fully mastered PCRE-style regular expressions

You are familiar with the re library documentation

Why

I don't need to repeat how powerful regular expressions are, and Python's support for them is also very strong, except for one thing:

    re.search(pattern, string, flags=0)
    re.match(pattern, string, flags=0)
    ...

Can you use the series of module-level functions shown above quickly and fluently? If you do regular expression matching in Python every day, I believe you are very proficient. But if you need to skim the documentation every time to remember how to use them, consider whether the API is somehow poorly designed (in some languages, pattern may well not be the first argument).

Generally speaking, the fewer parameters an API has, the better. Ideally there are none, so the caller can call it without thinking and carries no memory burden. Python's re library, in my opinion, is at the very least a mix of "imperative" and "OOP" styles, and its interface is not "minimal and orthogonal".

How to use it

The right way is to use only the OOP style and completely forget the series of module-level functions provided by the re library (such as re.search, re.match, and so on).
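As a rough sketch of what the OOP style buys in practice, the compiled object can be built once and reused for many strings, which suits tasks like log processing. The log lines and the name PHONE are made up for this example.

```python
import re

# Hypothetical log lines, used only for illustration
lines = ['call 001-010-110', 'no number here', 'fax 020-030-040']

# Build the Regex object once ...
PHONE = re.compile(r'(\d{3})-(\d{3})-(\d{3})')

# ... then reuse its search() method for every string
found = []
for line in lines:
    m = PHONE.search(line)
    if m is not None:  # search() returns None when nothing matches
        found.append(m.groups())

print(found)  # [('001', '010', '110'), ('020', '030', '040')]
```

The result of PHONE.search(line) is the same as re.search(pattern, line) with the same pattern; the difference is that the pattern and flags are stated in one place.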

Always construct the Regex object first, then obtain the Match object from the Regex object, and then perform all further operations on the Regex object and the Match object. For example:

    # 1. Construct the Regex object
    REGEX = re.compile(pattern, flags)  # flags are constants of the re module
    # 2. Get the MatchObject
    m = REGEX.search(string)
    # 3. Subsequent use of the MatchObject:
    #    1. group()
    #    2. groups()
    #    3. groupdict()

Application example

For example, this is how I use it in my own PathUtils (I am very fond of all kinds of Utils):

    from __future__ import (absolute_import, unicode_literals)
    import re


    class PathUtils(object):
        """Utility functions for path operations"""

        _LINUX_ROOT = '/'
        _LINUX_PATH_SPLITOR = '/'

        @classmethod
        def is_two_linux_path_contains(cls, path2, path3):
            """Whether the two Linux paths contain each other"""
            if path2 == cls._LINUX_ROOT or path3 == cls._LINUX_ROOT:
                return True
            path2_split = path2.split(cls._LINUX_PATH_SPLITOR)
            path3_split = path3.split(cls._LINUX_PATH_SPLITOR)
            for item1, item2 in zip(path2_split, path3_split):
                if item1 != item2:
                    return False
            return True

        @classmethod
        def is_valid_linux_path(cls, path):
            if not path:
                return False
            LINUX_PATH_REGEX = r'^(/[^/]*)+/?$'
            return cls.is_valid_pattern(path, LINUX_PATH_REGEX)

        @classmethod
        def is_valid_windows_path(cls, path):
            if not path:
                return False
            WINDOWS_PATH_REGEX = r'^ [a-zA-Z]:\\ ((?! [: "/\\ |? *]) + (?
