
The usage of python regular expression

2025-03-31 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article introduces the use of Python regular expressions. Many people run into trouble with them when working on real cases, so the sections below walk through how to handle the common situations. Read it carefully and you should come away with something useful.

Simple patterns

We will start with the simplest possible regular expressions. Since regular expressions are used to operate on strings, let's begin with the most common task: matching characters.

For a detailed explanation of the underlying computer science of regular expressions (deterministic and non-deterministic finite automata), you can refer to any textbook related to writing compilers.

Character matching

Most letters and characters simply match themselves. For example, the regular expression test matches the string "test" exactly. (You can enable a case-insensitive mode that lets this RE match "Test" or "TEST" as well; more on that later.)

There are exceptions to this rule, of course. Some characters are special metacharacters and don't match themselves. Instead, they signal that something out of the ordinary should be matched, or they affect other portions of the RE by repeating them. Much of this article is devoted to the various metacharacters and what they do.

Here is a complete list of metacharacters; their meaning is discussed in the rest of this guide.

. ^ $ * + ? { [ ] \ | ( )

The first metacharacters we will look at are "[" and "]". They are used to specify a character class, which is the set of characters you want to match. Characters can be listed individually, or a range of characters can be given as two characters separated by "-". For example, [abc] will match any of the characters "a", "b", or "c"; the range [a-c] expresses the same set. If you only want to match lowercase letters, the RE should be written as [a-z].

1) Metacharacters are not active inside classes. For example, [akm$] will match any of the characters "a", "k", "m", or "$"; "$" is usually a metacharacter, but inside a character class it is stripped of its special nature and treated as an ordinary character.

2) You can match the characters that are not in a class by complementing the set. This is done by putting "^" as the first character of the class; "^" anywhere else in the class simply matches the "^" character itself. For example, [^5] will match any character except "5".

3) The metacharacter backslash, "\". As in Python string literals, the backslash can be followed by various characters to signal special sequences. It is also used to escape metacharacters so that you can match them literally in a pattern. For example, if you need to match "[" or "\", you can remove their special meaning by preceding them with a backslash: \[ or \\.

Some of the special sequences beginning with "\" represent predefined sets of characters that are often useful, such as the set of digits, the set of letters, or the set of anything that isn't whitespace. The following predefined special sequences are available:

\d  Matches any decimal digit; equivalent to the class [0-9].
\D  Matches any non-digit character; equivalent to the class [^0-9].
\s  Matches any whitespace character; equivalent to the class [ \t\n\r\f\v].
\S  Matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v].
\w  Matches any alphanumeric character; equivalent to the class [a-zA-Z0-9_].
\W  Matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_].

These sequences can be included inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or "," or ".".

4) The metacharacter ".". It matches any character except a newline, and there is an alternate mode (re.DOTALL) in which it matches newlines as well. It is often used where you want to match "any character".
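The character-class rules above can be checked directly in the interpreter. The following is a minimal sketch using re.findall(); the sample strings and variable names are made up for illustration and are not part of the original article.

```python
import re

# Illustrative sample text (an assumption, not from the article).
text = "Tel: 555-0199,\tRoom b7.\nEnd"

# A class lists the characters to accept.
vowels = re.findall(r"[aeiou]", "regular")

# "^" as the first character complements the class.
not_five = re.findall(r"[^5]", "5a55b")

# "$" loses its special meaning inside a class.
with_dollar = re.findall(r"[akm$]", "make $5")

# Predefined classes: \d for digits, \s for whitespace.
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)

# "." skips newlines unless re.DOTALL is given.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", re.DOTALL)

print(vowels, not_five, with_dollar)
print(digits, spaces)
print(dot_default, dot_all)
```

Raw strings (the r prefix) are used for the patterns so that the backslashes reach the re module unchanged.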

Repetition

The first thing regular expressions can do that the ordinary string methods cannot is match varying sets of characters. If that were their only additional capability, however, they would not be much of an advance. Their other capability is that you can specify how many times a portion of the RE must be repeated.

The first repetition metacharacter we will look at is *. It does not match the literal character "*"; instead, it specifies that the previous character can be matched zero or more times, rather than exactly once.

For example, ca*t will match "ct" (0 "a" characters), "cat" (1 "a"), "caaat" (3 "a" characters), and so on. The RE engine has internal limits, stemming from the size of C's int type, that prevent it from matching more than 2 billion "a" characters; you probably don't have enough memory to build a string that large, so you should not run into that limit.

Repetitions such as * are "greedy"; when repeating a RE, the matching engine tries to repeat it as many times as possible. If later portions of the pattern don't match, the engine backs up and tries again with fewer repetitions.

A step-by-step example makes this clearer. Consider the expression a[bcd]*b. It matches the letter "a", then zero or more letters from the class [bcd], and finally ends with "b". Now imagine matching this RE against the string "abcbd".

Step  Matched  Explanation
1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is to the end of the string.
3     Failure  The engine tries to match b, but the current position is at the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the character at the current position is "d".
6     abc      Back up again, so that [bcd]* only matches "bc".
7     abcb     Try b again. This time the character at the current position is "b", so it succeeds.

The end of the RE has now been reached, and it has matched "abcb". This shows how the matching engine goes as far as it can at first, and then, if no match is found, progressively backs up and retries the rest of the RE again and again. It backs up until it has tried zero matches for [bcd]*, and if that also fails, the engine concludes that the string does not match the RE at all.
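The outcome of the walk-through above can be confirmed with a short sketch. It only checks the final result of the backtracking (the engine's intermediate steps are not observable from Python); the second input string is an illustrative assumption.

```python
import re

# Greedy [bcd]* first runs to the end of "abcbd", then backs up
# until the trailing "b" of the pattern can match.
m = re.match(r"a[bcd]*b", "abcbd")
matched_text = m.group()
matched_span = m.span()

# If no amount of backing up lets the final "b" match, the
# whole match fails and None is returned.
no_match = re.match(r"a[bcd]*b", "acdd")

print(matched_text, matched_span, no_match)
```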

Another repetition metacharacter is +, which matches one or more times. Pay attention to the difference between * and +: * matches zero or more times, so whatever is being repeated may not be present at all, while + requires at least one occurrence. Using the same example, ca+t will match "cat" (1 "a") and "caaat" (3 "a" characters), but not "ct".

There are two more repetition qualifiers. The question mark, ?, matches either once or zero times; you can think of it as marking something as optional. For example, home-?brew matches either "homebrew" or "home-brew".

The most complicated repetition qualifier is {m,n}, where m and n are decimal integers. It means there must be at least m repetitions and at most n. For example, a/{1,3}b will match "a/b", "a//b", and "a///b". It will not match "ab", which has no slashes, or "a////b", which has four.

You can omit either m or n; in that case, a reasonable value is assumed for the missing one. Omitting m is interpreted as a lower limit of 0, while omitting n results in an upper bound of infinity (in practice the 2 billion limit mentioned earlier, which might as well be infinity).

Careful readers may notice that the other three qualifiers can all be expressed using this notation: {0,} is the same as *, {1,} is equivalent to +, and {0,1} is the same as ?. It is better to use *, +, or ? when you can, simply because they are shorter and easier to read.
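The qualifiers above can be compared side by side. This is a small sketch; the helper fully_matches() is a hypothetical name introduced here, and it uses re.fullmatch() (available in Python 3.4 and later) so that the whole candidate string has to match.

```python
import re

def fully_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

cats = ["ct", "cat", "caaat", "cut"]
star = fully_matches(r"ca*t", cats)        # zero or more "a"
plus = fully_matches(r"ca+t", cats)        # one or more "a"

brews = ["homebrew", "home-brew", "home--brew"]
optional = fully_matches(r"home-?brew", brews)

slashes = ["ab", "a/b", "a//b", "a///b", "a////b"]
bounded = fully_matches(r"a/{1,3}b", slashes)

# The {m,n} spellings of *, + and ? select the same strings.
same_as_star = fully_matches(r"ca{0,}t", cats)
same_as_plus = fully_matches(r"ca{1,}t", cats)
same_as_optional = fully_matches(r"home-{0,1}brew", brews)

print(star, plus, optional, bounded)
```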

Use regular expressions

Now that we've seen some simple regular expressions, how do we actually use them in Python? The re module provides an interface to the regular expression engine that allows you to compile REs into objects and use them for matching.

Compile regular expressions

Regular expressions are compiled into `RegexObject` instances (called pattern objects, `re.Pattern`, in current Python 3), which provide methods for various operations such as searching for pattern matches or performing string substitutions.

#!python
>>> import re
>>> p = re.compile('ab*')
>>> print(p)
re.compile('ab*')

re.compile() also accepts an optional flags argument, used to enable various special features and syntax variations. We'll go over the available settings later, but for now a single example will do:

#!python
>>> p = re.compile('ab*', re.IGNORECASE)

The RE is passed to re.compile() as a string. REs are handled as strings because regular expressions are not part of the core Python language, and no special syntax was created for expressing them. (There are applications that don't need REs at all, so there's no need to bloat the language specification by including them.) Instead, the re module is simply a C extension module included with Python, just like the socket or zlib modules.

Putting REs in strings keeps the Python language simpler, but it has one disadvantage, which is the topic of the next section.

The trouble with backslashes

As stated earlier, regular expressions use the backslash character ("\") to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's use of the same character for the same purpose in string literals.

Suppose you want to write a RE that matches the string "\section", which might be found in a LaTeX file. To work out what to write in the program code, start with the string you want to match. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, which gives the string "\\section". The string passed to re.compile() must therefore be \\section. However, to express this as a Python string literal, both backslashes must be escaped again, so the final result is "\\\\section".

Characters      Stage
\section        Text string to be matched
\\section       Escaped backslash for re.compile()
"\\\\section"   Escaped backslashes for a string literal

In short, to match a literal backslash you have to write '\\\\' as the RE string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal. In REs that use backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.

The solution is to use Python's raw string notation for regular expressions. Backslashes are not handled in any special way in a string literal prefixed with "r", so r"\n" is a two-character string containing "\" and "n", while "\n" is a one-character string containing a newline. Regular expressions are usually written in Python code using this raw string notation.

Regular string    Raw string
"ab*"             r"ab*"
"\\\\section"     r"\\section"
"\\w+\\s+\\1"     r"\w+\s+\1"
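The equivalence between the two notations in the table can be verified with a short sketch. The sample text is an illustrative assumption; everything else is standard Python string and re behaviour.

```python
import re

# Illustrative LaTeX-like text containing a literal backslash.
text = r"see \section{Intro}"

# Both spellings produce the same 9-character pattern string.
escaped = "\\\\section"
raw = r"\\section"
same_pattern = (escaped == raw)

# The pattern matches one literal backslash followed by "section".
m = re.search(raw, text)
found = m.group()

# "\n" is one character (a newline); r"\n" is two characters.
lengths = (len("\n"), len(r"\n"))

print(same_pattern, found, lengths)
```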

Perform matching

Once you have a compiled regular expression object, what are you going to do with it? The `RegexObject` instance has some methods and properties. Only the most important ones are shown here. For a complete list, please refer to Python Library Reference

Method/Attribute  Purpose
match()           Determine whether the RE matches at the beginning of the string.
search()          Scan through the string, looking for any location where the RE matches.
findall()         Find all substrings where the RE matches, and return them as a list.
finditer()        Find all substrings where the RE matches, and return them as an iterator.

If there is no match, match () and search () return None. If successful, an instance of `MatchObject` is returned with information about where it starts and ends, the substring it matches, and so on.

You can learn about this by experimenting interactively with the re module. If you have Tkinter available, you may also want to look at Tools/scripts/redemo.py, a demonstration program included with the Python distribution.

First, run the Python interpreter, import the re module, and compile a RE:

#!python
>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')

Now you can try matching various strings against the RE [a-z]+. An empty string should not match at all, since + means "one or more repetitions". match() returns None in this case, which causes the interpreter to print no output. You can explicitly print the result of match() to make this clear.

#!python
>>> p.match("")
>>> print(p.match(""))
None

Now, let's try it on a string that it should match, such as "tempo". In this case, match() will return a `MatchObject`, so you should store the result in a variable for later use.

#!python
>>> m = p.match('tempo')
>>> print(m)
<re.Match object; span=(0, 5), match='tempo'>

Now you can query `MatchObject` for information about matching strings. The MatchObject instance also has several methods and properties; the most important ones are as follows:

Method/Attribute  Purpose
group()           Return the string matched by the RE.
start()           Return the starting position of the match.
end()             Return the ending position of the match.
span()            Return a tuple containing the (start, end) positions of the match.

Try these methods and you'll soon know what they do:

#!python
>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)

group() returns the substring that was matched by the RE. start() and end() return the starting and ending index of the match. span() returns both the start and end indexes in a single tuple. Since the match() method only checks whether the RE matches at the start of a string, start() will always be zero. The search() method of `RegexObject` instances, however, scans through the string, so the match may not start at zero in that case.

#!python
>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)
<re.Match object; span=(4, 11), match='message'>
>>> m.group()
'message'
>>> m.span()
(4, 11)

In actual programs, the most common style is to store the `MatchObject` in a variable and then check whether it is None. This usually looks like:

#!python
p = re.compile( ... )
m = p.match('string goes here')
if m:
    print('Match found:', m.group())
else:
    print('No match')

Two `RegexObject` methods return all of the matches for a pattern. findall() returns a list of matching strings:

#!python
>>> p = re.compile(r'\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']

findall() has to create the entire list before it can return the result. Python 2.2 and later also provide the finditer() method, which returns the matches one at a time as an iterator.

#!python
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x...>
>>> for match in iterator:
...     print(match.span())
...
(0, 2)
(22, 24)
(29, 31)

Compilation flags

Flag           Meaning
DOTALL, S      Make . match any character, including newlines.
IGNORECASE, I  Do case-insensitive matches.
LOCALE, L      Do a locale-aware match.
MULTILINE, M   Multi-line matching, affecting ^ and $.
VERBOSE, X     Enable verbose REs, which can be organized more cleanly and understandably.

I (IGNORECASE)

Performs case-insensitive matching; character classes and literal strings match letters while ignoring case. For example, [A-Z] will match lowercase letters too, and Spam will match "Spam", "spam", or "spAM". This lowercasing does not take the current locale into account.

L (LOCALE)

Makes \w, \W, \b, and \B dependent on the current locale.

Locales are a feature of the C library intended to help in writing programs that take language differences into account. For example, if you are processing French text, you would want to be able to write \w+ to match words, but \w only matches the character class [A-Za-z]; it will not match "é" or "ç". If your system is configured properly and a French locale is selected, certain C functions will tell the program that "é" should also be considered a letter. Setting the LOCALE flag when compiling a regular expression causes the resulting compiled object to use these C functions for \w; this is slower, but it also lets \w+ match French words as you would expect. (In Python 3, str patterns already match Unicode word characters by default, and LOCALE can only be used with bytes patterns.)

M (MULTILINE)

(^ and $ have not been explained yet; they are introduced later in this article.)

Usually ^ matches only at the beginning of the string, and $ matches only at the end of the string and immediately before the newline (if any) at the end of the string. When this flag is specified, ^ matches at the beginning of the string and at the beginning of each line within the string, immediately following each newline. Similarly, the $ metacharacter matches at the end of the string and at the end of each line (immediately preceding each newline).

S (DOTALL)

Makes the "." special character match any character at all, including a newline; without this flag, "." will match anything except a newline.

X (VERBOSE)

This flag lets you write regular expressions that are more readable by giving you more flexibility in how you format them. When this flag is specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by a backslash; this lets you organize and indent the RE more clearly. It also lets you put comments in the RE, which are ignored by the engine; a comment is marked by a "#" that is neither in a character class nor preceded by a backslash.

For example, here is a RE that uses re.VERBOSE; see how much easier it is to read.

#!python
charref = re.compile(r"""
 &[#]                          # Start of a numeric entity reference
 (
     [0-9]+[^0-9]              # Decimal form
   | 0[0-7]+[^0-7]             # Octal form
   | x[0-9a-fA-F]+[^0-9a-fA-F] # Hexadecimal form
 )
""", re.VERBOSE)

Without the verbose setting, the RE would look like this:

#!python
charref = re.compile("&#([0-9]+[^0-9]"
                     "|0[0-7]+[^0-7]"
                     "|x[0-9a-fA-F]+[^0-9a-fA-F])")

In the example above, Python's automatic concatenation of adjacent string literals is used to break the RE into smaller pieces, but it is still harder to understand than the version that uses re.VERBOSE.
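The effect of the flags described in this section can be seen on a small input. This is a minimal sketch with a made-up three-line string; combining flags with the | operator is standard re usage, although it is not shown in the text above.

```python
import re

# Illustrative three-line input (an assumption for this sketch).
text = "Spam\neggs\nSPAM"

# IGNORECASE: the lowercase pattern matches both spellings.
case_hits = re.findall(r"spam", text, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not just the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: "." is allowed to match the newline between the lines.
dot_default = re.search(r"Spam.eggs", text)
dot_all = re.search(r"Spam.eggs", text, re.DOTALL)

# Flags can be combined with the bitwise OR operator.
whole_lines = re.findall(r"^spam$", text, re.IGNORECASE | re.MULTILINE)

# VERBOSE: whitespace and comments inside the pattern are ignored.
number = re.compile(r"""
    \d+      # integer part
    [.]      # literal dot
    \d+      # fractional part
""", re.VERBOSE)
verbose_hit = number.search("pi is 3.14").group()

print(case_hits, starts_default, starts_multiline, whole_lines, verbose_hit)
```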

More pattern features

So far, we have shown only part of the functionality of regular expressions. In this section, we will show some new metacharacters and how to use groups to retrieve matched text parts.

There are also some metacharacters that we haven't shown yet, most of which will be shown in this section.

The remaining metacharacters to be discussed are zero-width assertions. They do not cause the engine to advance through the string; instead, they consume no characters at all and simply succeed or fail. For example, \b is an assertion that the current position is at a word boundary; the position is not changed by \b at all. This means that zero-width assertions should never be repeated, because if they match once at a given location, they can obviously be matched an infinite number of times.

|

Alternation, or the "or" operator. If A and B are regular expressions, A|B will match any string that matches either "A" or "B". | has very low precedence so that it behaves sensibly when you are alternating multi-character strings. Crow|Servo will match either "Crow" or "Servo", not "Cro", a "w" or an "S", and "ervo".

To match the letter "|", you can use\ |, or include it in a character class, such as [|].
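The precedence rule and the two ways of matching a literal "|" can be tried out as follows. This is a small sketch; the sample sentences are illustrative assumptions.

```python
import re

p = re.compile(r"Crow|Servo")

# Either whole alternative may match, in the order found in the text.
found = p.findall("Tom Servo and Crow T. Robot")

# "Cro" alone is not enough: | separates "Crow" from "Servo",
# it does not make the "w" optional.
partial = p.search("Cro")

# A literal "|" is matched by escaping it or by using a class.
escaped = re.findall(r"\|", "a|b|c")
in_class = re.findall(r"[|]", "a|b|c")

print(found, partial, escaped, in_class)
```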

^

Matches at the beginning of lines. Unless the MULTILINE flag has been set, this only matches at the beginning of the string. In MULTILINE mode, it also matches immediately after each newline within the string.

For example, if you only want to match the word "From" at the beginning of a line, the RE to use is ^From.

#!python
>>> print(re.search('^From', 'From Here to Eternity'))
<re.Match object; span=(0, 4), match='From'>
>>> print(re.search('^From', 'Reciting From Memory'))
None

$

Matches at the end of a line, which is defined as either the end of the string or any location followed by a newline character.

#!python
>>> print(re.search('}$', '{block}'))
<re.Match object; span=(6, 7), match='}'>
>>> print(re.search('}$', '{block} '))
None
>>> print(re.search('}$', '{block}\n'))
<re.Match object; span=(6, 7), match='}'>

To match a literal "$", use \$ or enclose it in a character class, as in [$].

\A

Matches only at the start of the string. When not in MULTILINE mode, \A and ^ are effectively the same. In MULTILINE mode they differ: \A still matches only at the beginning of the string, while ^ may match at any location inside the string that follows a newline character.

\Z

Matches only at the end of the string.
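The difference between the line anchors and the string anchors shows up most clearly in MULTILINE mode and on text that ends with a newline. The following sketch uses a made-up two-line string to illustrate both cases.

```python
import re

# Illustrative two-line input that ends with a newline.
text = "From A\nFrom B\n"

# In MULTILINE mode ^ matches at each line start, \A only at offset 0.
caret_hits = re.findall(r"^From", text, re.MULTILINE)
string_start_hits = re.findall(r"\AFrom", text, re.MULTILINE)

# $ also matches just before a trailing newline; \Z does not.
dollar = re.search(r"B$", text)
z_only = re.search(r"B\Z", text)
z_after_newline = re.search(r"B\n\Z", text)

# Without MULTILINE, $ does not match at the end of the first line.
first_line_default = re.search(r"A$", text)
first_line_multi = re.search(r"A$", text, re.MULTILINE)

print(caret_hits, string_start_hits)
print(dollar, z_only, z_after_newline)
```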

\ b

Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character.

The following example matches "class" only when it is a complete word; it will not match when it is contained inside another word.

#!python
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<re.Match object; span=(3, 8), match='class'>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None

There are two subtleties to remember when using this special sequence. The first is the worst collision between Python's string literals and regular expression sequences. In Python string literals, "\b" is the backspace character, ASCII value 8. If you are not using raw strings, Python will convert the "\b" to a backspace, and your RE will not match as you expect. The following example looks the same as our previous RE, but omits the "r" in front of the RE string.

#!python
>>> p = re.compile('\bclass\b')
>>> print(p.search('no class at all'))
None
>>> print(p.search('\b' + 'class' + '\b'))
<re.Match object; span=(0, 7), match='\x08class\x08'>

Second, inside a character class, where this assertion has no use, \b represents the backspace character, for compatibility with Python's string literals.

\B

Another zero-width assertion, the opposite of \b; it matches only when the current position is not at a word boundary. For example:

#!python
>>> p = re.compile(r'\Bclass\B')
>>> print(p.search('the declassified algorithm'))
<re.Match object; span=(6, 11), match='class'>
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('the declassified algorithm'))
None

Grouping

Frequently you need to obtain more information than just whether the RE matched or not. Regular expressions are often used to dissect strings by writing a RE divided into several groups that match the different components of interest. For example, an RFC-822 header line is divided into a header name and a value, separated by a ":". This can be handled by writing a regular expression that matches the entire header line, with one group that matches the header name and another group that matches the header's value.

Groups are marked by the "(" and ")" metacharacters. "(" and ")" have much the same meaning as they do in mathematical expressions; they group together the expressions contained inside them. For example, you can repeat the contents of a group with a repetition qualifier such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of "ab".

#!python
>>> p = re.compile('(ab)*')
>>> print(p.match('ababababab').span())
(0, 10)
>>> print(p.match('bababababab').span())
(0, 0)
>>> p = re.compile('b(ab)*')
>>> print(p.match('bababababab').span())
(0, 11)

Groups indicated with "(" and ")" also capture the starting and ending index of the text that they match; this can be retrieved by passing an argument to group(), start(), end(), and span(). Groups are numbered starting with 0. Group 0 is always present; it is the whole RE, so the `MatchObject` methods all have group 0 as their default argument. Later we will see how to express groups that do not capture the span of text that they match.

#!python
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
>>> m.group()
'ab'
>>> m.group(0)
'ab'

Subgroups are numbered from left to right, starting with 1. Groups can be nested; to determine the number of a group, just count the opening parenthesis characters, going from left to right.

#!python
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'

group() can be passed multiple group numbers at a time, in which case it returns a tuple containing the corresponding values for those groups.

#!python
>>> m.group(2, 1, 2)
('b', 'abc', 'b')

The groups() method returns a tuple containing the strings for all the subgroups, from 1 up to however many there are.

#!python
>>> m.groups()
('abc', 'b')
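The header-parsing use case mentioned at the start of this section can be sketched with two groups. The pattern below is a deliberately simplified assumption (real RFC-822 headers allow folding and other details it ignores), and header_re and the sample header line are names and data made up for illustration.

```python
import re

# Group 1: header name (anything up to the colon, no whitespace).
# Group 2: header value (the rest of the line).
header_re = re.compile(r"([^:\s]+):\s*(.*)")

line = "Subject: Regular expressions in Python"
m = header_re.match(line)

whole = m.group(0)          # group 0 is the entire match
name, value = m.groups()    # groups 1 and 2 as a tuple
value_start = m.start(2)    # index where group 2 begins

print(name, "->", value, "(value starts at", value_start, ")")
```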

Backreferences in a pattern let you specify that the contents of an earlier capturing group must also be found at the current location in the string. For example, \1 will succeed if the exact contents of group 1 can be found at the current position, and fails otherwise. Remember that Python's string literals also use a backslash followed by numbers to allow arbitrary characters in a string, so be sure to use a raw string when using backreferences in a RE.

For example, the following RE detects doubled words in a string.

#!python
>>> p = re.compile(r'(\b\w+)\s+\1')
>>> p.search('Paris in the the spring').group()
'the the'

Backreferences like this are not often useful for just searching through a string, since few text formats repeat data in this way, but you will soon find that they are very useful when performing string substitutions.

No capture group and named group

Elaborate REs may use many groups, both to capture substrings of interest and to group and structure the RE itself. In complex REs, it becomes difficult to keep track of the group numbers. There are two features that help with this problem. Both use a common syntax for regular expression extensions, so we will look at that first.

Perl 5 added several additional features to standard regular expressions, and Python's re module supports most of them. It would have been difficult to choose new single-keystroke metacharacters or new special sequences beginning with "\" to represent the new features without making Perl's regular expressions confusingly different from standard REs. If "&" had been chosen as a new metacharacter, for example, old expressions would be assuming that "&" was a regular character and would not have escaped it by writing \& or [&].

The solution chosen by the Perl developers was to use (?...) as the extension syntax. A "?" immediately after a parenthesis was previously a syntax error, because the "?" would have nothing to repeat, so this did not introduce any compatibility problems. The characters immediately after the "?" indicate which extension is being used, so (?=foo) is one thing (a positive lookahead assertion) and (?:foo) is something else (a non-capturing group containing the subexpression foo).

Python adds an extension syntax to Perl's extension syntax. If the first character after the question mark is "P", you know that it is an extension specific to Python. There are currently two such extensions: (?P<name>...) defines a named group, and (?P=name) is a backreference to a named group. If future versions of Perl 5 add similar features using a different syntax, the re module will be changed to support the new syntax, while preserving the Python-specific syntax for compatibility's sake.

Now that we have looked at the general extension syntax, we can return to the features that simplify working with groups in complex REs. Since groups are numbered from left to right and a complex expression may use many groups, it can become difficult to keep track of the correct numbering, and modifying such a complex RE is annoying: insert a new group near the beginning, and you change the numbers of everything that follows it.

First, sometimes you want to use a group to collect a part of a regular expression, but you are not interested in retrieving the group's contents. You can make this explicit by using a non-capturing group: (?:...), where you can put any other regular expression inside the parentheses.

#!python
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)
>>> m = re.match("(?:[abc])+", "abc")
>>> m.groups()
()

Except for the fact that you cannot retrieve the contents of what the group matched, a non-capturing group behaves exactly the same as a capturing group; you can put anything inside it, repeat it with a repetition metacharacter such as "*", and nest it within other groups (capturing or non-capturing). (?:...) is particularly useful when modifying an existing pattern, since you can add new groups without changing how all the other groups are numbered. There is no performance difference in searching between capturing and non-capturing groups; neither form is any faster than the other.

Second, what is more important and powerful is naming a group; unlike specifying a group with a number, it can be specified by name.

The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Except for associating a name with the group, named groups behave identically to capturing groups. The `MatchObject` methods that deal with capturing groups all accept either integers, to refer to groups by number, or a string containing the group name. Named groups are still given numbers, so you can retrieve information about a group in two ways:

#!python
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('(((( Lots of punctuation )))')
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'

Named groups are handy because they let you use easily remembered names instead of having to remember numbers. Here is an example RE from the imaplib module:

#!python
InternalDate = re.compile(r'INTERNALDATE "'
        r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
        r'(?P<year>[0-9][0-9][0-9][0-9])'
        r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
        r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
        r'"')

It is obviously much easier to retrieve m.group('zonem') than to remember to retrieve group 9.

The syntax for backreferences in an expression such as (...)\1 refers to the number of the group, so there is naturally a variant that uses the group name instead of the number. This is also a Python extension: (?P=name) indicates that the contents of the group called name should again be found at the current position. The regular expression for finding doubled words, (\b\w+)\s+\1, can also be written as (?P<word>\b\w+)\s+(?P=word):

#!python
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
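As a further illustration of named groups, the sketch below pulls a time out of a string by name. The pattern time_re and the input are assumptions made for this example, loosely modelled on the time portion of the imaplib expression; groupdict() is a standard match-object method that the text above does not cover.

```python
import re

# Hypothetical pattern: three named two-digit fields separated by colons.
time_re = re.compile(
    r"(?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])"
)

m = time_re.search("INTERNALDATE 17-Jul-1996 02:44:25")

# The same group can be read by name or by number.
hour_by_name = m.group("hour")
hour_by_number = m.group(1)

# groupdict() returns all named groups as a dictionary.
fields = m.groupdict()

print(hour_by_name, hour_by_number, fields)
```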

Method/Attribute  Purpose
split()           Split the string into a list, splitting it wherever the RE matches.
sub()             Find all substrings where the RE matches, and replace them with a different string.
subn()            Does the same thing as sub(), but returns the new string and the number of replacements.
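The three methods in the table above can be sketched as follows. The patterns and sample sentences are illustrative assumptions; the optional count argument of sub() is part of the standard re API.

```python
import re

# split(): cut the string wherever a run of non-word characters matches.
splitter = re.compile(r"\W+")
words = splitter.split("This is a test, short and sweet.")

# sub(): replace every match with a different string.
colour = re.compile(r"blue|white|red")
replaced_all = colour.sub("colour", "blue socks and red shoes")

# The optional count limits how many replacements are made.
replaced_once = colour.sub("colour", "blue socks and red shoes", count=1)

# subn(): same as sub(), but also reports how many replacements happened.
result, count = colour.subn("colour", "blue socks and red shoes")

print(words)
print(replaced_all, "|", replaced_once, "|", (result, count))
```

Note that split() returns a trailing empty string here, because the final "." matches the pattern at the very end of the input.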

Syntax  Description                                      Example
.       Matches any character except newline             b.c matches bac, bdc
*       Matches the previous character 0 or more times   b*c matches c or bbbc
+       Matches the previous character 1 or more times   b+c matches bc or bbbc
?       Matches the previous character 0 or 1 times      b?c matches c or bc
{m}     Matches the previous character m times           b{2}c matches bbc
{m,n}   Matches the previous character m to n times      b{2,5}c matches bbc or bbbbc
[abc]   Matches any one character in the brackets        [bc] matches b or c
\d      Matches a digit, [0-9]                           b\dc matches b1c, etc.
\D      Matches a non-digit, equivalent to [^\d]         b\Dc matches bAc
\s      Matches a whitespace character                   b\sc matches "b c"
\S      Matches a non-whitespace character, [^\s]        b\Sc matches bac
\w      Matches [A-Za-z0-9_]                             b\wc matches bAc, etc.
\W      Equivalent to [^\w]                              b\Wc matches "b c"
\       Escape character                                 b\\c matches b\c
^       Matches the beginning of the string              ^bc matches bc at the start of a string
$       Matches the end of the string                    bc$ matches a string ending with bc
\A      Matches only at the beginning of the string      \Abc matches bc at the start of a string
\Z      Matches only at the end of the string            bc\Z matches bc at the end of a string
|       Matches either the left or the right side        b|c matches b or c

re module example

Start using re

Python provides support for regular expressions through the re module. The general step in using re is to first compile the string form of the regular expression into a Pattern instance, then use the Pattern instance to process the text and get the matching result (a Match instance), and finally use the Match instance to get the information and do something else.

#!python
import re

p = re.compile(r'(\w+) (\w+)')
s = 'I say, hello world!'

# Swap the two words of each matched pair.
print(p.subn(r'\2 \1', s))

def func(m):
    # Title-case both words of the matched pair.
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.subn(func, s))

### output ###
# ('say I, world hello!', 2)
# ('I Say, Hello World!', 2)

This is the end of the introduction to the use of Python regular expressions. Thank you for reading.
