What are the basic knowledge points of Python regular expression 04/28 Update SLTechnology News&Howtos


2025-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article walks through the basic knowledge points of Python regular expressions: the pattern syntax, the re.match, re.search, re.sub and re.compile functions, and the optional flags.

Regular expressions are a powerful tool for working with strings, and they are not specific to Python.

Other programming languages have regular expressions too. They have their own syntax and an independent processing engine.

The core syntax is largely the same in every language that provides regular expressions; the main difference is how much of that syntax each implementation supports.

[Figure: the process of matching using regular expressions]

1.1 introduction

Regular expressions are not specific to Python. They are a powerful tool for processing strings, with their own syntax and an independent processing engine. They may be less efficient than the methods built into str, but they are far more expressive. The core syntax is much the same across languages that provide regular expressions; what differs is how much of the syntax each implementation supports. Don't worry, though: the unsupported syntax is usually the rarely used part.

A regular expression is a special sequence of characters that lets you check whether a string matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns and gives Python full regular expression functionality.

1.2 various uses to know

Pattern strings use special syntax to represent a regular expression:

Letters and numbers represent themselves: a letter or number in a pattern matches the same character in the string. Most letters and numbers take on a different meaning when preceded by a backslash. Punctuation marks that have a special meaning (such as . ^ $ * + ? { } [ ] | ( )) match themselves only when they are escaped. The backslash itself must be escaped with another backslash.

Since regular expressions usually contain backslashes, it is best to write them as raw strings. Pattern elements (for example r'\t', which is equivalent to '\\t') match the corresponding special characters.
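As a small illustration of why raw strings help, the sketch below compares a raw-string pattern with its escaped equivalent. The variable names are made up for this example.

```python
import re

# r'\t' and '\\t' are the same two-character string: a backslash followed by 't'.
raw_pattern = r'\t'
escaped_pattern = '\\t'
same_pattern = raw_pattern == escaped_pattern

# The regex engine reads that backslash-t pair as "match a tab character".
tab_match = re.match(raw_pattern, '\tindented line')

print(same_pattern)           # True
print(tab_match is not None)  # True
```

With a raw string there is only one level of escaping to think about, the one the regex engine applies.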

The following table lists the special elements in the regular expression pattern syntax. If you provide optional flag parameters while using the pattern, the meaning of some pattern elements will change.

There are many of these elements; the ones used most often are covered later. Practice with them and they will become familiar.

Pattern

Pattern       Description
^             Matches the beginning of the string.
$             Matches the end of the string.
.             Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character including a newline.
[...]         Represents a set of characters, listed individually: [amk] matches 'a', 'm' or 'k'.
[^...]        Characters that are not in []: [^abc] matches any character other than a, b and c.
re*           Matches 0 or more of the preceding expression.
re+           Matches 1 or more of the preceding expression.
re?           Matches 0 or 1 of the preceding expression. Placed after another quantifier (as in *? or +?), ? makes that quantifier non-greedy.
re{n}         Matches exactly n of the preceding expression.
re{n,}        Matches n or more of the preceding expression.
re{n,m}       Matches n to m of the preceding expression, greedily.
a|b           Matches a or b.
(re)          Matches the expression in parentheses and captures it as a group.
(?imx)        Turns on the optional i, m or x flags. In Python's re this applies to the whole pattern and belongs at the start of it.
(?-imx)       Turns off the i, m or x flags (Perl syntax; Python's re only accepts the scoped form (?-imx:re)).
(?:re)        Like (...), but does not create a group.
(?imx:re)     Uses the i, m or x flags inside the parentheses.
(?-imx:re)    Does not use the i, m or x flags inside the parentheses.
(?#...)       Comment.
(?=re)        Positive lookahead. Succeeds if re matches at the current position and fails otherwise. It consumes no text: the rest of the pattern is then tried from the same position.
(?!re)        Negative lookahead. The opposite of the positive lookahead: succeeds when re does not match at the current position.
(?>re)        An independent (atomic) pattern that is matched without backtracking. Python's re supports it from version 3.11.
\w            Matches alphanumeric characters and the underscore.
\W            Matches non-alphanumeric characters.
\s            Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S            Matches any non-whitespace character.
\d            Matches any digit; equivalent to [0-9].
\D            Matches any non-digit.
\A            Matches the beginning of the string.
\Z            Matches the end of the string. In Perl-style engines it also matches just before a trailing line break; in Python's re it matches only at the very end.
\z            Matches the end of the string in Perl-style engines. Older versions of Python's re do not accept it; use \Z instead.
\G            Matches the position where the last match finished (Perl-style engines; not supported by Python's re).
\b            Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B            Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc.  Match a newline character, a tab character, and so on.
\1...\9       Matches the content of the nth group.
\10           Matches the content of group 10 if that group exists; otherwise it is read as an octal character code.
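A few of the elements above are easier to see in action. The following is a minimal sketch, with sample strings invented for illustration, showing greedy versus non-greedy quantifiers, a positive lookahead and a backreference.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as possible, up to the last '>'.
greedy = re.search(r'<.*>', text).group()
# Non-greedy: .*? stops at the first '>'.
lazy = re.search(r'<.*?>', text).group()

# Positive lookahead: match 'Python' only when ' 3' follows, without consuming it.
lookahead = re.search(r'Python(?= 3)', 'Python 2, Python 3')

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r'(\w+) \1', 'it is is fine').group()

print(greedy)             # <b>bold</b> and <i>italic</i>
print(lazy)               # <b>
print(lookahead.start())  # 10
print(doubled)            # is is
```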

Character class

Example        Description
[Pp]ython      Matches "Python" or "python".
rub[ye]        Matches "ruby" or "rube".
[aeiou]        Matches any one of the letters in the brackets.
[0-9]          Matches any digit; the same as [0123456789].
[a-z]          Matches any lowercase letter.
[A-Z]          Matches any uppercase letter.
[a-zA-Z0-9]    Matches any letter or digit.
[^aeiou]       Matches any character other than the letters a, e, i, o, u.
[^0-9]         Matches any character other than a digit.

Special character class

Example    Description
.          Matches any single character except "\n". To match any character including '\n', use the re.DOTALL flag or a pattern like '[\s\S]'.
\d         Matches a digit character. Equivalent to [0-9].
\D         Matches a non-digit character. Equivalent to [^0-9].
\s         Matches any whitespace character, including spaces, tabs, form feeds and so on. Equivalent to [ \f\n\r\t\v].
\S         Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\w         Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W         Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
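To try out the special character classes, here is a short sketch. The sample strings are invented for illustration; it shows how . treats a newline with and without re.DOTALL, and what \d, \w and \s pick out.

```python
import re

# . does not match a newline by default...
no_flag = re.match(r'a.c', 'a\nc')
# ...but it does when re.DOTALL is given.
with_dotall = re.match(r'a.c', 'a\nc', re.DOTALL)

# \d, \w and \s pick out digits, word characters and whitespace.
digits = re.search(r'\d\d\d', 'room 101').group()
word = re.match(r'\w+', 'snake_case rest').group()
gap = re.search(r'\s', 'a b').start()

print(no_flag)                  # None
print(with_dotall is not None)  # True
print(digits)                   # 101
print(word)                     # snake_case
print(gap)                      # 1
```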

1.3 re.match function

re.match attempts to match a pattern from the beginning of the string; if the match is not successful, match() returns None.

re.match(pattern, string, flags=0)

pattern: the regular expression to match

string: the string to match against

flags: flag bits that control the matching mode, discussed below

Go directly to the program:

    import re

    r = "abc"  # regular expression
    if re.match(r, "abc"):  # the pattern matches
        print('done')
    else:
        print('defeat')

Results:

done

You can practice more with the syntax listed in the tables above:

    import re

    # . matches any character except a newline; with the re.DOTALL flag
    # it matches any character, including a newline.
    r = "a.c"  # regular expression
    if re.match(r, "abc"):
        print(re.match(r, "abc"))
        print('done')
    else:
        print('defeat')

Results:

<re.Match object; span=(0, 3), match='abc'>
done

Note that re.match() does not return the matched string: it returns a match object on success and None on failure.

We can get the matched text through the match object's group(num) or groups() methods.

Match object method    Description
group(num=0)           Returns the string matched by the whole expression. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups.
groups()               Returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern contains.

Program:

    import re

    r = "a.c"
    if re.match(r, "abc"):
        line = re.match(r, "abc")
        print(line.group())
    else:
        print('defeat')

Results:

abc
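The group(num) and groups() methods become more interesting once the pattern contains parenthesised groups. A minimal sketch, using a made-up two-word pattern and sample string:

```python
import re

# Two capturing groups, each matching one word.
m = re.match(r'(\w+) (\w+)', 'hello world again')

whole = m.group()      # the entire match
first = m.group(1)     # the first group only
pair = m.group(1, 2)   # several group numbers give a tuple
every = m.groups()     # all groups, from 1 upwards

print(whole)  # hello world
print(first)  # hello
print(pair)   # ('hello', 'world')
print(every)  # ('hello', 'world')
```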

1.3 re.search function

re.search() scans the entire string and returns the first successful match.

re.search(pattern, string, flags=0)

pattern: the regular expression to match

string: the string to match against

flags: flag bits that control the matching mode

As with re.match(), re.search() returns a match object after a successful match; otherwise it returns None.

Go directly to the program:

    import re

    r = "abc"
    s = 'aacawcabc'
    if re.search(r, s):
        line = re.search(r, s)
        print(line.group())

Results:

abc

Note:

The difference between re.match() and re.search():

re.match() only matches at the beginning of the string. If the string does not match the regular expression at its beginning, the match fails and the function returns None. re.search() scans the whole string until it finds a match.
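The difference is easy to check side by side. This sketch reuses the sample string from the re.search program; the variable names are only for illustration.

```python
import re

s = 'aacawcabc'

# 'abc' is not at the beginning of s, so re.match gives None.
at_start = re.match('abc', s)
# re.search keeps scanning and finds 'abc' further along.
anywhere = re.search('abc', s)

print(at_start)         # None
print(anywhere.group()) # abc
print(anywhere.span())  # (6, 9)
```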

1.4 re.sub function

The re.sub() function is used to replace matches.

re.sub(pattern, repl, string, count=0)

pattern: the regular expression to match

repl: the replacement

string: the string to match against

count: the maximum number of substitutions; the default value 0 means that all matches are replaced

The function returns the string obtained by replacing the leftmost non-overlapping matches of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.

Program:

    import re

    pattern = r'\d'
    repl = "!"
    s = '1abcdefg'
    line = re.sub(pattern, repl, s)
    print(line)

Results:

!abcdefg
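The effect of the count parameter, and of a pattern that is not found, can be sketched as follows. The date string is an arbitrary sample chosen for this example.

```python
import re

s = '2025-04-28'

# count defaults to 0: every digit is replaced.
all_replaced = re.sub(r'\d', '#', s)
# count=2: only the two leftmost matches are replaced.
first_two = re.sub(r'\d', '#', s, count=2)
# No digit in the input: the string comes back unchanged.
unchanged = re.sub(r'\d', '#', 'abcdefg')

print(all_replaced)  # ####-##-##
print(first_two)     # ##25-04-28
print(unchanged)     # abcdefg
```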

1.5 regular expression modifiers: optional flags

Let's look at what the flag bits are:

Regular expressions can take optional flag modifiers that control the matching mode. Several flags can be combined with a bitwise OR (|). For example, re.I | re.M sets both the I and M flags:

Modifier    Description
re.I        Makes the match case-insensitive.
re.L        Does locale-aware matching.
re.M        Multiline matching; affects ^ and $.
re.S        Makes . match all characters, including line breaks.
re.U        Interprets characters according to the Unicode character set; affects \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
re.X        Allows a more flexible layout, so that regular expressions are easier to write and read.
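Combining flags with | can be sketched like this. The sample text and the verbose date pattern are invented for illustration.

```python
import re

text = 'first line\nSecond line'

# Without flags, ^ only matches at the very start and case must agree.
plain = re.search(r'^second', text)
# re.I ignores case, re.M lets ^ match at the start of every line.
both = re.search(r'^second', text, re.I | re.M)

# re.X ignores whitespace in the pattern and allows # comments.
verbose = re.compile(r'''
    \d{4}   # year
    -
    \d{2}   # month
''', re.X)
year_month = verbose.match('2025-04').group()

print(plain)         # None
print(both.group())  # Second
print(year_month)    # 2025-04
```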

Program:

    import re

    pattern = '[Aa][Bb][Cc][Dd]'
    s = 'AbCd'
    if re.match(pattern, s):
        line = re.match(pattern, s)
        print(line.group())

Results:

AbCd

The same result can be achieved with a flag:

    import re

    pattern = 'abcd'
    s = 'AbCd'
    if re.match(pattern, s, re.I):
        line = re.match(pattern, s, re.I)
        print(line.group())

Results:

AbCd

1.6 re.compile function

The usual workflow with re is to first compile the string form of the regular expression into a Pattern instance with re.compile(), then use the Pattern instance to process text and obtain the matching result (a Match instance), and finally use the Match instance to extract information and carry out further operations.

Program:

    import re

    pattern = re.compile(r'\d+')
    s = '11223344'
    if pattern.match(s):
        line = pattern.match(s)
        print(line.group())

Results:

11223344
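A compiled Pattern instance pays off when the same expression is applied to many strings. The following sketch assumes a hypothetical list of input lines and reuses one compiled pattern with its search, match and sub methods.

```python
import re

number = re.compile(r'\d+')  # compile once, reuse below
lines = ['order 66', 'no digits here', '42 apples']

# search: first run of digits anywhere in each line, or None.
found = []
for line in lines:
    m = number.search(line)
    found.append(m.group() if m else None)

# match: True only when the line begins with digits.
starts_with_number = [number.match(line) is not None for line in lines]

# sub: the compiled pattern can also do replacements.
masked = number.sub('N', 'order 66')

print(found)               # ['66', None, '42']
print(starts_with_number)  # [False, False, True]
print(masked)              # order N
```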
