What are the ways to write Python regular expressions? 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

This article explains common ways to write Python regular expressions. The content is simple, clear, and easy to follow.

What is a regular expression?

Regular expressions (Regular Expression) are often used to retrieve and replace text that conforms to a certain pattern (rule).

Here "regular" means rule, so a Regular Expression is an expression that describes a rule.

This article collects common regular expression patterns for quick lookup, and a detailed regular expression syntax reference is attached at the end.

Examples include: email address, ID number, mobile phone number, landline, domain name, IP address, date, postal code, password, Chinese characters, numbers, strings

How does Python support regular expressions?

The examples below are written in Python and were run in Jupyter Notebook.

Python supports regular expressions through the re module, which gives the language full regular expression functionality.

Two functions are used throughout:

re.compile compiles a regular expression string into a regular expression (Pattern) object.

Pattern.findall finds all substrings of a string that match the regular expression and returns them as a list, or an empty list if no matches are found.

# Import the re module
import re
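As a minimal sketch of how these two functions fit together (the pattern and sample strings here are illustrative assumptions, not taken from the examples that follow):

```python
import re

# Compile once, then reuse the Pattern object for every search.
digits = re.compile(r"\d+")

found = digits.findall("room 12, floor 3")    # every non-overlapping match
nothing = digits.findall("no numbers here")   # no match gives an empty list

print(found)    # ['12', '3']
print(nothing)  # []
```

Compiling is optional for one-off searches, since re.findall(pattern, string) does the same thing in one call, but a compiled Pattern is convenient when the same expression is applied many times.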

1. Email address

Contains uppercase and lowercase letters, underscores, Arabic numerals, periods, and hyphens

Expression:

[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)

Case study:

pattern = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)")
strs = 'my personal mailbox is zhuwjwh@outlook.com and my company mailbox is 123456@qq.org, would you please register?'
result = pattern.findall(strs)
print(result)

['zhuwjwh@outlook.com', '123456@qq.org']
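findall extracts addresses from running text. To check whether one whole string is an address, fullmatch is the more natural call. The sketch below assumes the email expression from this section; the helper name is_email is hypothetical.

```python
import re

# The email expression from this section, compiled once.
EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)")

def is_email(candidate: str) -> bool:
    """Return True only if the whole string matches the expression."""
    return EMAIL.fullmatch(candidate) is not None

print(is_email("zhuwjwh@outlook.com"))  # True
print(is_email("zhuwjwh@outlook"))      # False: the dotted suffix is required
print(is_email("a@b.co.uk"))            # False: only one dotted suffix is allowed
```

The last case shows a limit of this simple expression: domains with more than one dot are rejected as a whole, so real validation usually needs a looser pattern or a confirmation email.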

2. ID card number

Format: xxxxxx yyyy MM dd 375 0 (18 digits)

Region: [1-9]\d{5}

First two digits of the year: (18|19|([23]\d)), covering 1800-2399

Last two digits of the year: \d{2}

Month: ((0[1-9])|(10|11|12))

Day: (([0-2][1-9])|10|20|30|31); the pattern cannot rule out day 29 or later in February based on leap years

Three-digit sequence code: \d{3}

Two-digit sequence code: \d{2}

Check code: [0-9Xx]

Expression:

[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]

Case study:

pattern = re.compile(r"[1-9]\d{5}(?:18|19|(?:[23]\d))\d{2}(?:(?:0[1-9])|(?:10|11|12))(?:(?:[0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]")
strs = "Xiaoming's ID card number is 342623198910235163, cell phone number is 13987692110"
result = pattern.findall(strs)
print(result)

['342623198910235163']
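The case study uses non-capturing groups (?:...) rather than plain parentheses, and that matters for findall: when a pattern contains capturing groups, findall returns the groups instead of the whole match. A minimal sketch, using a slightly simplified form of the ID expression (fewer nested groups, which is an assumption made for readability):

```python
import re

text = "ID: 342623198910235163"

# Same structure, once with capturing groups and once with non-capturing groups.
capturing = re.compile(
    r"[1-9]\d{5}(18|19|[23]\d)\d{2}(0[1-9]|10|11|12)([0-2][1-9]|10|20|30|31)\d{3}[0-9Xx]"
)
non_capturing = re.compile(
    r"[1-9]\d{5}(?:18|19|[23]\d)\d{2}(?:0[1-9]|10|11|12)(?:[0-2][1-9]|10|20|30|31)\d{3}[0-9Xx]"
)

with_groups = capturing.findall(text)
whole_match = non_capturing.findall(text)

print(with_groups)  # [('19', '10', '23')]
print(whole_match)  # ['342623198910235163']
```

If the groups are needed as well as the full text, Pattern.finditer returns match objects that expose both through group(0) and groups().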

3. Domestic mobile phone number

Mobile phone numbers are all 11 digits: they start with 1, the second digit is usually 3, 5, 6, 7, 8, or 9, and the remaining nine digits are arbitrary

For example: 13987692110, 15610098778

Expression:

1(3|4|5|6|7|8|9)\d{9}

Case study:

pattern = re.compile(r"1[356789]\d{9}")
strs = "Xiaoming's cell phone number is 13987692110, you call him tomorrow"
result = pattern.findall(strs)
print(result)

['13987692110']
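One caveat when extracting from mixed text: an 11-digit pattern can also match inside a longer run of digits, such as an ID card number. A hedged sketch of one way to avoid that, using lookarounds that require no digit on either side (the sample text and the variable names are assumptions for illustration):

```python
import re

text = "ID 342623198910235163, phone 13987692110"

# Unanchored: can match 11 digits inside the 18-digit ID number.
loose = re.compile(r"1[356789]\d{9}")

# (?<!\d) and (?!\d) require that no digit sits directly before or after the match.
strict = re.compile(r"(?<!\d)1[356789]\d{9}(?!\d)")

loose_hits = loose.findall(text)
strict_hits = strict.findall(text)

print(loose_hits)   # ['19891023516', '13987692110']
print(strict_hits)  # ['13987692110']
```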

4. Domestic landline telephone

Area code: 3 to 4 digits; number: 7 to 8 digits

For example: 0511-1234567, 021-87654321

Expression:

\d{3}-\d{8}|\d{4}-\d{7}

Case study:

pattern = re.compile(r"\d{3}-\d{8}|\d{4}-\d{7}")
strs = "0511-1234567 is Xiaoming's home phone, his office phone is 021-87654321"
result = pattern.findall(strs)
print(result)

['0511-1234567', '021-87654321']

5. Domain name

May start with http:// or https://

Expression:

(?:(?:http:\/\/)|(?:https:\/\/))?(?:[\w](?:[\w\-]{0,61}[\w])?\.)+[a-zA-Z]{2,6}(?:\/)

Case study:

pattern = re.compile(r"(?:(?:http:\/\/)|(?:https:\/\/))?(?:[\w](?:[\w\-]{0,61}[\w])?\.)+[a-zA-Z]{2,6}(?:\/)")
strs = 'Python official website is https://www.python.org/'
result = pattern.findall(strs)
print(result)

['https://www.python.org/']

6. IP address

IP addresses are 32 bits long (2^32 addresses in total), divided into 4 segments of 8 bits each, each written as a decimal number

Each segment ranges from 0 to 255, and the segments are separated by periods.

Expression:

(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)

Case study:

pattern = re.compile(r"(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)")
strs = '''Please enter a valid IP address, illegal IP address and other characters will be filtered!
After adding, deleting or changing the IP address, please save and close the notepad!
192.168.8.84 192.168.8.85 192.168.8.86 0.0.0.1 256.1.1.1 192.256.256 192.255.255.255 aa.bb.cc.dd'''
result = pattern.findall(strs)
print(result)

['192.168.8.84', '192.168.8.85', '192.168.8.86', '0.0.0.1', '56.1.1.1', '192.255.255.255']

Note that '56.1.1.1' is matched inside the invalid '256.1.1.1', because the pattern is not anchored to the edges of the digit run.
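When partial matches inside an invalid address are not wanted, one possible approach is to extract loosely and then validate each candidate with the standard library's ipaddress module. This is a sketch under that assumption, not the article's method; the names candidate and valid_ipv4 are hypothetical.

```python
import re
import ipaddress

text = "192.168.8.84 0.0.0.1 256.1.1.1 192.255.255.255 aa.bb.cc.dd"

# Four dot-separated groups of 1-3 digits, not glued to other digits or dots.
candidate = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")

def valid_ipv4(s: str) -> bool:
    """Let the ipaddress module decide whether each segment is in range."""
    try:
        ipaddress.IPv4Address(s)
        return True
    except ValueError:
        return False

addresses = [c for c in candidate.findall(text) if valid_ipv4(c)]
print(addresses)  # ['192.168.8.84', '0.0.0.1', '192.255.255.255']
```

Here '256.1.1.1' is picked up as a candidate by the regular expression and then rejected as a whole, instead of contributing the fragment '56.1.1.1'.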

7. Date

Common date formats: yyyyMMdd, yyyy-MM-dd, yyyy/MM/dd, yyyy.MM.dd

Expression:

\d{4}(?:-|\/|.)\d{1,2}(?:-|\/|.)\d{1,2}

Case study:

pattern = re.compile(r"\d{4}(?:-|\/|.)\d{1,2}(?:-|\/|.)\d{1,2}")
strs = 'Today is 2020-12-20, today of last year is 2019.12.20, today of next year is 2021-12-20'
result = pattern.findall(strs)
print(result)

['2020-12-20', '2019.12.20', '2021-12-20']
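A date expression of this kind only checks the shape of the text, so impossible dates such as 2021-02-30 still match. One way to tighten this, sketched here as an assumption rather than as part of the original article, is to pass each candidate through datetime.strptime. The helper name real_dates is hypothetical, and the separator is written as the character class [-/.] so that the dot is literal.

```python
import re
from datetime import datetime

# Shape check only: year, separator, month, separator, day.
DATE_LIKE = re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}")

def real_dates(text: str) -> list:
    """Keep only candidates that are real calendar dates."""
    found = []
    for candidate in DATE_LIKE.findall(text):
        normalized = re.sub(r"[/.]", "-", candidate)  # unify separators
        try:
            datetime.strptime(normalized, "%Y-%m-%d")
        except ValueError:
            continue  # e.g. February 30
        found.append(candidate)
    return found

dates = real_dates("valid: 2020-12-20 and 2019.12.20; invalid: 2021-02-30")
print(dates)  # ['2020-12-20', '2019.12.20']
```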

8. Domestic postal code

China's postal codes use a four-level, six-digit coding structure.

The first two digits represent the province (or municipality directly under the Central Government, or autonomous region)

The third digit represents the postal area; the fourth digit represents the county (city)

The last two digits represent the delivery bureau (office).

Expression:

[1-9]\d{5}(?!\d)

Case study:

pattern = re.compile(r"[1-9]\d{5}(?!\d)")
strs = "Shanghai Jing'an District Postal Code is 200040"
result = pattern.findall(strs)
print(result)

['200040']

9. Password

Password (starts with a letter, is 6 to 18 characters long, and may contain only letters, digits, and underscores)

Expression:

[a-zA-Z]\w{5,17}

Strong password (starts with a letter, must combine uppercase letters, lowercase letters, and digits, may not use special characters, and is 8 to 10 characters long)

Expression:

[a-zA-Z](?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}

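pattern = re.compile(r"[a-zA-Z]\w{5,17}")
strs = 'password: q123456_abc'
result = pattern.findall(strs)
print(result)

['password', 'q123456_abc']

The label "password" is returned as well, because it also starts with a letter and is at least 6 word characters long.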

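pattern = re.compile(r"[a-zA-Z](?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}")
strs = 'strong password: q123456ABc, weak password: q123456abc'
result = pattern.findall(strs)
print(result)

['strong pass', 'word: q1234', 'ABc, weak p']

Because the lookaheads use .*, they scan the rest of the whole string, and . matches spaces and punctuation too. On running text findall therefore returns fragments of the surrounding words, so this expression is better suited to checking one candidate password at a time.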
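To check a single candidate against the stated rules (starts with a letter, mixes uppercase, lowercase, and digits, no special characters, 8 to 10 characters), one possible formulation puts the lookaheads first and uses fullmatch. This is a sketch under those assumptions, not the article's expression; STRONG and is_strong are hypothetical names.

```python
import re

# Lookaheads assert "at least one digit / lowercase / uppercase" from the start;
# the body then allows a letter followed by 7-9 letters or digits (8-10 in total).
STRONG = re.compile(
    r"(?=[^\d]*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])[a-zA-Z][a-zA-Z0-9]{7,9}"
)

def is_strong(password: str) -> bool:
    """Return True only if the whole password satisfies every rule."""
    return STRONG.fullmatch(password) is not None

print(is_strong("q123456ABc"))  # True
print(is_strong("q123456abc"))  # False: no uppercase letter
print(is_strong("q12AB"))       # False: too short
print(is_strong("q123456AB!"))  # False: special character
```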

10. Chinese characters

Expression:

[\u4e00-\u9fa5]

Case study:

pattern = re.compile(r"[\u4e00-\u9fa5]")
strs = 'apple:苹果'
result = pattern.findall(strs)
print(result)

['苹', '果']
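A character class on its own matches one character at a time. Adding the + quantifier returns whole runs of Chinese characters instead; the sample text below is an assumption for illustration.

```python
import re

text = "apple:苹果, banana:香蕉"

single_chars = re.findall(r"[\u4e00-\u9fa5]", text)   # one character per match
whole_words = re.findall(r"[\u4e00-\u9fa5]+", text)   # consecutive characters together

print(single_chars)  # ['苹', '果', '香', '蕉']
print(whole_words)   # ['苹果', '香蕉']
```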

11. Numbers

Verify a number: ^[0-9]*$

Verify an n-digit number: ^\d{n}$

Verify a number of at least n digits: ^\d{n,}$

Verify a number of m to n digits: ^\d{m,n}$

Verify zero or a number that does not start with zero: ^(0|[1-9][0-9]*)$

Verify a positive real number with two decimal places: ^[0-9]+(\.[0-9]{2})?$

Verify a positive real number with 1-3 decimal places: ^[0-9]+(\.[0-9]{1,3})?$

Verify a non-zero positive integer: ^\+?[1-9][0-9]*$

Verify a non-zero negative integer: ^\-[1-9][0-9]*$

Verify a non-negative integer (positive integer + 0): ^\d+$

Verify a non-positive integer (negative integer + 0): ^((-\d+)|(0+))$

Integer: ^-?\d+$

Non-negative floating point number (positive floating point number + 0): ^\d+(\.\d+)?$

Positive floating point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$

Non-positive floating point number (negative floating point number + 0): ^((-\d+(\.\d+)?)|(0+(\.0+)?))$

Negative floating point number: ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$

Floating point number: ^(-?\d+)(\.\d+)?$
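These validation expressions are anchored with ^ and $, so they are meant to be applied to one whole input value rather than searched for inside text. A small sketch of how a few of them might be used together; the CHECKS table and the classify helper are hypothetical names.

```python
import re

# A few of the numeric expressions, keyed by what they verify.
CHECKS = {
    "non-zero positive integer": r"^\+?[1-9][0-9]*$",
    "integer": r"^-?\d+$",
    "non-negative float": r"^\d+(\.\d+)?$",
    "two decimal places": r"^[0-9]+(\.[0-9]{2})?$",
}

def classify(value: str) -> list:
    """Return the names of every check the value passes."""
    return [name for name, pattern in CHECKS.items() if re.match(pattern, value)]

print(classify("-7"))     # ['integer']
print(classify("3.14"))   # ['non-negative float', 'two decimal places']
print(classify("3.141"))  # ['non-negative float']
print(classify("abc"))    # []
```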

12. Strings

English letters and digits: ^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$

Any characters, 3 to 20 in length: ^.{3,20}$

A string of the 26 English letters: ^[A-Za-z]+$

A string of the 26 uppercase letters: ^[A-Z]+$

A string of the 26 lowercase letters: ^[a-z]+$

A string of digits and the 26 letters: ^[A-Za-z0-9]+$

A string of digits, the 26 letters, or underscores: ^\w+$ or ^\w{3,20}$

Chinese, English, and digits, including underscores: ^[\u4E00-\u9FA5A-Za-z0-9_]+$

Chinese, English, and digits, excluding underscores and other symbols: ^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$

Strings that do not contain the characters % & ' , ; = ? $ ": [^%&',;=?$\x22]+

Strings that do not contain ~ or ": [^~\x22]+
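One detail worth knowing when using \w in Python 3: for str patterns it matches Unicode word characters by default, so it also accepts Chinese characters. The re.ASCII flag restricts it to [A-Za-z0-9_]. A short sketch, with sample values assumed for illustration:

```python
import re

name_unicode = re.compile(r"^\w{3,20}$")          # \w includes Chinese characters
name_ascii = re.compile(r"^\w{3,20}$", re.ASCII)  # \w limited to [A-Za-z0-9_]

unicode_ok = name_unicode.match("用户_01") is not None
ascii_rejects = name_ascii.match("用户_01") is None
ascii_ok = name_ascii.match("user_01") is not None

print(unicode_ok)     # True
print(ascii_rejects)  # True
print(ascii_ok)       # True
```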

Attached: detailed explanation of regular expression syntax

Character: Description

\: Marks the next character as a special character (File Format Escape), a literal character (Identity Escape, which applies to the 12 characters ^$()*+?.[\{|), a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".

^: Matches the start position of the input string.

$: Matches the end position of the input string.

*: Matches the preceding subexpression zero or more times. For example, "zo*" can match "z", "zo", and "zoo". * is equivalent to {0,}.

+: Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?: Matches the preceding subexpression zero or one time. For example, "do(es)?" can match "do" or "does". ? is equivalent to {0,1}.

{n}: n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".

{n,}: n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}: m and n are non-negative integers, where n <= m. Matches at least n and at most m times.

(?<!pattern): Negative lookbehind. Similar to negative lookahead, but in the opposite direction. For example, a negative lookbehind placed before "Windows" that excludes "2000" can match "Windows" in "3.1Windows", but not "Windows" in "2000Windows".

x|y: Alternation. When not enclosed in (), its scope is the entire regular expression. For example, "z|food" can match "z" or "food". "(?:z|f)ood" matches "zood" or "food".

[xyz]: Character set (character class). Matches any one of the contained characters. For example, "[abc]" can match the "a" in "plain". Inside a character class only the backslash \ keeps its special meaning and is used for escaping. Other special characters such as the asterisk, the plus sign, and the various brackets are treated as ordinary characters. The caret ^ denotes a negated set if it appears first, and is an ordinary character anywhere else. The hyphen - denotes a character range if it appears between two characters, and is an ordinary character if it appears first or last. The right square bracket must be escaped, or may appear as the first character.

[^xyz]: Negated character class. Matches any character not listed. For example, "[^abc]" can match any of "p", "l", "i", "n" in "plain".

[a-z]: Character range. Matches any character within the specified range. For example, "[a-z]" can match any lowercase letter in the range "a" to "z".

[^a-z]: Negated character range. Matches any character that is not within the specified range. For example, "[^a-z]" can match any character that is not in the range "a" to "z".

[:name:]: Adds the characters of a named character class to the expression. Can only be used inside bracket expressions.

[=elt=]: Adds the collating elements equivalent to the character "elt" under the current locale. For example, [=a=] may add ä, á, à, â, å, ã, and so on. Can only be used inside bracket expressions.

[.elt.]: Adds the collating element elt to the expression. This exists because some collating elements are made up of multiple characters. For example, in the 29-letter Spanish alphabet, "CH" is a single letter that comes after the letter C, resulting in the sort order "cinco, credo, chispa". Can only be used inside bracket expressions.

\b: Matches a word boundary, that is, the position between a word and a space. For example, "er\b" can match the "er" in "never", but not the "er" in "verb".

\B: Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".

\cx: Matches the control character indicated by x. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal "c" character. The value of the control character equals the lowest 5 bits of the value of x (that is, the remainder after dividing by 32). For example, \cM matches a Control-M or carriage return. \ca is equivalent to \u0001, \cb is equivalent to \u0002, and so on.

\d: Matches a digit character. Equivalent to [0-9]. Note that Unicode regular expressions also match full-width digits.

\D: Matches a non-digit character. Equivalent to [^0-9].

\f: Matches a form feed. Equivalent to \x0c and \cL.

\n: Matches a newline character. Equivalent to \x0a and \cJ.

\r: Matches a carriage return. Equivalent to \x0d and \cM.

\s: Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. Note that Unicode regular expressions also match the full-width space.

\S: Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t: Matches a tab. Equivalent to \x09 and \cI.

\v: Matches a vertical tab. Equivalent to \x0b and \cK.

\w: Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]". Note that Unicode regular expressions also match Chinese characters.

\W: Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".

\xnn: Hexadecimal escape sequence. Matches the character represented by the two hexadecimal digits nn. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". ASCII codes can be used in regular expressions this way.

\num: Backreference to the substring matched by the num-th parenthesized capture group of the regular expression. num is a positive decimal integer starting at 1, and its upper limit may be 9, 31, 99, or even unlimited. For example, "(.)\1" matches two consecutive identical characters.

\n: Identifies an octal escape value or a backreference. If at least n captured subexpressions precede \n, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

\nm: Identifies an octal escape value or a backreference. If at least nm captured subexpressions precede \nm, nm is a backreference. If at least n captures precede \nm, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.

\nml: If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.

\un: Unicode escape sequence, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
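A few of the entries above can be tried directly in Python. The sketch below covers \b, {n,}, and the \1 backreference; the sample strings are assumptions chosen to mirror the examples in the table.

```python
import re

# \b: "er" at the end of a word matches, "er" inside a word does not.
end_of_word = re.search(r"er\b", "never")
inside_word = re.search(r"er\b", "verb")

# {n,}: at least two consecutive o's.
runs = re.findall(r"o{2,}", "Bob likes foooood")

# \1: backreference to the first capture group (two identical characters in a row).
doubled = re.search(r"(.)\1", "food")

print(end_of_word.span())  # (3, 5)
print(inside_word)         # None
print(runs)                # ['ooooo']
print(doubled.group())     # oo
```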

Priority

Priority: Symbols

Highest: \

High: (), (?:), (?=), []

Medium: *, +, ?, {n}, {n,}, {n,m}

Low: ^, $, and other metacharacters

Second lowest: concatenation, that is, adjacent characters joined together

Lowest: |
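Two consequences of these precedence rules can be checked with a short sketch (sample strings assumed for illustration): alternation binds more loosely than concatenation, and a quantifier applies only to the single item before it unless a group is used.

```python
import re

# | has the lowest priority: "z|food" means "z" OR "food", not "(z|f)ood".
plain_alt = re.fullmatch(r"z|food", "zood") is not None
grouped_alt = re.fullmatch(r"(?:z|f)ood", "zood") is not None

# Quantifiers bind tighter than concatenation: "ab+" repeats only "b".
repeat_b = re.findall(r"ab+", "ab abb abab")
repeat_ab = re.findall(r"(?:ab)+", "ab abb abab")

print(plain_alt)    # False
print(grouped_alt)  # True
print(repeat_b)     # ['ab', 'abb', 'ab', 'ab']
print(repeat_ab)    # ['ab', 'ab', 'abab']
```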
