What are the key knowledge points of web regular expressions

2025-01-28 Update From: SLTechnology News&Howtos

This article introduces the key knowledge points of web regular expressions: what a regular expression is, what it is used for, and how metacharacters, character groups, quantifiers, matching modes and escaping work.

Definition of regular expression (regex)

A regular expression is an expression, built from a set of special characters and formats, that operates on a string. It is matched against the target string from left to right in order to retrieve, filter or extract content.

Uses of regular expressions

Determine whether user input meets the requirements.

The content entered by the user is compared with a predefined expression, and the user is prompted to re-enter it if it does not meet the requirements.

Constraining input data with expressions can improve program efficiency and reduce the load on the server.

A common case is form validation.

Get all the content that matches an expression from a file.

Content that conforms to the expression can be found efficiently in a large string.

Common cases include log analysis and crawlers.
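
The two uses above can be sketched with Python's re module. This is only an illustration: the username rule, the function names and the sample log lines are assumptions made up for the example, not part of the original article.

```python
import re

# Hypothetical rule: a username is 3 to 16 letters, digits or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(text):
    # fullmatch succeeds only if the whole input conforms to the expression
    return USERNAME_RE.fullmatch(text) is not None

# Made-up log lines used as sample data
SAMPLE_LOG = (
    "10.0.0.1 GET /index 200\n"
    "10.0.0.2 GET /missing 404\n"
    "10.0.0.1 POST /login 500\n"
)

def status_codes(log):
    # Extract the three-digit code at the end of every line
    return re.findall(r"\b(\d{3})$", log, flags=re.MULTILINE)

print(is_valid_username("alice_01"))  # True
print(is_valid_username("a!"))        # False
print(status_codes(SAMPLE_LOG))       # ['200', '404', '500']
```

Validation uses fullmatch so that partial matches are rejected, while extraction uses findall to collect every match in the text.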

Metacharacters

[] A pair of square brackets represents a character group, which describes the set of characters that may match at one position.

[^] A negated character group, which describes the set of characters that must not match at that position.

\ Adding \ before a metacharacter cancels its special meaning.

\d matches a digit (lowercase d).

\D matches a non-digit (uppercase D).

\w matches a letter, digit or underscore (lowercase w).

\W matches any character that is not a letter, digit or underscore (uppercase W).

\s matches whitespace: spaces, tabs and newline characters (lowercase s).

\S matches a non-whitespace character (uppercase S).

\t matches the tab character (lowercase t).

\n matches the newline character (lowercase n).

. matches any character except the newline character (the English full stop).

[\d\D], [\w\W] and [\s\S] each match any character.

^ matches the beginning of the entire string. As an anchor it is written at the start of the expression, not in the middle or at the end (as the first character inside a character group it means "not" instead).

$ matches the end of the entire string. It is written at the end of the expression, not in the middle or at the start.

| means "or". For example, a|b matches either a or b. The alternatives are tried from left to right, so if a matches, b is not tried (for this reason, always put the longer expression first).

() Grouping, which limits the scope that an operator such as | or a quantifier acts on. Groups also have special uses in Python's re module.

\b matches the beginning or end of a word. For example, in hello world, \bw matches the w in world and o\b matches the o in hello.
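
A few of the metacharacters above can be tried out with Python's re module. The sample strings and variable names here are assumptions chosen for illustration.

```python
import re

sample = "hello world"

# Shorthand classes
digits = re.findall(r"\d", "a1_b2")          # ['1', '2']
words = re.findall(r"\w", "a1_ !")           # ['a', '1', '_']
spaces = re.findall(r"\s", "a b\tc\n")       # [' ', '\t', '\n']

# . skips the newline, [\s\S] does not
dot = re.findall(r".", "a\nb")               # ['a', 'b']
anything = re.findall(r"[\s\S]", "a\nb")     # ['a', '\n', 'b']

# Anchors
starts = re.search(r"^hello", sample) is not None   # True
ends = re.search(r"world$", sample) is not None     # True

# Alternation is tried left to right, so the longer option should come first
short_first = re.search(r"ab|abc", "abc").group()   # 'ab'
long_first = re.search(r"abc|ab", "abc").group()    # 'abc'

# Groups capture the text they match
bounds = re.search(r"(\d+)-(\d+)", "range 10-20").groups()  # ('10', '20')

# Word boundaries
w_start = re.findall(r"\bw", sample)         # ['w']
o_end = re.search(r"o\b", sample).start()    # 4, the o in hello

print(digits, words, spaces, dot, anything)
print(starts, ends, short_first, long_first, bounds, w_start, o_end)
```

Note that for str patterns in Python 3, \d, \w and \s also match non-ASCII digits, letters and whitespace unless the re.ASCII flag is passed.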

Character group

Describes all the possibilities that can occur at one position (a pair of square brackets denotes exactly one character position). Examples:

[abc] matches a, b or c.

Range matching: a character group can describe several ranges, written one after another.

[0-9] matches the digits 0-9 (ASCII codes 48-57).

[A-Z] matches the uppercase letters A-Z (ASCII codes 65-90).

[a-z] matches the lowercase letters a-z (ASCII codes 97-122).

[a-zA-Z] matches uppercase and lowercase letters. If you use [A-z] instead, it also matches the six symbols [ \ ] ^ _ ` that lie between Z and a in ASCII.

[0-9] -> \d matches a digit.

[0-9a-zA-Z_] -> \w matches digits, letters and the underscore (a "word" character).

Whitespace characters (space | tab | newline) -> ( |\t|\n) -> \s matches any whitespace character.
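
The ASCII ranges and the [A-z] pitfall described above can be checked directly in Python. The variable names and sample strings are chosen only for this sketch.

```python
import re

# ASCII codes at the ends of each range
range_codes = [ord(c) for c in "09AZaz"]      # [48, 57, 65, 90, 97, 122]

# The six characters between 'Z' (90) and 'a' (97)
symbols = "[\\]^_`"
symbol_codes = [ord(c) for c in symbols]      # [91, 92, 93, 94, 95, 96]

# [A-z] accidentally matches all six symbols, [a-zA-Z] matches none of them
loose = re.findall(r"[A-z]", symbols)
strict = re.findall(r"[a-zA-Z]", symbols)

# One pair of brackets matches exactly one character per position
abc = re.findall(r"[abc]", "cabbage")         # ['c', 'a', 'b', 'b', 'a']

# For ASCII input, \w behaves like [0-9a-zA-Z_]
same = re.findall(r"[0-9a-zA-Z_]", "a1_ -") == re.findall(r"\w", "a1_ -")

print(range_codes, symbol_codes)
print(loose, strict, abc, same)
```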

Quantifiers

Quantifiers constrain how many times the preceding unit is matched. A quantifier applies only to the unit immediately before it, which can be a character, a character group or a grouping.

{n} matches exactly n times.

{n,} matches at least n times.

{n,m} matches at least n times and at most m times.

? matches 0 or 1 times, the same as {0,1}.

+ matches 1 or more times, the same as {1,}.

* matches 0 or more times, the same as {0,}.

The three symbols ?, + and * are used very often, so be sure to memorize the range each one covers.
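
As a quick check of the quantifiers listed above, the following sketch tests each one against short sample strings. The helper name full and the sample patterns are assumptions made for illustration.

```python
import re

def full(pattern, text):
    # True only if the whole text matches the pattern
    return re.fullmatch(pattern, text) is not None

results = {
    "a{3} on aaa": full(r"a{3}", "aaa"),            # exactly 3
    "a{3} on aa": full(r"a{3}", "aa"),
    "a{2,} on aaaaa": full(r"a{2,}", "aaaaa"),      # at least 2
    "a{2,} on a": full(r"a{2,}", "a"),
    "a{2,3} on aaaa": full(r"a{2,3}", "aaaa"),      # between 2 and 3
    "colou?r on color": full(r"colou?r", "color"),  # ? is {0,1}
    "colou?r on colour": full(r"colou?r", "colour"),
    "ab+ on a": full(r"ab+", "a"),                  # + is {1,}
    "ab+ on abbb": full(r"ab+", "abbb"),
    "ab* on a": full(r"ab*", "a"),                  # * is {0,}
    "(ab){2} on abab": full(r"(ab){2}", "abab"),    # quantifier on a grouping
    "[0-9]{4} on 2025": full(r"[0-9]{4}", "2025"),  # quantifier on a character group
}

for label, ok in results.items():
    print(label, ok)
```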

Matching modes (greedy by default)

Maximum matching (also known as greedy matching)

Match as much content as possible within the range allowed by the quantifier. For example, .*x matches any character any number of times and stops only at the last x.

Case study:

Expression: \d{3,}6

Target: 1234789135661947678914

Result: 12347891356619476 (\d{3,} first consumes digits from 123 until it reaches a non-digit or the end of the string, then the engine backtracks until it finds a 6, which is the last 6 in the target)
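
The greedy case study can be reproduced with Python's re module. The variable names below are chosen only for this sketch.

```python
import re

target = "1234789135661947678914"

# Greedy: \d{3,} takes every digit, then backtracks to the last 6
greedy = re.search(r"\d{3,}6", target).group()
print(greedy)  # 12347891356619476

# The same behavior with .*x: the match runs to the last x
greedy_x = re.search(r".*x", "axbxc").group()
print(greedy_x)  # axbx
```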

Minimum matching (also known as lazy matching)

Match as little content as possible within the range allowed by the quantifier. A ? placed after a quantifier selects minimum matching. For example, .*?x matches any character any number of times and stops at the first x.

Case study:

Expression: \d{3,}?6

Target: 1234789135661947678914

Result: 12347891356 (under lazy matching, digits are matched starting from 123 and matching stops as soon as the first 6 is encountered)
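
The lazy case study can be checked the same way in Python. The variable names are assumptions for this sketch.

```python
import re

target = "1234789135661947678914"

# Lazy: \d{3,}? takes the minimum of three digits, then extends one digit
# at a time until the next character is a 6
lazy = re.search(r"\d{3,}?6", target).group()
print(lazy)  # 12347891356

# The same behavior with .*?x: the match stops at the first x
lazy_x = re.search(r".*?x", "axbxc").group()
print(lazy_x)  # ax
```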

Note: when two ? appear together, the first ? is a quantifier meaning 0 or 1 times, and the second ? selects minimum matching. For example:

1\d?3 matches all of 13, 123 and 133. 1\d??3, searched against 13, 123, 133, yields 13, 123 and 13 (the full 133 is not matched: under lazy matching the optional digit is skipped first, and the 3 that follows immediately completes the match).
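
The difference between ? and ?? can be seen by running both patterns over the same text in Python. The variable names are chosen for this sketch.

```python
import re

text = "13, 123, 133"

# ? alone is greedy: the optional digit is taken whenever possible
greedy_optional = re.findall(r"1\d?3", text)
print(greedy_optional)  # ['13', '123', '133']

# ?? is lazy: the optional digit is taken only when skipping it fails
lazy_optional = re.findall(r"1\d??3", text)
print(lazy_optional)  # ['13', '123', '13']
```

In 123 the lazy version still consumes the digit 2, because skipping it would leave 1 followed by 2, which does not match.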

Escape character

To match a metacharacter literally, add \ before it.

In addition, some metacharacters lose their special meaning when placed inside a character group. For example:

[().*+?] Inside square brackets these symbols lose their special meaning and match themselves.

[a\-c] Inside a character group, - denotes a range. If you do not want it to denote a range, escape it or put it at the start or end of the character group.
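
The escaping rules above can be sketched in Python as follows. The sample strings are made up for illustration, and re.escape is a standard library helper that the original article does not mention.

```python
import re

# Escaped vs unescaped dot
unescaped = re.findall(r"1.5", "1.5 125")       # ['1.5', '125']
escaped = re.findall(r"1\.5", "1.5 125")        # ['1.5']

# re.escape adds the backslashes for you
auto_escaped = re.findall(re.escape("1.5"), "1.5 125")  # ['1.5']

# Metacharacters inside a character group match themselves
literal = re.findall(r"[().*+?]", "f(x)=a.b*c+d?")
# ['(', ')', '.', '*', '+', '?']

# - as a range, escaped, or placed at the start or end of the group
as_range = re.findall(r"[a-c]", "a-bc")         # ['a', 'b', 'c']
escaped_dash = re.findall(r"[a\-c]", "a-bc")    # ['a', '-', 'c']
leading_dash = re.findall(r"[-ac]", "a-bc")     # ['a', '-', 'c']
trailing_dash = re.findall(r"[ac-]", "a-bc")    # ['a', '-', 'c']

print(unescaped, escaped, auto_escaped)
print(literal)
print(as_range, escaped_dash, leading_dash, trailing_dash)
```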

That concludes this overview of the knowledge points of web regular expressions. Theory is best combined with practice, so go and try the patterns yourself.
