
How to implement efficient and useful regular expressions

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article introduces how to write efficient and useful regular expressions. It should be a useful reference for interested readers, and we hope you learn a lot from it.

What is a regular expression?

A regular expression is a special string of letters and symbols that describes a pattern. It is used to find text in the format you want.

A regular expression is a pattern that is matched against a subject string from left to right. "Regular expression" is a mouthful, so we usually use the abbreviations "regex" or "regexp". Regular expressions can be used to replace text in a string, validate forms, extract substrings from a string based on a matching pattern, and much more.

Imagine you are writing an application and want to set rules for user names. A user name may contain only letters, digits, underscores and hyphens, and its length is limited so that names don't look ugly. We use the following regular expression to validate a user name:

The regular expression above accepts the strings john_doe, jo-hn_doe and john12_as. It does not match Jo, because that string contains an uppercase letter and is too short.
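The pattern itself did not survive in this copy of the article. As a minimal sketch, assuming the rule is "3 to 15 characters, each a lowercase letter, a digit, _ or -" (the bounds 3 and 15 are our assumption), a matching pattern is ^[a-z0-9_-]{3,15}$. Here it is checked with Python's standard re module, chosen only for illustration; `is_valid_username` is a hypothetical helper name.

```python
import re

# Assumed rule (hypothetical bounds): 3-15 characters from a-z, 0-9, "_" or "-".
# The "-" at the end of the set is literal. fullmatch() requires the whole
# string to match, which plays the role of the ^ and $ anchors.
USERNAME_RE = re.compile(r"[a-z0-9_-]{3,15}")


def is_valid_username(name: str) -> bool:
    """Return True if the entire name satisfies the assumed user-name rule."""
    return USERNAME_RE.fullmatch(name) is not None


for name in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(name, is_valid_username(name))
# john_doe True
# jo-hn_doe True
# john12_as True
# Jo False
```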

1. Basic match

A regular expression is simply the pattern used to perform a search, made up of letters and digits. For example, the regular expression the describes a rule: the letter t, followed by h, followed by e.

"the" => The fat cat sat on the mat.

The regular expression 123 matches the string 123. The regular expression is matched against the input string by comparing each of its characters with the input, one character at a time.
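To try these literal matches, here is a small sketch using Python's standard re module (Python is our choice of language; the article itself is not tied to one engine). re.findall returns every non-overlapping match.

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern is compared with the input character by character.
literal_the = re.findall(r"the", text)
literal_123 = re.findall(r"123", "abc 123 def")

print(literal_the)  # ['the']  ("The" is not matched: different case)
print(literal_123)  # ['123']
```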

Regular expressions are case-sensitive, so The does not match the.

"The" => The fat cat sat on the mat.

Of course, you can also match case-insensitively, as in /the/i. The i flag makes the match case-insensitive; flags are described later.
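In Python's re module, the counterpart of the /i flag is re.IGNORECASE, passed through the flags argument. A minimal sketch contrasting the default and the case-insensitive behaviour:

```python
import re

text = "The fat cat sat on the mat."

# Default matching is case-sensitive: "The" does not match "the".
case_sensitive = re.findall(r"The", text)

# re.IGNORECASE corresponds to the i flag in /the/i.
case_insensitive = re.findall(r"the", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['The']
print(case_insensitive)  # ['The', 'the']
```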

2. Metacharacters

Metacharacters are the building blocks of regular expressions. A metacharacter does not stand for itself; it has a special meaning. Some metacharacters have a different special meaning when written inside square brackets. Here are the metacharacters:

Metacharacter   Description
.               The period matches any single character except a newline.
[ ]             Character set. Matches any one of the characters inside the square brackets.
[^ ]            Negated character set. Matches any character that is not inside the square brackets.
*               Matches 0 or more repetitions of the preceding character.
+               Matches 1 or more repetitions of the preceding character.
?               Makes the preceding character optional (0 or 1 times).
{n,m}           Matches at least n and at most m repetitions of the preceding character.

Inside square brackets, a period is a literal period. The expression ar[.] matches the string "ar.", that is, ar followed by a period.

"ar[.]" => A garage is a good place to park a car.

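A quick sketch in Python's re module contrasting the unescaped period, which matches any character except a newline, with a period inside square brackets, which is literal:

```python
import re

# Unescaped ".": any single character (except newline) followed by "ar".
any_char = re.findall(r".ar", "The car parked in the garage.")

# "[.]" is a literal period, so only the "ar." at the end of "car." matches.
literal_dot = re.findall(r"ar[.]", "A garage is a good place to park a car.")

print(any_char)     # ['car', 'par', 'gar']
print(literal_dot)  # ['ar.']
```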
The editor added:

The characters inside [] have no order, and each stands for a single character: [jb51] matches any one character that is j, b, 5 or 1.

If you want to match only jb51 as a whole, use () instead, for example (jb51|baidu), which matches jb51 or baidu.

Outside [], you can use the escape character \ instead.

For example, the rule above can also be written as ar\.

However, if many different characters need to be escaped and any one of them may match, [] is the better choice.

For example: [./^]

This is more convenient and easier to read than escaping each character one by one.
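These notes can be checked with a short Python sketch. The input strings are made up for illustration.

```python
import re

# [jb51] is a set of single characters: any one of j, b, 5 or 1 matches.
single_chars = re.findall(r"[jb51]", "jb51.net")

# A group with | matches the whole word jb51 or the whole word baidu.
whole_words = re.findall(r"(jb51|baidu)", "visit jb51 or baidu")

# Escaping with \. and using [.] both match a literal period.
escaped = re.findall(r"ar\.", "park a car.")
bracketed = re.findall(r"ar[.]", "park a car.")

# Several specials at once; ^ is literal because it is not first in [].
specials = re.findall(r"[./^]", "a.b/c^d")

print(single_chars)          # ['j', 'b', '5', '1']
print(whole_words)           # ['jb51', 'baidu']
print(escaped == bracketed)  # True
print(specials)              # ['.', '/', '^']
```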

2.2.1 Negated character set

Normally ^ represents the beginning of a string, but when it appears right after the opening square bracket, it negates the character set. For example, the expression [^c]ar matches any character except c, followed by ar.
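A minimal Python sketch of the negated set, using the article's sample sentence:

```python
import re

# [^c]ar: exactly one character that is NOT "c", followed by "ar".
negated = re.findall(r"[^c]ar", "The car parked in the garage.")
print(negated)  # ['par', 'gar']  ("car" is excluded)
```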

"[^c]ar" => The car parked in the garage.

2.3 Repetition

The metacharacters +, * and ? are placed after a subpattern to specify how many times it may occur. These metacharacters behave differently in different situations.

2.3.1 *

The * symbol matches the preceding character 0 or more times. For example, the expression a* matches zero or more consecutive a characters. The expression [a-z]* matches any run of lowercase letters in a line.

"[a-z]*" => The car parked in the garage #21.

Combining * with . gives .*, which matches any string of characters (except newlines). * can also be combined with \s, which matches whitespace: the expression \s*cat\s* matches cat preceded by zero or more spaces and followed by zero or more spaces.
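A short Python sketch of *. Because [a-z]* can also match the empty string, re.findall returns empty matches between runs; the sketch filters them out so only the runs of lowercase letters remain.

```python
import re

# \s*cat\s*: "cat" with optional whitespace on either side.
spaced_cat = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")

# [a-z]* can match "", so drop empty matches to see the runs of letters.
lower_runs = [m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m]

print(spaced_cat)  # [' cat ', 'cat']
print(lower_runs)  # ['he', 'car', 'parked', 'in', 'the', 'garage']
```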

"\s*cat\s*" => The fat cat sat on the concatenation.

2.3.2 +

The + symbol matches the preceding character 1 or more times. For example, the expression c.+t matches the letter c, followed by at least one character, followed by t.
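A Python sketch of +. Note that .+ is greedy, so it extends to the last t it can reach; the second pair of calls shows that + needs at least one character where * accepts zero.

```python
import re

text = "The fat cat sat on the mat."

# c.+t: "c", then one or more characters, then "t" (greedy).
plus_match = re.findall(r"c.+t", text)

# + needs at least one character in between; * also accepts zero.
plus_on_ct = re.findall(r"c.+t", "ct")
star_on_ct = re.findall(r"c.*t", "ct")

print(plus_match)  # ['cat sat on the mat']
print(plus_on_ct)  # []
print(star_on_ct)  # ['ct']
```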

"c.+t" => The fat cat sat on the mat.

2.3.3 ?

In a regular expression, the metacharacter ? makes the preceding character optional: it may appear 0 or 1 times. For example, the expression [T]?he matches both he and The.
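A Python sketch of ?. The input sentence is made up so that it contains both The and the; with [T]? the T may be absent, so the he inside the also matches.

```python
import re

text = "The car, the garage."  # made-up input containing "The" and "the"

required_t = re.findall(r"[T]he", text)   # T must be present
optional_t = re.findall(r"[T]?he", text)  # T may be absent

print(required_t)  # ['The']
print(optional_t)  # ['The', 'he']
```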

"[T]he" => The car is parked in the garage.
"[T]?he" => The car is parked in the garage.

2.4 {}

In a regular expression, braces {} are a quantifier that specifies how many times a character or a group of characters may be repeated. For example, the expression [0-9]{2,3} matches at least 2 and at most 3 digits from 0 to 9.

"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.

The second number can be omitted. For example, [0-9]{2,} matches 2 or more digits.

"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0.

If the comma is also omitted, the number is an exact repetition count. For example, [0-9]{3} matches exactly 3 digits.
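The three forms of {} side by side, as a Python sketch on the article's sample sentence:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # greedy: takes 3 when it can
two_or_more = re.findall(r"[0-9]{2,}", text)    # no upper bound
exactly_three = re.findall(r"[0-9]{3}", text)   # fixed count

print(two_to_three)   # ['999', '10']
print(two_or_more)    # ['9997', '10']
print(exactly_three)  # ['999']
```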

"[0-9]{3}" => The number was 9.9997 but we rounded it off to 10.0.

2.5 (...) Character group

A character group is a subpattern written inside parentheses (...). As mentioned earlier, {} specifies how many times the preceding character occurs; if a character group is placed before {}, the whole group is repeated instead. The same holds for the other quantifiers: for example, the expression (ab)* matches zero or more consecutive repetitions of ab.

We can also use the or character | inside () to express alternatives. For example, (c|g|p)ar matches car, gar or par.
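A Python sketch of character groups. One Python-specific detail: when a pattern contains a capturing group, re.findall returns the group's text rather than the whole match, so the sketch uses re.finditer and group(0) to collect full matches.

```python
import re

# (ab)* repeats the whole group "ab", not just the "b".
group_repeated = re.fullmatch(r"(ab)*", "ababab") is not None
group_partial = re.fullmatch(r"(ab)*", "abb") is not None

# findall() would return only the captured letters ("c", "p", "g"),
# so use finditer() and group(0) to get the full matches.
words = [m.group(0) for m in re.finditer(r"(c|g|p)ar", "The car is parked in the garage.")]

print(group_repeated)  # True
print(group_partial)   # False
print(words)           # ['car', 'par', 'gar']
```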

"(c|g|p)ar" => The car is parked in the garage.

2.6 | Or operator

The or operator | acts as a condition: the expression matches either what comes before | or what comes after it.

For example, (T|t)he|car matches either (T|t)he or car.

"(T|t)he|car" => The car is parked in the garage.

2.7 Escaping special characters

The backslash \ is used in an expression to escape the character immediately following it. This lets you match the reserved characters { } [ ] / \ + * . $ ^ | ? literally: to match one of these special characters, precede it with a backslash \.

For example, . matches any character except a newline. To match a literal period, write \. instead. In the following example, the ? after \. makes the period optional.
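A Python sketch of escaping. It also uses re.escape, a function of Python's re module that backslash-escapes the special characters of a literal string; other engines may have different helpers or none.

```python
import re

text = "The fat cat sat on the mat."

# \. is a literal period; the trailing ? makes it optional.
escaped_dot = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
print(escaped_dot)  # ['fat', 'cat', 'mat.']

# re.escape() escapes the special characters in a literal string.
pattern = re.escape("3.14")
pi_hits = re.findall(pattern, "pi is 3.14, not 3514")
print(pi_hits)  # ['3.14']
```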

"(f|c|m)at\.?" => The fat cat sat on the mat.

2.8 Anchors

In a regular expression, anchors check whether a match is at the beginning or the end of the input string: ^ specifies the beginning and $ specifies the end.

2.8.1 ^ sign

^ checks whether the match is at the beginning of the input string.

For example, applying the expression ^a to the string abc gives the result a. But ^b gives no match, because abc does not start with b.

Likewise, ^(T|t)he matches The or the only at the beginning of the string.
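A Python sketch of the ^ anchor. The second sentence is a made-up variant of the article's example so that both The and the occur in it.

```python
import re

starts_a = re.findall(r"^a", "abc")
starts_b = re.findall(r"^b", "abc")  # abc does not start with b

text = "The car is parked near the garage."  # made-up variant containing "the"
anywhere = [m.group(0) for m in re.finditer(r"(T|t)he", text)]
at_start = [m.group(0) for m in re.finditer(r"^(T|t)he", text)]

print(starts_a)  # ['a']
print(starts_b)  # []
print(anywhere)  # ['The', 'the']
print(at_start)  # ['The']
```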

"(T|t)he" => The car is parked in the garage.
"^(T|t)he" => The car is parked in the garage.

2.8.2 $ sign

Like ^, the $ sign checks a position, here whether the match is at the end of the input string.

For example, (at\.)$ matches at. only when it is at the end of the string.
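A Python sketch of the $ anchor. Since each pattern has exactly one capturing group, re.findall returns the group's text, which here is the whole match.

```python
import re

text = "The fat cat. sat. on the mat."

every_at_dot = re.findall(r"(at\.)", text)
final_at_dot = re.findall(r"(at\.)$", text)

print(every_at_dot)  # ['at.', 'at.', 'at.']
print(final_at_dot)  # ['at.']
```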

"(at\.)" => The fat cat. sat. on the mat.
"(at\.)$" => The fat cat. sat. on the mat.

3. Shorthand character sets

Regular expressions provide shorthands for some commonly used character sets:

Shorthand   Description
.           Any character except a newline
\w          Matches alphanumeric characters (letters, digits and underscore), equivalent to [a-zA-Z0-9_]
\W          Matches non-alphanumeric characters (symbols), equivalent to [^\w]
\d          Matches digits, equivalent to [0-9]
\D          Matches non-digits, equivalent to [^\d]
\s          Matches whitespace characters, equivalent to [\t\n\f\r\p{Z}]
\S          Matches non-whitespace characters, equivalent to [^\s]
\f          Matches a form feed
\n          Matches a line feed
\r          Matches a carriage return
\t          Matches a tab
\v          Matches a vertical tab
\r\n        Matches CR/LF, the DOS line terminator

4. Zero-width assertions (lookahead and lookbehind)

Lookahead and lookbehind assertions are both non-capturing groups: they capture no text and are not counted as groups. A lookahead checks whether the pattern being matched is followed by another given pattern; the text matched by the assertion is not part of the result and serves only as a constraint.

For example, to get all the numbers that follow a $ sign, we can use the positive lookbehind (?<=\$)[0-9.]*.

4.1 ?=... Positive lookahead

A positive lookahead ?=... requires the first part of the expression to be followed by the expression defined in the assertion.

The returned match contains only the text matched by the first part of the expression. To define a positive lookahead, use parentheses with a question mark and an equals sign inside: (?=...).

The pattern to look ahead for goes after the equals sign inside the parentheses. For example, the expression (T|t)he(?=\sfat) matches The or the, where the positive lookahead (?=\sfat) requires The or the to be followed by a space and fat.
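A Python sketch of the positive lookahead. The assertion must hold, but the " fat" it checks is not included in the match.

```python
import re

text = "The fat cat sat on the mat."

# (?=\sfat) requires " fat" next, without consuming it.
followed_by_fat = [m.group(0) for m in re.finditer(r"(T|t)he(?=\sfat)", text)]
print(followed_by_fat)  # ['The']
```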

"(T|t)he(?=\sfat)" => The fat cat sat on the mat.

4.2 ?!... Negative lookahead

A negative lookahead ?! keeps only the matches that are not followed by the pattern defined in the assertion. It is defined the same way as a positive lookahead, except that = is replaced with !, i.e. (?!...).

The expression (T|t)he(?!\sfat) matches The or the only when it is not followed by a space and fat.
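The same sentence with a negative lookahead, as a Python sketch: positions where " fat" comes next are rejected.

```python
import re

text = "The fat cat sat on the mat."

# (?!\sfat) rejects any candidate that is followed by " fat".
not_followed_by_fat = [m.group(0) for m in re.finditer(r"(T|t)he(?!\sfat)", text)]
print(not_followed_by_fat)  # ['the']
```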

"(T|t)he(?!\sfat)" => The fat cat sat on the mat.

4.3 ?<!... Negative lookbehind

A negative lookbehind ?<! keeps only the matches that are not preceded by the pattern defined in the assertion. For example, the expression (?<!(T|t)he\s)(cat) matches cat only when it is not preceded by The or the and a space.

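A Python sketch of both kinds of lookbehind. The price string is made up for illustration. The negative lookbehind is written as (?<![Tt]he\s)cat, using the set [Tt] in place of the group (T|t), which behaves the same here; Python's re requires lookbehind patterns to have a fixed width, and this one is always 4 characters.

```python
import re

# Positive lookbehind: digits and periods that come right after a "$".
prices = re.findall(r"(?<=\$)[0-9.]*", "It costs $4.44 or $10 today.")
print(prices)  # ['4.44', '10']

# Negative lookbehind: "cat" not preceded by "The " or "the ".
text = "The cat sat on cat."
positions = [m.start() for m in re.finditer(r"(?<![Tt]he\s)cat", text)]
print(positions)  # [15]: only the second "cat" qualifies
```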