What is a regular expression in linux

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains what a regular expression is in Linux. The content is kept simple and clear, and we hope it answers your questions as you work through it.

What is a regular expression

A regular expression uses one "string" to describe a pattern, and then checks whether another "string" fits that pattern. A simple example: use the expression "a" to test the string s, as in s.match("a"). In general, regular expressions are used for the following:

Validation: check that a string fits a given pattern, for example that it is a valid e-mail address.

Searching: find the strings that fit a given pattern within a longer text, which is more flexible than searching for a fixed string.

Replacement: substitute the matched text, which is more flexible than ordinary fixed-string substitution.
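The three uses above can be sketched in Python with the standard re module. This is only an illustration: the function names and sample strings are made up, and the e-mail pattern is a deliberately simple assumption, not a complete address validator.

```python
import re

# Hypothetical, deliberately simple e-mail pattern (not RFC-complete).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def is_email(text):
    # Validation: the whole string must fit the pattern.
    return EMAIL.fullmatch(text) is not None

def find_numbers(text):
    # Searching: collect every run of digits in a longer text.
    return re.findall(r"\d+", text)

def mask_digits(text):
    # Replacement: substitute every digit with "#".
    return re.sub(r"\d", "#", text)

print(is_email("user@example.com"))
print(find_numbers("order 12, item 345"))
print(mask_digits("pin 1234"))
```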

Basic rules

Ordinary character

Letters, digits, Chinese characters, underscores, and any punctuation marks not given a special meaning in the sections below are all "ordinary characters". An ordinary character in an expression matches the same character in the string being matched. For example, when the expression "a" is matched against the string "abcde", the matched content is "a".

Escape character

Characters that are inconvenient to write directly are written with a "\" in front of them. Common ones include:

\r: carriage return

\n: newline

\t: tab

\\: the "\" character itself

In addition, punctuation marks that have a special meaning in regular expressions are preceded by "\" to stand for the symbol itself. For example, the "^" and "$" characters need to be written as "\^" and "\$":

\^: matches the "^" symbol itself

\$: matches the "$" symbol itself

\.: matches the dot "." itself

Escaped characters match in the same way as ordinary characters. For example, "\^" matches the "^" in "a^BC".
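As a rough illustration of escaping, the sketch below uses Python's re module. The helper find_literal is a hypothetical name introduced here; it relies on re.escape, which adds the backslashes needed to treat a string literally.

```python
import re

def find_literal(symbol, text):
    """Return the index of the first literal occurrence of symbol, or -1."""
    # re.escape puts "\" in front of characters with special meaning.
    m = re.search(re.escape(symbol), text)
    return m.start() if m else -1

# Raw strings (r"...") pass the backslash through to the regex engine.
caret = re.search(r"\^", "a^BC").group()       # the "^" itself
dot_index = re.search(r"\.", "3.14").start()   # position of the "."
backslash = re.search(r"\\", "a\\b").group()   # a single "\"

print(caret, dot_index, backslash)
print(find_literal("$", "cost: $5"))
```

Without the escape, "." would match any character and "$" would match only the end of the string.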

Expressions that match any one of several characters

\d: any digit, that is, any one of 0-9

\w: any letter, digit, or underscore, that is, any one of A-Z, a-z, 0-9, and _

\s: any whitespace character, including space, tab, and form feed

.: the dot matches any character except the newline character

For example, the expression "\dtest\d" matches "1test2".
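A small Python sketch of these classes follows; the sample strings are made up. One caveat worth stating as an assumption about the engine: for Python str patterns, "\d" and "\w" also match non-ASCII digits and letters unless the re.ASCII flag is given, which is slightly broader than the 0-9 and A-Z ranges described above.

```python
import re

# Each pattern is searched in a sample text; the first hit is kept.
samples = {
    r"\d": "room 7",   # first digit
    r"\w": "!_x",      # first letter, digit, or underscore
    r"\s": "a\tb",     # first whitespace character (here a tab)
    r".": "\nz",       # first character that is not a newline
}
first = {p: re.search(p, t).group() for p, t in samples.items()}

whole = re.fullmatch(r"\dtest\d", "1test2") is not None

for pattern, hit in first.items():
    print(pattern, repr(hit))
print(whole)
```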

Besides these built-in classes, you can define your own with square brackets [].

A series of characters enclosed in [] matches any one of them.

If "^" comes first inside the brackets, the expression matches any one character that is not in the series.

For example, [123] matches "1", "2", or "3"; [^abc] matches any character other than "a", "b", or "c".

Note that inside [], only characters that would change the meaning of the character class need to be escaped:

The backslash must be escaped.

Square brackets must be escaped.

"^" must be escaped when it comes first, and "-" when it sits between two characters.

In all other cases, even special characters do not need to be escaped, for example:

[aeiou]

[$.*+?{}()|]

[abc123-]
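The character classes above can be tried out with a short Python sketch. The sample texts are invented for illustration, and the last pattern escapes more than strictly necessary to show the escaped forms side by side.

```python
import re

# Any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular")

# "^" first inside the brackets negates the class.
not_abc = re.findall(r"[^abc]", "aXbYc")

# Most special characters are literal inside [].
specials = re.findall(r"[$.*+?{}()|]", "a+b=$c?")

# Backslash, brackets, and "^" escaped; a trailing "-" is literal.
escaped = re.findall(r"[\\\[\]\^-]", r"a\b[c]^d-e")

print(vowels)
print(not_abc)
print(specials)
print(escaped)
```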

Number of matches

A "quantifier" is placed after the expression it modifies and lets that expression match a specified number of times:

{n}: the expression is repeated exactly n times. For example, "\d{2}" is equivalent to "\d\d".

{m,n}: the expression is repeated at least m times and at most n times. For example, "a{1,3}" can match "a", "aa", or "aaa".

{m,}: the expression is repeated at least m times. For example, "\d{2,}" can match "12", "123", or "12345678".

?: the expression matches 0 or 1 times, which is equivalent to {0,1}. For example, "a[b]?" can match "a" or "ab".

+: the expression appears at least once, which is equivalent to {1,}. For example, "a+" can match "a", "aa", or "aaa".

*: the expression does not appear or appears any number of times, which is equivalent to {0,}. For example, "ab*" can match "a", "ab", or "abb".
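The quantifiers can be checked with a small table-driven sketch in Python. The helper fits is a hypothetical name; it uses re.fullmatch so that the whole text, not just a part of it, has to match.

```python
import re

def fits(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
checks = [
    (r"\d{2}", "12", True),
    (r"\d{2}", "123", False),
    (r"a{1,3}", "aaa", True),
    (r"a{1,3}", "aaaa", False),
    (r"\d{2,}", "12345678", True),
    (r"\d{2,}", "1", False),
    (r"a[b]?", "ab", True),
    (r"a[b]?", "abb", False),
    (r"a+", "aaa", True),
    (r"a+", "", False),
    (r"ab*", "a", True),
    (r"ab*", "abb", True),
]

for pattern, text, expected in checks:
    print(pattern, repr(text), fits(pattern, text), expected)
```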

Special symbols

^: matches the start of the string and does not consume any characters. In (?m) mode it also matches the start of each line. For example, "^aaa" does not match "xxxaaaxxx", but it does match "aaaxxx".

$: matches the end of the string and does not consume any characters. In (?m) mode it also matches the end of each line. For example, "aaa$" does not match "xxxaaaxxx", but it does match "xxxaaa".

\b: matches a word boundary, for example the position between a word and a space, and does not consume any characters. Like "^" and "$", it matches a position rather than a character: one side of the position must be a "\w" character and the other side must not be. For example, ".\b." matches "@a" in "@@@abc".
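The anchors and the word boundary can be sketched in Python as follows. The multi-line sample text is made up; the other strings are the ones used in the examples above.

```python
import re

# "^" and "$" tie the pattern to the start and end of the string.
starts = re.search(r"^aaa", "aaaxxx") is not None
not_at_start = re.search(r"^aaa", "xxxaaaxxx") is None
ends = re.search(r"aaa$", "xxxaaa") is not None

# With (?m), "^" also matches at the start of every line.
line_starts = re.findall(r"(?m)^\w+", "one two\nthree four")

# "\b" sits in the gap between a non-word and a word character.
boundary = re.search(r".\b.", "@@@abc").group()

print(starts, not_at_start, ends)
print(line_starts)
print(boundary)
```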

In addition, there are some symbols that can affect the relationship between subexpressions within an expression:

|: an "or" relationship between the expressions on its left and right; either the left or the right may match.

(): when a quantifier follows, the expression in parentheses is repeated as a whole; when reading the result, the text matched by the expression in parentheses can be retrieved separately. For example, "(ab\s*)+" matches "ab ab ab" in "hi, ab ab ab".
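A brief Python sketch of "|" and "()" follows. The animal and spelling samples are invented for illustration. Note that when a group is repeated, Python keeps only the text from the last repetition in that group.

```python
import re

# "|" matches either side.
either = re.findall(r"cat|dog", "cat, dog, bird")

# "()" lets "+" repeat the whole subexpression.
m = re.search(r"(ab\s*)+", "hi, ab ab ab")
whole = m.group(0)        # everything the expression matched
last_group = m.group(1)   # text from the last repetition of the group

# "()" also limits how far "|" reaches.
words = ["gray", "grey", "groy"]
grouped = [w for w in words if re.fullmatch(r"gr(a|e)y", w)]

print(either)
print(repr(whole), repr(last_group))
print(grouped)
```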

Advanced rules

Greedy and non-greedy matching

With the quantifiers "{m,n}", "{m,}", "?", "*", and "+", the same expression can match a different number of times depending on the string being matched. An expression with such a variable repeat count always matches as many times as possible.

For example, with the text "axxxaxxxa": in "(a)(\w+)", the "\w+" matches "xxxaxxxa"; in "(a)(\w+)(a)", the "\w+" matches "xxxaxxx". As this shows, "\w+" always matches as many characters as its rule allows.

In the second example "\w+" leaves the last "a" unmatched, but only so that the expression as a whole can succeed. Likewise, expressions with "*" and "{m,n}" match as many times as possible, and an expression with "?" matches whenever it has the choice between matching and not matching. This principle is called "greedy" mode.

Non-greedy mode is selected by adding a "?" after a quantifier. It makes expressions with a variable repeat count match as few times as possible, and makes an expression that may either match or not match prefer not to match.

This principle is also known as "reluctant" mode. If matching fewer times would cause the whole expression to fail, then, as in greedy mode, a non-greedy expression matches a little more so that the whole expression can succeed. For example, with the text "axxxaxxxa" and the expression "(a)(\w+?)", the "\w+?" matches only one "x".
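The difference between the two modes can be seen by comparing what the second group captures. This is a minimal Python sketch; the helper second_group is a hypothetical name, and the last pattern is an extra case added here to show a non-greedy expression matching more so that the whole expression succeeds.

```python
import re

TEXT = "axxxaxxxa"

def second_group(pattern):
    """Return what the second pair of parentheses matched in TEXT."""
    return re.search(pattern, TEXT).group(2)

greedy = second_group(r"(a)(\w+)")            # takes as much as it can
greedy_then_a = second_group(r"(a)(\w+)(a)")  # gives back the last "a"
lazy = second_group(r"(a)(\w+?)")             # takes as little as it can
lazy_then_a = second_group(r"(a)(\w+?)(a)")   # grows until an "a" follows

print(greedy, greedy_then_a, lazy, lazy_then_a)
```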

Backreferences

While matching, the regex engine records the string matched by each subexpression enclosed in parentheses "()". When reading the result, each of these strings can be retrieved separately. When you use some boundary to locate the text but the content you want does not include that boundary, you must use parentheses to mark the part you want. For example, in "<div>(.*?)</div>" the parentheses capture the contents inside the div tag.

The string matched by a parenthesized subexpression can be used not only after the match is complete but also during matching. Later in the expression, you can refer back to an earlier parenthesized submatch to match the same string again. A reference is written as "\" plus a number: "\1" refers to the string matched by the first pair of parentheses, "\2" to the string matched by the second pair, and so on. If one pair of parentheses contains another, the outer pair is numbered first. In other words, pairs are numbered in the order in which their left parenthesis "(" appears.

For example, when the expression "('|")(.*?)(\1)" is matched against "'Hello', "World"", the match succeeds and the matched content is "'Hello'". Matching again from there yields ""World"".
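Capturing and backreferences can be sketched in Python as follows. The div pattern and its sample text are assumptions made for illustration, and the nested-group example is added only to show the numbering order.

```python
import re

# Capture the text between two boundaries without the boundaries.
# (Assumed example pattern for "contents inside the div tag".)
inner = re.search(r"<div>(.*?)</div>", "<div>some text</div>").group(1)

# "\1" must repeat whatever the first group matched, so an opening
# quote is only closed by the same kind of quote.
quoted = re.compile(r'''("|')(.*?)(\1)''')
text = "'Hello', \"World\""
matches = [m.group(0) for m in quoted.finditer(text)]

# Groups are numbered by the position of their left parenthesis.
nested = re.match(r"((a)(b))", "ab").groups()

print(inner)
print(matches)
print(nested)
```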

Pre-search (lookaround)

As described earlier, "^", "$", and "\b" have one thing in common: they do not match any characters themselves, but instead attach a condition to the "ends of the string" or a "gap between characters". Regular expressions provide further constructs based on the same principle, known as pre-search.

Forward pre-search (lookahead): "(?=xxxxx)", "(?!xxxxx)"

Format "(?=xxxxx)": attaches a condition to a gap or an end of the string: the text to the right of the gap must match the expression xxxxx. The text is only checked, not consumed, so the expressions that follow still start matching from that gap. For example, when "Mac (?=book|air)" is matched against "Mac pro, Mac air", it matches only the "Mac " (including its trailing space) in "Mac air".

Format "(?!xxxxx)": the text to the right of the gap must not match the expression xxxxx. For example, when "hello(?!\w)" is matched against "hello,helloworld", it matches only the first "hello". Here "(?!\w)" has the same effect as "\b".
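The two forward forms can be tried in Python as a rough sketch. The first pattern is written with a space after "Mac" so that it fits the sample text "Mac air"; that spacing is an assumption about the intended example.

```python
import re

# Positive form: "Mac " counts only if "book" or "air" follows,
# and the following word is not part of the match.
m = re.search(r"Mac (?=book|air)", "Mac pro, Mac air")
mac_start = m.start()
mac_text = m.group()

# Negative form: "hello" counts only if no word character follows.
hellos = [h.start() for h in re.finditer(r"hello(?!\w)", "hello,helloworld")]

# In this text, "\b" after "hello" selects the same position.
with_boundary = [h.start() for h in re.finditer(r"hello\b", "hello,helloworld")]

print(mac_start, repr(mac_text))
print(hellos, with_boundary)
```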

Reverse pre-search: (?
