
How to use regular expressions in Linux

2025-07-01 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article introduces how to use regular expressions in Linux and is intended as a reference for interested readers.

Brief introduction

Generally speaking, regular expression syntax falls into three standards: BRE, ERE, and ARE. Of these, BRE and ERE belong to the POSIX standard, while ARE is an extension that each implementation defines on its own.

POSIX regular expression

Traditionally, POSIX defines two kinds of regular expression syntax: the basic regular expression (BRE) and the extended regular expression (ERE).

Among them, the syntax symbols defined by BRE include:

. - matches any single character.

[] - character set: matches any one of the characters listed inside the square brackets.

[^] - negated character set: matches any one character not listed inside the square brackets.

^ - matches the start position.

$ - matches the end position.

\(\) - defines a subexpression.

\n - back-reference to a subexpression, where n is a digit from 1 to 9. Because this feature goes beyond regular-language semantics and requires backtracking in the string, an NFA algorithm must be used for matching.

* - matches any number of times (zero or more).

\{m,n\} - matches at least m and at most n times; \{m\} means exactly m matches; \{m,\} means at least m matches.
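The BRE constructs listed above can be tried out with a short sketch. Python's `re` module is used here only as a stand-in (an assumption, since the article does not name a programming language). It is not a POSIX BRE engine, and it writes groups and repetition counts without backslashes, so the BRE pattern `\(ab\)\1` becomes `(ab)\1`. The sample strings are made up for illustration.

```python
import re

# Python's re is an approximation here, not a POSIX BRE implementation.
# BRE spells groups and counts with backslashes; Python spells them without.

# '.' matches any single character; '^' and '$' anchor start and end.
dot_match = re.search(r"^c.t$", "cat")

# '[...]' matches one character from the set, '[^...]' one character outside it.
in_set = re.findall(r"[abc]", "a1b2")        # ['a', 'b']
not_in_set = re.findall(r"[^abc]", "a1b2")   # ['1', '2']

# '*' repeats the previous item zero or more times.
star = re.fullmatch(r"ab*", "abbb")

# Back-reference: '\1' must repeat exactly what group 1 matched.
backref = re.fullmatch(r"(ab)\1", "abab")

# Counted repetition: BRE '\{2,3\}' corresponds to '{2,3}' in Python.
counted = [bool(re.fullmatch(r"a{2,3}", s)) for s in ("a", "aa", "aaa", "aaaa")]

print(dot_match is not None, in_set, not_in_set)
print(star is not None, backref is not None, counted)
```

Only the two- and three-character strings pass the counted repetition, which mirrors the "at least m, at most n" rule.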

ERE modifies some of the syntax in BRE and adds the following syntax symbols:

? - matches at most once (zero or one match).

+ - matches at least once (one or more matches).

| - OR operation (alternation); its left and right operands can each be a subexpression.

ERE also drops the escape characters from the subexpression "()" and repetition-count "{m,n}" syntax: these two constructs are written without backslashes in ERE. In addition, ERE removes the non-regular ability to back-reference subexpressions.
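As a rough illustration of the ERE additions, the sketch below again uses Python's `re` as a stand-in (an assumption). Python happens to spell `?`, `+`, `|`, `()` and `{m,n}` the same way ERE does, but it is a backtracking engine rather than a POSIX one. The words being matched are arbitrary examples.

```python
import re

# '?': the preceding item is optional (zero or one occurrence).
optional = [bool(re.fullmatch(r"colou?r", s)) for s in ("color", "colour", "colouur")]

# '+': the preceding item must occur at least once.
one_or_more = [bool(re.fullmatch(r"ab+", s)) for s in ("a", "ab", "abbb")]

# '|': alternation between subexpressions, grouped with unescaped '()'.
alternation = [bool(re.fullmatch(r"(cat|dog)s?", s)) for s in ("cat", "dogs", "bird")]

# Difference from POSIX: Python takes the first alternative that matches,
# whereas a POSIX engine would choose the longest overall match ("ab").
first_alt = re.match(r"a|ab", "ab").group()

print(optional, one_or_more, alternation, first_alt)
```

The last line is the main behavioral gap to keep in mind when porting ERE patterns that rely on alternation.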

BRE and ERE share the same POSIX character class definitions. Both also support the collating-symbol "[. .]" and equivalence-class "[= =]" bracket operations, but these are rarely used.

Tools such as f / fr / wfr / bwfr use ERE mode by default and support the following perl-style character classes:

POSIX class   Perl class   Description
[:alnum:]     -            letters and digits
[:alpha:]     \a           letters
[:lower:]     \l           lowercase letters
[:upper:]     \u           uppercase letters
[:blank:]     -            blank characters (space and tab)
[:space:]     \s           all whitespace characters (broader than [:blank:])
[:cntrl:]     -            non-printable control characters (backspace, delete, bell...)
[:digit:]     \d           decimal digits
[:xdigit:]    \x           hexadecimal digits
[:graph:]     -            printable non-whitespace characters
[:print:]     \p           printable characters
[:punct:]     -            punctuation marks
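Not every engine understands the POSIX class names. Python's `re`, for instance, does not, so one way to experiment with them is to expand each name into an explicit ASCII range first. The sketch below does that; `POSIX_CLASSES` and `expand_posix_classes` are hypothetical names introduced here, and the ASCII-only ranges are an assumption (a locale-aware POSIX engine may include more characters).

```python
import re

# ASCII-only bodies for each POSIX class, written as they would appear
# inside a bracket expression. Illustrative data, not a standard API.
POSIX_CLASSES = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "lower": "a-z",
    "upper": "A-Z",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "graph": r"\x21-\x7e",
    "print": r"\x20-\x7e",
    "punct": r"!-/:-@\[-`{-~",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each [:name:] in the pattern with its ASCII range body."""
    return re.sub(
        r"\[:(\w+):\]",
        lambda m: POSIX_CLASSES[m.group(1)],  # unknown names raise KeyError
        pattern,
    )

identifier = expand_posix_classes("[[:alpha:]_]+")
print(identifier)                                   # [A-Za-z_]+
print(bool(re.fullmatch(identifier, "foo_bar")))    # True
```

The helper only rewrites the `[:name:]` token, so the surrounding brackets and any negation (`[^[:digit:]]`) pass through unchanged.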

In addition, the following special character classes are available:

Perl class   Equivalent POSIX expression   Description
\o           [0-7]                         octal digit
\O           [^0-7]                        non-octal digit
\w           [[:alnum:]_]                  word character
\W           [^[:alnum:]_]                 non-word character
\A           [^[:alpha:]]                  non-letter
\L           [^[:lower:]]                  non-lowercase letter
\U           [^[:upper:]]                  non-uppercase letter
\S           [^[:space:]]                  non-whitespace character
\D           [^[:digit:]]                  non-digit
\X           [^[:xdigit:]]                 non-hexadecimal digit
\P           [^[:print:]]                  non-printable character
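The equivalences between the short classes and their POSIX expressions can be checked mechanically. The sketch below does so for the four shorthands that Python's `re` also provides (`\w`, `\W`, `\d`, `\D`); the other shorthands in these tables are specific to the tools discussed here and are not available in Python. The ASCII-only comparison and the names `EQUIVALENTS` and `same_class` are assumptions made for the example.

```python
import re

# Short class -> explicit set, with [:alnum:] and [:digit:] written out
# as ASCII ranges.
EQUIVALENTS = {
    r"\w": "[A-Za-z0-9_]",
    r"\W": "[^A-Za-z0-9_]",
    r"\d": "[0-9]",
    r"\D": "[^0-9]",
}

ascii_chars = [chr(i) for i in range(128)]

def same_class(short: str, explicit: str) -> bool:
    """True if both patterns accept exactly the same ASCII characters."""
    return all(
        bool(re.fullmatch(short, c, re.ASCII)) == bool(re.fullmatch(explicit, c))
        for c in ascii_chars
    )

results = {short: same_class(short, explicit) for short, explicit in EQUIVALENTS.items()}
print(results)
```

`re.ASCII` is needed because Python's `\w` and `\d` otherwise also accept non-ASCII letters and digits.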

You can also use the following special character escape sequences:

\r - carriage return

\n - line feed

\b - backspace

\t - tab

\v - vertical tab

\" - double quotation mark

\' - single quotation mark
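Most of these escapes behave the same way in other engines, with one notable exception. The sketch below uses Python's `re` as a stand-in (an assumption) and a made-up sample string. In Python, `\b` outside a character set means "word boundary", so backspace has to be written as `[\b]`.

```python
import re

text = "a\tb\r\nc\vd"

# \t, \r, \n and \v in a pattern match the corresponding control characters.
tab_pos = re.search(r"\t", text).start()       # 1
crlf_pos = re.search(r"\r\n", text).start()    # 3
vtab_pos = re.search(r"\v", text).start()      # 6

# Caveat: Python treats \b as a word boundary unless it is inside a set,
# where it means the backspace character (0x08).
backspace_pos = re.search(r"[\b]", "x\by").start()   # 1

print(tab_pos, crlf_pos, vtab_pos, backspace_pos)
```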

Advanced regular expression

In addition to POSIX BRE and ERE, libutilitis also supports advanced regular expressions (ARE) compatible with Tcl 8.2. ARE mode can be turned on by adding the prefix "***:" to the stRegEx parameter; this prefix overrides the bExtended option. ARE is essentially a superset of ERE and extends it in the following ways:

1. Support for "lazy matching" (also known as "non-greedy matching" or "shortest matching"): appending '?' to '?', '*', '+' or '{m,n}' enables shortest matching, so that the clause matches as few characters as possible while still satisfying the condition (the default is to match as many characters as possible). For example, "a.*b" applied to "abab" matches the entire string ("abab"), whereas "a.*?b" matches only the first two characters ("ab").
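The greedy and lazy behavior described above can be reproduced with the same patterns in Python's `re`, which uses the same `?` suffix for lazy quantifiers. Python is a stand-in here (an assumption), and the counted-repetition lines are an extra illustration that is not taken from the article.

```python
import re

# Greedy: '.*' takes as much as possible, so the match runs to the last 'b'.
greedy = re.match(r"a.*b", "abab").group()     # 'abab'

# Lazy: '.*?' takes as little as possible, so the match stops at the first 'b'.
lazy = re.match(r"a.*?b", "abab").group()      # 'ab'

# The same suffix works on counted repetition.
greedy_count = re.match(r"a{1,3}", "aaa").group()    # 'aaa'
lazy_count = re.match(r"a{1,3}?", "aaa").group()     # 'a'

print(greedy, lazy, greedy_count, lazy_count)
```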

2. Support for back-references to subexpressions: in stRegEx, '\n' refers back to a previously defined subexpression. For example, "(a.*)\1" matches "abcabc" and similar strings.

3. Unnamed (non-capturing) subexpressions: "(?:expression)" creates a subexpression that is not captured, so it is not returned as a '\n' match.
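Items 2 and 3 can be seen side by side in a small sketch. Python's `re` is used as a stand-in (an assumption); it shares the `\1` and `(?:...)` syntax. The second pattern and its input are invented for the example.

```python
import re

# Back-reference: group 1 captures "abc", and \1 must repeat it exactly.
doubled = re.fullmatch(r"(a.*)\1", "abcabc")
captured = doubled.group(1)                           # 'abc'

# The second half differs ("abd"), so the back-reference cannot match.
not_doubled = re.fullmatch(r"(a.*)\1", "abcabd")      # None

# Non-capturing group: (?:ab) groups without taking a number,
# so the first numbered group is (c) and \1 refers to it.
non_capturing = re.fullmatch(r"(?:ab)+(c)\1", "ababcc")
groups = non_capturing.groups()                       # ('c',)

print(captured, not_doubled, groups)
```

Because `(?:ab)` does not take a group number, adding or removing such groups never shifts the numbering of the capturing ones.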

4. Lookahead: the match succeeds only if the text ahead satisfies a specified condition. Lookahead comes in two kinds, positive and negative. The syntax for positive lookahead is "(?=expression)"; for example, "bai.*(?=yang)" matches the first four characters ("bai ") of "bai yang", and the match requires that "yang" follows whatever "bai.*" matched. The syntax for negative lookahead is "(?!expression)"; for example, "bai.*(?!yang)" matches "bai shan", and the match may only end at a position that is not immediately followed by "yang". Note that because ".*" is greedy, the match in this case extends to the end of the string rather than stopping after four characters.
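The lookahead examples can be run as written in Python's `re`, which shares the `(?=...)` and `(?!...)` syntax (Python is a stand-in here, an assumption). The sketch also shows a point that is easy to miss: a greedy `.*` in front of a negative lookahead can consume the forbidden text itself, so in Python the negative patterns below match through the end of the string. The last two patterns, without `.*`, are added for contrast and are not from the article.

```python
import re

# Positive lookahead: the match stops right before "yang".
positive = re.match(r"bai.*(?=yang)", "bai yang").group()      # 'bai '
positive_fail = re.match(r"bai.*(?=yang)", "bai shan")         # None

# Negative lookahead after a greedy '.*': the lookahead is tested at the
# end of the string, where nothing follows, so it succeeds.
negative = re.match(r"bai.*(?!yang)", "bai shan").group()      # 'bai shan'
swallowed = re.match(r"bai.*(?!yang)", "bai yang").group()     # 'bai yang'

# Without '.*', the lookahead is tested right after "bai ".
strict_fail = re.match(r"bai (?!yang)", "bai yang")            # None
strict_ok = re.match(r"bai (?!yang)", "bai shan").group()      # 'bai '

print(repr(positive), positive_fail, negative, swallowed, strict_fail, repr(strict_ok))
```

Placing the negative lookahead directly after the fixed prefix is usually the safer way to express "not followed by".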

5. Support for a mode-switching prefix: the expression can begin with a mode string in the style "(?mode string)", which affects the semantics and behavior of the rest of the expression. A mode string can be a combination of the following characters:

b - switches to POSIX BRE mode and overrides the bExtended option.

e - switches to POSIX ERE mode and overrides the bExtended option.

q - switches to literal text matching mode: the characters in the expression are searched for as plain text and all regular-expression semantics are disabled. This mode reduces regular-expression matching to a simple string search. The "***=" prefix is a shortcut for it, that is, "***=" is equivalent to "***:(?q)".

c - performs case-sensitive matching, overriding the bNoCase option.

i - performs case-insensitive matching, overriding the bNoCase option.

n - turns on newline-sensitive matching: '^' and '$' match at the beginning and end of each line; '.' and negated sets ('[^...]') do not match the newline character. This is equivalent to the 'pw' mode string. Overrides the bNewLine option.

m - same as 'n'.

p - '^' and '$' match only at the beginning and end of the whole string, not of lines; '.' and negated sets do not match the newline character. Overrides the bNewLine option.

w - '^' and '$' match at the beginning and end of each line; '.' and negated sets match the newline character. Overrides the bNewLine option.

s - '^' and '$' match only at the beginning and end of the whole string, not of lines; '.' and negated sets match the newline character. Overrides the bNewLine option. This mode is the default in ARE.

x - turns on expanded mode: whitespace in the expression and everything after the comment character '#' are ignored. For example:

@code@
(?x)
\s+([[:graph:]]+) # first number
\s+([[:graph:]]+) # second number
@code@

is equivalent to "\s+([[:graph:]]+)\s+([[:graph:]]+)".

t - turns off expanded mode: whitespace and content after comment characters are not ignored. This mode is the default in ARE.
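Several of these mode characters have rough counterparts in other engines. The sketch below shows approximate equivalents as Python `re` inline flags (Python is a stand-in, and the mapping is an assumption rather than an exact correspondence): `(?i)` for case-insensitive matching, `(?m)` for behavior close to the newline-sensitive mode, `(?s)` for the mode where '.' matches newlines, and `(?x)` for expanded syntax. Since Python lacks `[[:graph:]]`, the expanded example substitutes `\S`, which is slightly broader.

```python
import re

text = "first\nsecond"

# Case-insensitive matching.
case_insensitive = re.findall(r"(?i)FIRST", text)       # ['first']

# Line-oriented: '^' and '$' match per line; '.' stops at newlines
# (Python's default for '.').
per_line = re.findall(r"(?m)^.+$", text)                # ['first', 'second']

# Whole-string anchors with '.' matching newlines.
whole_string = re.findall(r"(?s)^.+$", text)            # ['first\nsecond']

# Expanded syntax: whitespace and '#' comments inside the pattern are ignored.
two_fields = re.compile(r"""(?x)
    \s+(\S+)   # first number
    \s+(\S+)   # second number
""")
fields = two_fields.match("  12  34").groups()          # ('12', '34')

print(case_insensitive, per_line, whole_string, fields)
```

In recent Python versions an inline flag group such as `(?x)` has to appear at the very start of the pattern, which matches the prefix style described above.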

6. Perl-style character classes and escape sequences that differ from BRE/ERE mode:

Perl class   Equivalent POSIX expression   Description
\a           -                             bell character
\A           -                             matches only at the beginning of the entire string, regardless of the current mode
\b           -                             backspace character ('\x08')
\B           -                             the escape character itself ('\')
\cX          -                             control character X (= X & 037)
\d           [[:digit:]]                   decimal digit ('0' - '9')
\D           [^[:digit:]]                  non-digit
\e           -                             escape (ESC) character ('\x1B')
\f           -                             form feed character ('\x0C')
\m           [[:<:]]                       word start position
\M           [[:>:]]                       word end position
\n           -                             newline character ('\x0A')
\r           -                             carriage return ('\x0D')
\s           [[:space:]]                   whitespace character
\S           [^[:space:]]                  non-whitespace character
\t           -                             tab ('\x09')
\uX          -                             16-bit Unicode character (X ∈ [0000..FFFF])
\UX          -                             32-bit Unicode character (X ∈ [00000000..FFFFFFFF])
\v           -                             vertical tab ('\x0B')
\w           [[:alnum:]_]                  word character
\W           [^[:alnum:]_]                 non-word character
\xX          -                             8-bit character (X ∈ [00..FF])
\y           -                             word boundary (\m or \M)
\Y           -                             non-word boundary
\Z           -                             matches only at the end of the entire string, regardless of the current mode
\0           -                             NUL, the null character
\X           -                             subexpression back-reference (X ∈ [1..9])
\XX          -                             subexpression back-reference, or a character in octal notation
\XXX         -                             subexpression back-reference, or a character in octal notation
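Some of these escapes are spelled differently elsewhere. As a hedged comparison, the sketch below shows how the anchors and word-boundary escapes can be approximated in Python's `re` (a stand-in, an assumption): `\A` and `\Z` exist with the same meaning, the word boundary `\y` corresponds to Python's `\b`, and the word-start and word-end escapes `\m` and `\M` have no direct equivalent, so they are emulated here with lookarounds. The sample strings are arbitrary.

```python
import re

text = "one two\nthree"

# \A and \Z anchor to the whole string even when '^' and '$' work per line.
string_start = re.findall(r"(?m)\A\w+", text)    # ['one']
string_end = re.findall(r"(?m)\w+\Z", text)      # ['three']
line_starts = re.findall(r"(?m)^\w+", text)      # ['one', 'three']

# Word boundary: \y in the table above, \b in Python.
boundaries = [m.start() for m in re.finditer(r"\b", "ab cd")]            # [0, 2, 3, 5]

# Word start (\m) and word end (\M), emulated with lookarounds.
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", "ab cd")]     # [0, 3]
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", "ab cd")]      # [2, 5]

# Hexadecimal and 16-bit Unicode escapes: \x41 is 'A', \u0042 is 'B'.
hex_match = re.fullmatch(r"\x41\u0042", "AB") is not None                # True

print(string_start, string_end, line_starts)
print(boundaries, word_starts, word_ends, hex_match)
```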
