Regular / extended regular expressions in Linux (grep)

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Regular expressions in Linux

We often use regular expressions when writing shell scripts, so this article sorts out the expressions we use most often, in order to improve our shell scripting.

Regular expressions (1)

Practice regular expressions through the grep command

For example, to filter the lines that contain a string such as bbb, run grep "xx" file directly, where xx is the string to search for.

[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh bbbbb
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep "bbb" test1
aaa bbb ooo
fffff ggggg hhhhh bbbbb

To filter for lines that start with aaa, use ^, which anchors the match to the beginning of the line.

The pattern works with double quotes, single quotes, or no quotes at all.

[root@zhaocheng ~]# grep "^aaa" test1
aaa bbb ooo
[root@zhaocheng ~]# grep '^aaa' test1
aaa bbb ooo
[root@zhaocheng ~]# grep ^aaa test1
aaa bbb ooo

^ matches the beginning of a line, and $ matches the end of a line. Again, the pattern works with or without quotation marks.

[root@zhaocheng ~]# grep mm$ test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep 'mm$' test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep "mm$" test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
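
As a minimal sketch of the two anchors, the script below builds a small throwaway file and filters it with ^ and $. The file contents and variable names are assumptions made for illustration, not the article's test1. Single quotes are used so that the shell never tries to interpret $.

```shell
# Hypothetical sample data, written to a temporary file
f=$(mktemp)
printf '%s\n' 'aaa bbb ooo' 'xaaa' 'end mm' 'mm start' > "$f"

# ^ anchors the pattern to the beginning of the line
starts=$(grep '^aaa' "$f")

# $ anchors the pattern to the end of the line
ends=$(grep 'mm$' "$f")

echo "$starts"
echo "$ends"
rm -f "$f"
```

Only the line that begins with aaa and the line that ends with mm are printed; xaaa and "mm start" contain the same text but in the wrong position.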

To match a line that consists of a single word, use ^xx$, which anchors both the beginning and the end of the line. grep -n prints the line number, and --color highlights the match.

[root@zhaocheng ~]# grep ^today$ test1
today
[root@zhaocheng ~]# grep -n --color ^today$ test1
1:today
[root@zhaocheng ~]# grep -n ^today$ test1
1:today

It follows that ^$ matches an empty line. Here the fourth line of the file is empty, and ^$ matches it.

[root@zhaocheng ~]# grep ^$ test1

[root@zhaocheng ~]# grep -n ^$ test1
4:
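
One consequence is worth seeing in practice: ^$ only matches lines that contain no characters at all. The sketch below uses assumed sample data to compare it with ^[[:space:]]*$, which also catches lines that hold only spaces or tabs.

```shell
f=$(mktemp)
# Line 2 is empty; line 3 contains a single space
printf 'one\n\n \ntwo\n' > "$f"

# Counts only the truly empty line
empty=$(grep -c '^$' "$f")

# Counts empty lines and whitespace-only lines
blank=$(grep -c '^[[:space:]]*$' "$f")

echo "empty=$empty blank=$blank"
rm -f "$f"
```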

To match the beginning or end of a word in grep, use \< (start of word) and \> (end of word).

[root@zhaocheng ~]# grep "\" test1
beijinG
[root@zhaocheng ~]# grep "\" test1
beijinG

You can also anchor both the start and the end of a word at once.

[root@zhaocheng ~]# grep -n --color "\" test1
6:Beijing is beijin ya
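
The word anchors can be tried on a small file. The following is a sketch under assumptions: the sample lines, the patterns, and the variable names are chosen here for illustration, and \< and \> are GNU grep extensions, so other grep implementations may not accept them.

```shell
f=$(mktemp)
printf '%s\n' 'Beijing is beijin ya' 'beijinG' 'subeijin' > "$f"

# \< requires that a word begins here: bei must start a word
word_start=$(grep -c '\<bei' "$f")

# \> requires that a word ends here: jin must end a word
word_end=$(grep -c 'jin\>' "$f")

# Both anchors together match beijin only as a complete word
whole_word=$(grep '\<beijin\>' "$f")

echo "$word_start $word_end"
echo "$whole_word"
rm -f "$f"
```

subeijin fails the first test because bei sits in the middle of a word, and beijinG fails the second because jin is followed by another word character.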

Besides \< and \>, you can also use \b to anchor either the start or the end of a word.

[root@zhaocheng ~]# grep -n --color "\bccc" test1
5:cccc dddd eeee
[root@zhaocheng ~]# grep -n --color "eee\b" test1
5:cccc dddd eeee
[root@zhaocheng ~]# grep -n --color "\beeee\b" test1
5:cccc dddd eeee

\b also has a sibling, \B, which matches a "non-word boundary".

In the first command below, \Bbb matches bb only where it does not start a word, such as the bb inside bbb.

The second command, \Bbbb, matches bbb only where it does not start a word, so only the line containing bbbbb qualifies.

[root@zhaocheng ~]# grep -n --color "\Bbb" test1
2:aaa bbb ooo
7:fffff ggggg hhhhh bbbbb
[root@zhaocheng ~]# grep -n --color "\Bbbb" test1
7:fffff ggggg hhhhh bbbbb
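
To make the difference between \b and \B concrete, here is a small sketch with assumed sample data (two lines modelled on the output above plus a line holding only bb). It relies on GNU grep, where \b and \B are available.

```shell
f=$(mktemp)
printf '%s\n' 'aaa bbb ooo' 'fffff ggggg hhhhh bbbbb' 'bb' > "$f"

# \b on both sides: bb must be a complete word
whole=$(grep '\bbb\b' "$f")

# \B in front: bb must NOT start a word, so the lone bb line is skipped
inside=$(grep -c '\Bbb' "$f")

# Three b's that do not start a word: only bbbbb is long enough
three=$(grep '\Bbbb' "$f")

echo "$whole"
echo "$inside"
echo "$three"
rm -f "$f"
```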

Summary:

^: anchors the beginning of a line; the characters after it must appear at the beginning of the line to match

$: anchors the end of a line; the characters before it must appear at the end of the line to match

^$: matches an empty line. An empty line here is one that holds nothing but the newline ("enter"); a line containing spaces or tabs does not count.

^abc$: matches a line that consists of abc and nothing else

\< or \b: matches a word boundary and anchors the start of a word; the characters after it must appear at the beginning of a word

\> or \b: matches a word boundary and anchors the end of a word; the characters before it must appear at the end of a word

\B: matches a non-word boundary, the opposite of \b

Regular expressions (2)

Find out which lines in a text contain a given run of letters

For example, find the words that start with image in a YAML file.

[root@zhaocheng files]# grep -n "image" coredns.yaml
111: image: zhaocheng172/coredns:1.2.2
112: imagePullPolicy: IfNotPresent

If a text contains many words but you only want the lines in which the same character appears twice in a row, use \{2\}, which requires the preceding character to appear two times consecutively.

[root@zhaocheng ~]# grep -n "b\{2\}" test3
4:bb
6:bbb
[root@zhaocheng ~]# grep -n "a\{2\}" test3
1:aa
3:aaa aa aa
8:aaiip
9:aallo aahuy
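
Note that a\{2\} is not anchored, so it also matches lines with three or more a's, as the aaa line above shows. A brief sketch with assumed sample data; the -x option, which requires the whole line to match, is one way to ask for exactly two:

```shell
f=$(mktemp)
printf '%s\n' 'a' 'aa' 'aaa' 'baab' > "$f"

# Any line that contains aa somewhere: aa, aaa and baab
at_least_two=$(grep -c 'a\{2\}' "$f")

# -x makes the pattern cover the entire line, so only aa remains
exactly_two=$(grep -x 'a\{2\}' "$f")

# Three in a row
three=$(grep 'a\{3\}' "$f")

echo "$at_least_two $exactly_two $three"
rm -f "$f"
```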

The repetition symbol * is one we use often when matching arbitrary characters.

In shell wildcards, * represents any characters of any length.

In a regular expression, however, * means that the preceding character appears any number of times in a row (including 0 times).

For example, a*p means that a can appear any number of times, but it must be followed by p.

[root@zhaocheng ~]# grep -n "a*p" test3
7:appaly aoopa
8:aaiip

o* matches o repeated any number of times. Because zero times also counts, every line matches, even the lines that contain no o.

[root@zhaocheng ~]# grep -n --color "o*" test2
1:aaa#bbb#ooo
2:cccc#dddd#eeee
3:fffff#ggggg#hhhhh
4:kkkkk#pppppp#ssssss
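
The "zero times also counts" rule is easy to check. In this sketch (sample data and variable names are assumptions), o* selects every line, while oo* requires at least one o; a*p likewise accepts a bare p with no a in front of it.

```shell
f=$(mktemp)
printf '%s\n' 'aaa#bbb#ooo' 'cccc#dddd#eeee' 'p' > "$f"

# o* can match the empty string, so all three lines are selected
zero_or_more=$(grep -c 'o*' "$f")

# oo* means one o followed by zero or more o's: at least one o
one_or_more=$(grep -c 'oo*' "$f")

# a*p needs a p, but the a's in front of it are optional
bare_p=$(grep 'a*p' "$f")

echo "$zero_or_more $one_or_more $bare_p"
rm -f "$f"
```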

What * expresses in wildcards (characters of any length) is written as .* in regular expressions.

[root@zhaocheng ~]# grep -n --color "o.*" test1
1:today
2:aaa bbb ooo

In a regular expression, . represents any single character, so .. matches any two characters and ... matches any three.

[root@zhaocheng ~]# grep -n "y." test
6:sync:x:5:0:sync:/sbin:/bin/sync
[root@zhaocheng ~]# grep -n "y.." test
6:sync:x:5:0:sync:/sbin:/bin/sync
[root@zhaocheng ~]# grep -n "y..." test
6:sync:x:5:0:sync:/sbin:/bin/sync
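
A short sketch of . and .* on assumed sample data. The -o option (supported by GNU and BSD grep) prints only the matched part, which makes it visible how far .* extends.

```shell
f=$(mktemp)
printf '%s\n' 'sync:x:5:0:sync:/sbin:/bin/sync' 'today' 'y' > "$f"

# All three lines contain a y
has_y=$(grep -c 'y' "$f")

# y. needs one more character after y, so today and the lone y drop out
y_plus_one=$(grep -c 'y.' "$f")

# o.* matches from the first o to the end of the line
rest=$(grep -o 'o.*' "$f")

echo "$has_y $y_plus_one $rest"
rm -f "$f"
```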

Regular expressions (3) Common symbols

[[:alpha:]] matches any single letter

[root@zhaocheng ~]# grep "[[:alpha:]]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%

Put an a in front to match an a followed by one letter

[root@zhaocheng ~]# grep "a[[:alpha:]]" test4
afghj9gh
abcd
aBDc
abdD

The following matches an a followed by three letters.

[root@zhaocheng ~]# grep "a[[:alpha:]]\{3\}" test4
afghj9gh
abcd
aBDc
abdD

Match an a followed by two letters

[root@zhaocheng ~]# grep "a[[:alpha:]]\{2\}" test4
afghj9gh
abcd
aBDc
abdD

Suppose the characters after a must all be lowercase

You can use [[:lower:]] to represent any lowercase letter

[root@zhaocheng ~]# grep "a[[:lower:]]" test4
afghj9gh
abcd
abdD
[root@zhaocheng ~]# grep "a[[:lower:]]\{2\}" test4
afghj9gh
abcd
abdD

You can also use [[:upper:]] to represent any uppercase letter

[root@zhaocheng ~]# grep "a[[:upper:]]\{1\}" test4
aBDc
[root@zhaocheng ~]# grep "a[[:upper:]]\{2\}" test4
aBDc

Common symbols

[[:alpha:]] represents any uppercase or lowercase letter

[[:lower:]] represents any lowercase letter

[[:upper:]] represents any uppercase letter

[[:digit:]] represents any single digit from 0 to 9 (including 0 and 9)

[[:alnum:]] represents any digit or letter

[[:space:]] represents any whitespace character, including spaces and tabs

[[:punct:]] represents any punctuation mark

Besides [[:lower:]], the range [a-z] also represents any lowercase letter; [[:lower:]] and [a-z] are the same.

[root@zhaocheng ~]# grep "[a-z]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%
[root@zhaocheng ~]# grep "[[:lower:]]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%

Likewise, the uppercase forms [A-Z] and [[:upper:]] are the same.

[root@zhaocheng ~]# grep "[A-Z]" test4
aBDc
abdD
[root@zhaocheng ~]# grep "[[:upper:]]" test4
aBDc
abdD

Use both forms to match the 2 characters after the letter a

[[:lower:]] lowercase

[[:upper:]] uppercase

[root@zhaocheng ~]# grep "a[a-z]\{2\}" test4
afghj9gh
abcd
apooo
aiuhh
abdD
[root@zhaocheng ~]# grep "a[[:lower:]]\{2\}" test4
afghj9gh
abcd
apooo
aiuhh
abdD
[root@zhaocheng ~]# grep "a[A-Z]\{2\}" test4
aBDc
[root@zhaocheng ~]# grep "a[[:upper:]]\{2\}" test4
aBDc

Another class is [[:alpha:]], which matches any letter and has the same meaning as [a-zA-Z].

[root@zhaocheng ~]# grep "[[:alpha:]]" test4
a
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a%
[root@zhaocheng ~]# grep "[a-zA-Z]" test4
a
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a%

Similarly, [0-9] and [[:digit:]] are equivalent; both represent any single digit from 0 to 9

[root@zhaocheng ~]# grep "[[:digit:]]" test4
a9o
afghj9gh
a124
a1a4
[root@zhaocheng ~]# grep "[0-9]" test4
a9o
afghj9gh
a124
a1a4
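
The equivalence between ranges and named classes can be checked side by side. This is a sketch with assumed sample data; it sets LC_ALL=C for the range commands as a precaution, because in some locales a range such as [a-z] may not cover exactly the ASCII lowercase letters, whereas [[:lower:]] always follows the locale's definition of lowercase.

```shell
f=$(mktemp)
printf '%s\n' 'a9o' 'abcd' 'aBDc' 'a%' > "$f"

# a followed by two lowercase letters, written with a named class
by_class=$(LC_ALL=C grep 'a[[:lower:]]\{2\}' "$f")

# The same request written with a range
by_range=$(LC_ALL=C grep 'a[a-z]\{2\}' "$f")

# Lines containing a digit, counted both ways
digit_class=$(grep -c '[[:digit:]]' "$f")
digit_range=$(grep -c '[0-9]' "$f")

echo "$by_class $by_range $digit_class $digit_range"
rm -f "$f"
```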

Instead of a full range such as [a-z], the brackets can also list just the characters you want to match.

[root@zhaocheng ~]# grep "b[ad]" test5
ba
bd

You can also list special characters. [] matches any single character in the specified set.

[root@zhaocheng ~]# grep "b[cP@*&]" test5
bc
bP
b&
b*
b@

Use [^ ] to match characters other than the listed ones

[root@zhaocheng ~]# grep "b[^cP@*&]" test5
ba
bd
bf
bg

Use [^A-Z] or [^a-z] to exclude a whole range of characters.

[root@zhaocheng ~]# grep "b[^A-Z]" test5
ba
bc
bd
bf
bg
b&
b*
b@
[root@zhaocheng ~]# grep "b[^a-z]" test5
bP
b&
b*
b@

By the same principle:

[^a-z] matches a single character that is not a lowercase letter

[^A-Z] matches a single character that is not an uppercase letter

[^a-zA-Z] matches a single character that is not a letter, such as a digit or symbol

[^a-zA-Z0-9] matches a single character that is neither a letter nor a digit, such as a symbol

We saw earlier that [a-z] and [[:lower:]] are equivalent, so they are also equivalent when negated with ^.

[root@zhaocheng ~]# grep "b[^[:lower:]]" test5
bP
b&
b*
b@

That is, [^0-9] and [^[:digit:]] are equivalent.

[^a-z] and [^[:lower:]] are equivalent.

[^A-Z] and [^[:upper:]] are equivalent.

[^a-zA-Z] and [^[:alpha:]] are equivalent.

[^a-zA-Z0-9] and [^[:alnum:]] are equivalent.
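
A negated set still has to match one character; it does not match "nothing". The sketch below, on assumed sample data, shows this with a line that holds only b, and compares the range and class spellings (LC_ALL=C is set so that the ranges mean plain ASCII letters).

```shell
f=$(mktemp)
printf '%s\n' 'ba' 'bP' 'b&' 'b3' 'b' > "$f"

# b followed by something that is not a lowercase letter: bP, b&, b3
not_lower_range=$(LC_ALL=C grep -c 'b[^a-z]' "$f")
not_lower_class=$(LC_ALL=C grep -c 'b[^[:lower:]]' "$f")

# b followed by something that is neither a letter nor a digit
symbol_only=$(LC_ALL=C grep 'b[^a-zA-Z0-9]' "$f")

echo "$not_lower_range $not_lower_class $symbol_only"
rm -f "$f"
```

The line containing only b is never selected, because there is no character after b for the negated set to consume.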

Besides [0-9] and [[:digit:]], you can also use \d to represent a digit. \d is a Perl-style shorthand, so grep needs the -P option for it.

[root@zhaocheng ~]# grep -P "b\d" test5
b3
b4
b5

\D matches any single non-digit character

[root@zhaocheng ~]# grep -P "b\D" test5
ba
bc
bd
bf
bg
bP
b&
b*
b@

\d represents any single digit from 0 to 9

\D represents any single non-digit character

\t matches a single horizontal tab (equivalent to the Tab key)

\s matches a single whitespace character, including spaces, tabs, etc.

\S matches a single non-whitespace character
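
The shorthand classes can be tried as follows. This sketch assumes a GNU grep built with PCRE support, since -P is not available in every grep; the sample data and variable names are made up for illustration.

```shell
f=$(mktemp)
printf '%s\n' 'b3' 'ba' 'b&' 'b x' > "$f"

# \d: b followed by a digit
digit=$(grep -P 'b\d' "$f")

# \D: b followed by anything that is not a digit (ba, b&, "b x")
non_digit=$(grep -cP 'b\D' "$f")

# \s: b followed by whitespace
space=$(grep -P 'b\s' "$f")

# \S: b followed by a non-whitespace character (b3, ba, b&)
non_space=$(grep -cP 'b\S' "$f")

echo "$digit / $non_digit / $space / $non_space"
rm -f "$f"
```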

IV. Regular expression escape character

The commonly used symbol "\" is the escape character

We saw earlier that in a regular expression . matches any single character. If the text itself contains a literal dot and we search for it directly, . will also match other characters, so we need the escape character \ to match the dot itself.

[root@zhaocheng ~]# grep "a.." test4
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a..
[root@zhaocheng ~]# grep "a\.\." test4
a..

If you want to match the backslash itself,

you can put the pattern in single quotation marks and escape the backslash with another backslash, so that '\\' matches one literal \

[root@zhaocheng ~]# grep 'a\\' test4
a\\
[root@zhaocheng ~]# grep 'a\' test4
a\\
[root@zhaocheng ~]# grep 'a\' test4
a\
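
Because backslashes are easy to lose between the shell and grep, here is a self-contained sketch of the escaping rules on assumed sample data. Inside single quotes the shell passes backslashes through unchanged, so grep sees exactly what is typed; each literal backslash in the text then needs two backslashes in the pattern. The -F option, which treats the pattern as a fixed string, is shown as an alternative to escaping.

```shell
f=$(mktemp)
# Four lines: a.. / abc / a + one backslash / a + two backslashes
printf '%s\n' 'a..' 'abc' 'a\' 'a\\' > "$f"

# Unescaped dots match any two characters
any_two=$(grep -c 'a..' "$f")

# Escaped dots match only literal dots
literal_dots=$(grep 'a\.\.' "$f")

# -F disables regular expression syntax altogether
fixed=$(grep -F 'a..' "$f")

# \\ in the pattern stands for one literal backslash
one_backslash=$(grep -c 'a\\' "$f")

# Four backslashes in the pattern stand for two literal backslashes
two_backslashes=$(grep 'a\\\\' "$f")

echo "$any_two $literal_dots $fixed $one_backslash $two_backslashes"
rm -f "$f"
```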

Summary of regular expressions

# Common symbols #

. represents any single character.

* indicates that the preceding character appears any number of times in a row, including 0.

.* represents any characters of any length, with the same meaning as * in wildcards.

\ is the escape character; combined with a symbol that has a special meaning in a regular expression, it represents the symbol itself.

[ ] matches any single character within the specified range.

[^ ] matches any single character outside the specified range.

# single character matching related #

[[:alpha:]] represents any uppercase or lowercase letter.

[[:lower:]] represents any lowercase letter.

[[:upper:]] represents any uppercase letter.

[[:digit:]] represents any single digit from 0 to 9 (inclusive).

[[:alnum:]] represents any digit or letter.

[[:space:]] represents any whitespace character, including spaces, tabs, and so on.

[[:punct:]] represents any punctuation mark.

[^[:alpha:]] represents a single non-alphabetic character.

[^[:lower:]] represents a single character that is not a lowercase letter.

[^[:upper:]] represents a single character that is not an uppercase letter.

[^[:digit:]] represents a single non-digit character.

[^[:alnum:]] represents a single character that is neither a digit nor a letter.

[^[:space:]] represents a single non-whitespace character.

[^[:punct:]] represents a single non-punctuation character.

[0-9] is equivalent to [[:digit:]].

[a-z] is equivalent to [[:lower:]].

[A-Z] is equivalent to [[:upper:]].

[a-zA-Z] is equivalent to [[:alpha:]].

[a-zA-Z0-9] is equivalent to [[:alnum:]].

[^0-9] is equivalent to [^[:digit:]].

[^a-z] is equivalent to [^[:lower:]].

[^A-Z] is equivalent to [^[:upper:]].

[^a-zA-Z] is equivalent to [^[:alpha:]].

[^a-zA-Z0-9] is equivalent to [^[:alnum:]].

The shorthand forms below are not recognized by all regular expression parsers; with grep, \d and \D require the -P option.

\d represents any single digit from 0 to 9.

\D represents any single non-digit character.

\t matches a single horizontal tab (equivalent to the Tab key).

\s matches a single whitespace character, including spaces, tabs, and so on.

\S matches a single non-whitespace character.

# number of times matching related #

\? indicates that the preceding character is matched 0 or 1 times.

\+ indicates that the preceding character is matched at least once, with no upper limit on the number of consecutive repetitions.

\{n\} means that the preceding character appears exactly n times in a row.

\{x,y\} means that the preceding character appears at least x times and at most y times in a row; any number of consecutive occurrences between x and y matches.

\{,n\} means that the preceding character appears at most n times in a row, and at least 0 times.

\{n,\} means that the preceding character appears at least n times in a row.

# location boundary matching related #

^: anchors the beginning of the line; the characters after it must appear at the beginning of the line to match.

$: anchors the end of the line; the characters before it must appear at the end of the line to match.

^$: matches an empty line. An empty line here is one that holds nothing but the newline ("enter"); a line containing spaces or tabs does not count.

^abc$: matches a line that consists of abc and nothing else.

\< or \b: matches a word boundary and anchors the start of a word; the characters after it must appear at the beginning of a word.

\> or \b: matches a word boundary and anchors the end of a word; the characters before it must appear at the end of a word.

\B: matches a non-word boundary, the opposite of \b.

# grouping and backward reference #

\( \) represents a group, whose contents are treated as a whole; groups can be nested.

\(ab\) treats ab as a whole.

\1 refers to the text matched by the first group in the expression.

\2 refers to the text matched by the second group in the expression.
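
Grouping and back-references are easier to follow with a concrete case. The sketch below uses basic regular expression syntax on assumed sample data. A back-reference repeats the text that the group actually matched, not the group's pattern.

```shell
f=$(mktemp)
printf '%s\n' 'abab' 'abba' 'hello hello world' 'hello world' > "$f"

# \(ab\)\1 : the group ab followed by the same text again
repeated_group=$(grep '\(ab\)\1' "$f")

# Two groups referenced in reverse order: a b b a
mirrored=$(grep '\(a\)\(b\)\2\1' "$f")

# A run of letters, a space, then the same run again
repeated_word=$(grep '\([[:alpha:]]\{1,\}\) \1' "$f")

echo "$repeated_group"
echo "$mirrored"
echo "$repeated_word"
rm -f "$f"
```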

[root@zhaocheng ~]# cat shengri.txt
small card 19901119
Xiao Hong 19920105
Xiao Li 19930211
Xiao she 19940325
Xiao Hei 19950418

Match the entry whose date begins with 1993:

[root@zhaocheng ~]# grep "\b1993[0-9]\{4\}\b" shengri.txt
Xiao Li 19930211

That is, \> and \b both anchor the end of a word.

[root@zhaocheng ~]# grep "\" shengri.txt
Xiao Hong 19920105
[root@zhaocheng ~]# grep "\" shengri.txt
Xiao Hong 19920105

VI. Extended regular expressions

Whether in basic regular expressions or extended regular expressions, some common symbols have the same meaning.

. represents any single character.

* indicates that the preceding character appears any number of times in a row, including 0.

.* represents any characters of any length, with the same meaning as * in wildcards.

\ is the escape character; combined with a symbol that has a special meaning in a regular expression, it represents the symbol itself.

[ ] matches any single character within the specified range.

[^ ] matches any single character outside the specified range.

[[:alpha:]] represents any uppercase or lowercase letter.

[[:lower:]] represents any lowercase letter.

[[:upper:]] represents any uppercase letter.

[[:digit:]] represents any single digit from 0 to 9 (inclusive).

[[:alnum:]] represents any digit or letter.

[[:space:]] represents any whitespace character, including spaces, tabs, and so on.

[[:punct:]] represents any punctuation mark.

[^[:alpha:]] represents a single non-alphabetic character.

[^[:lower:]] represents a single character that is not a lowercase letter.

[^[:upper:]] represents a single character that is not an uppercase letter.

[^[:digit:]] represents a single non-digit character.

[^[:alnum:]] represents a single character that is neither a digit nor a letter.

[^[:space:]] represents a single non-whitespace character.

[^[:punct:]] represents a single non-punctuation character.

[0-9] is equivalent to [[:digit:]].

[a-z] is equivalent to [[:lower:]].

[A-Z] is equivalent to [[:upper:]].

[a-zA-Z] is equivalent to [[:alpha:]].

[a-zA-Z0-9] is equivalent to [[:alnum:]].

[^0-9] is equivalent to [^[:digit:]].

[^a-z] is equivalent to [^[:lower:]].

[^A-Z] is equivalent to [^[:upper:]].

[^a-zA-Z] is equivalent to [^[:alpha:]].

[^a-zA-Z0-9] is equivalent to [^[:alnum:]].

^: anchors the beginning of the line; the characters after it must appear at the beginning of the line to match.

$: anchors the end of the line; the characters before it must appear at the end of the line to match.

^$: matches an empty line. An empty line here is one that holds nothing but the newline ("enter"); a line containing spaces or tabs does not count.

^abc$: matches a line that consists of abc and nothing else.

\< or \b: matches a word boundary and anchors the start of a word; the characters after it must appear at the beginning of a word.

\> or \b: matches a word boundary and anchors the end of a word; the characters before it must appear at the end of a word.

\B: matches a non-word boundary, the opposite of \b.

The grep command supports only basic regular expressions by default. To use extended regular expressions, pass the -E option (or use egrep). About 70% of the symbols are common to both, as the following shows.

[root@zhaocheng ~]# grep "b[a-z]" test5
ba
bc
bd
[root@zhaocheng ~]# egrep "b[a-z]" test5
ba
bc
bd
[root@zhaocheng ~]# grep -E "b[a-z]" test5
ba
bc
bd

The other 30% differ slightly from basic regular expressions, but are easier to read.

In basic regular expressions, \{n\} means that the preceding character appears n times in a row.

In extended regular expressions, {n} means that the preceding character appears n times in a row.

In basic regular expressions, \( \) indicates grouping, and \(ab\) treats ab as a whole.

In extended regular expressions, ( ) indicates grouping, and (ab) treats ab as a whole.

In extended regular expressions:

( ) indicates grouping

(ab) treats ab as a whole

\1 refers to the text matched by the first group in the expression

\2 refers to the text matched by the second group in the expression

? indicates that the preceding character is matched 0 or 1 times

+ indicates that the preceding character is matched at least once, with no upper limit on the number of consecutive repetitions

{n} means that the preceding character appears exactly n times in a row

{x,y} means that the preceding character appears at least x times and at most y times in a row

{,n} means that the preceding character appears at most n times in a row, and at least 0 times

{n,} means that the preceding character appears at least n times in a row
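
The following sketch puts the basic and extended spellings next to each other on assumed sample data. It assumes GNU grep, where \? is accepted in basic regular expressions as an extension.

```shell
f=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'clr' > "$f"

# Optional u: ? in extended syntax, \? in basic syntax (color, colour)
optional_ere=$(grep -cE 'colou?r' "$f")
optional_bre=$(grep -c 'colou\?r' "$f")

# At least one u (colour, colouur)
plus_ere=$(grep -cE 'colou+r' "$f")

# Exactly two u's: {2} in extended syntax, \{2\} in basic syntax
exact_ere=$(grep -E 'colou{2}r' "$f")
exact_bre=$(grep 'colou\{2\}r' "$f")

# Between one and two u's (colour, colouur)
range_ere=$(grep -cE 'colou{1,2}r' "$f")

echo "$optional_ere $optional_bre $plus_ere $exact_ere $exact_bre $range_ere"
rm -f "$f"
```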

Extended regular expressions also have a commonly used symbol that basic regular expressions lack: "|".

It means "or".

[root@zhaocheng ~]# cat test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
dubbo.edu
spring cloud.net
helm.org

Find the lines that end in net by anchoring the pattern with $, as in xxx$.

[root@zhaocheng ~]# grep "net$" test6
spring cloud.net
[root@zhaocheng ~]# egrep "net$" test6
spring cloud.net
[root@zhaocheng ~]# grep -E "net$" test6
spring cloud.net

For example, use "|" to find the lines that end in com or net, with either egrep or grep -E. The ( ) makes the contents of the parentheses count as a whole.

[root@zhaocheng ~]# egrep "(com|net)$" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net
[root@zhaocheng ~]# grep -E "(com|net)$" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net

It can also be written without the parentheses and $, but that is not as precise as (com|net)$, because without $ the match is no longer anchored to the end of the line.

[root@zhaocheng ~]# grep -E "com|net" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net
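
The two spellings give the same result on test6 only because no line there contains com or net in the middle without also ending in one of them. A sketch with different, assumed sample data makes the gap visible:

```shell
f=$(mktemp)
printf '%s\n' 'kubernetes.com' 'spring cloud.net' 'network.org' 'helm.org' > "$f"

# Anchored: the line must END in com or net
anchored=$(grep -cE '(com|net)$' "$f")

# Unanchored: com or net may appear anywhere in the line
anywhere=$(grep -cE 'com|net' "$f")

# The line picked up only by the unanchored pattern
extra=$(grep -E 'com|net' "$f" | grep -vE '(com|net)$')

echo "$anchored $anywhere $extra"
rm -f "$f"
```

network.org is selected by com|net because it contains net, even though it ends in org.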

Summary of common extended expressions

Common symbols

. represents any single character.

* indicates that the preceding character appears any number of times in a row, including 0.

.* represents any characters of any length, with the same meaning as * in wildcards.

\ is the escape character; combined with a symbol that has a special meaning in a regular expression, it represents the symbol itself.

| means "or".

[ ] matches any single character within the specified range.

[^ ] matches any single character outside the specified range.

Single character matching correlation

[[:alpha:]] represents any uppercase or lowercase letter.

[[:lower:]] represents any lowercase letter.

[[:upper:]] represents any uppercase letter.

[[:digit:]] represents any single digit from 0 to 9 (inclusive).

[[:alnum:]] represents any digit or letter.

[[:space:]] represents any whitespace character, including spaces, tabs, and so on.

[[:punct:]] represents any punctuation mark.

[^[:alpha:]] represents a single non-alphabetic character.

[^[:lower:]] represents a single character that is not a lowercase letter.

[^[:upper:]] represents a single character that is not an uppercase letter.

[^[:digit:]] represents a single non-digit character.

[^[:alnum:]] represents a single character that is neither a digit nor a letter.

[^[:space:]] represents a single non-whitespace character.

[^[:punct:]] represents a single non-punctuation character.

[0-9] is equivalent to [[:digit:]].

[a-z] is equivalent to [[:lower:]].

[A-Z] is equivalent to [[:upper:]].

[a-zA-Z] is equivalent to [[:alpha:]].

[a-zA-Z0-9] is equivalent to [[:alnum:]].

[^0-9] is equivalent to [^[:digit:]].

[^a-z] is equivalent to [^[:lower:]].

[^A-Z] is equivalent to [^[:upper:]].

[^a-zA-Z] is equivalent to [^[:alpha:]].

[^a-zA-Z0-9] is equivalent to [^[:alnum:]].

Times matching correlation

? indicates that the preceding character is matched 0 or 1 times.

+ indicates that the preceding character is matched at least once, with no upper limit on the number of consecutive repetitions.

{n} means that the preceding character appears exactly n times in a row.

{,n} means that the preceding character appears at most n times in a row, and at least 0 times.

{n,} means that the preceding character appears at least n times in a row.

Location boundary matching correlation

^: anchors the beginning of the line; the characters after it must appear at the beginning of the line to match.

$: anchors the end of the line; the characters before it must appear at the end of the line to match.

^$: matches an empty line. An empty line here is one that holds nothing but the newline ("enter"); a line containing spaces or tabs does not count.

^abc$: matches a line that consists of abc and nothing else.

\< or \b: matches a word boundary and anchors the start of a word; the characters after it must appear at the beginning of a word.

\> or \b: matches a word boundary and anchors the end of a word; the characters before it must appear at the end of a word.

\B: matches a non-word boundary, the opposite of \b.

Grouping and backward reference

( ) represents a group, whose contents are treated as a whole; groups can be nested.

(ab) treats ab as a whole.

\1 refers to the text matched by the first group in the expression.

\2 refers to the text matched by the second group in the expression.
