2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Regular expressions in Linux
We often use regular expressions when writing shell scripts, so this article sorts out the expressions we use most, in order to improve our shell scripting.
Regular expressions (1)
Practice regular expressions through the grep command
For example, to filter the lines that contain bbb, we can get them directly with grep "bbb" file.
[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh bbbbb
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep "bbb" test1
aaa bbb ooo
fffff ggggg hhhhh bbbbb
For example, to filter the lines that start with aaa, we use ^, which anchors the match to the beginning of the line.
The quotation marks around the pattern are optional here.
[root@zhaocheng ~]# grep "^aaa" test1
aaa bbb ooo
[root@zhaocheng ~]# grep '^aaa' test1
aaa bbb ooo
[root@zhaocheng ~]# grep ^aaa test1
aaa bbb ooo
^ matches the beginning of a line, and $ matches the end of a line. Again, you can try it with or without the quotation marks.
[root@zhaocheng ~]# grep mm$ test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep 'mm$' test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
[root@zhaocheng ~]# grep "mm$" test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm
For example, to match a line that consists of a single word, use ^xx$, which anchors both the beginning and the end of the line. grep -n prints the line number, and --color highlights the match.
[root@zhaocheng ~]# grep ^today$ test1
today
[root@zhaocheng ~]# grep -n --color ^today$ test1
1:today
[root@zhaocheng ~]# grep -n ^today$ test1
1:today
It follows that ^$ matches a blank line. Here the fourth line of the file is blank, and ^$ matches it.
[root@zhaocheng ~]# grep ^$ test1

[root@zhaocheng ~]# grep -n ^$ test1
4:
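The anchors above can be tried on a throwaway file. This is a minimal sketch, assuming a POSIX shell with grep and mktemp; the sample lines and variable names are made up for illustration and are not the author's test1 file. The patterns are single-quoted so that the shell passes ^ and $ to grep unchanged.

```shell
# Hypothetical sample file, not the article's test1.
f=$(mktemp)
printf 'today\naaa bbb ooo\n\nnot today\nkkkkk mmmm\n' > "$f"

starts=$(grep -n '^aaa' "$f")     # lines beginning with aaa
ends=$(grep -n 'mm$' "$f")        # lines ending with mm
whole=$(grep -n '^today$' "$f")   # lines that are exactly "today"
blank=$(grep -n '^$' "$f")        # empty lines

printf '%s\n' "$starts" "$ends" "$whole" "$blank"
rm -f "$f"
```

Line 4 (not today) is skipped by ^today$ because the pattern is anchored at both ends.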
To match the beginning or end of a word in grep, use \< and \>.
[root@zhaocheng ~]# grep "\" test1
beijinG
[root@zhaocheng ~]# grep "\" test1
beijinG
You can also anchor both the beginning and the end of a word.
[root@zhaocheng ~]# grep -n --color "\" test1
6:Beijing is beijin ya
In addition to \< and \>, you can also use \b to anchor the beginning or the end of a word.
[root@zhaocheng ~]# grep -n --color "\bccc" test1
5:cccc dddd eeee
[root@zhaocheng ~]# grep -n --color "eee\b" test1
5:cccc dddd eeee
[root@zhaocheng ~]# grep -n --color "\beeee\b" test1
5:cccc dddd eeee
\b also has a counterpart, \B, which matches a non-word boundary.
With \Bbb, bb is matched only where it does not start a word, that is, where it sits inside or at the end of a word.
Likewise, \Bbbb matches bbb only where it is not at the beginning of a word.
[root@zhaocheng ~]# grep -n --color "\Bbb" test1
2:aaa bbb ooo
7:fffff ggggg hhhhh bbbbb
[root@zhaocheng ~]# grep -n --color "\Bbbb" test1
7:fffff ggggg hhhhh bbbbb
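The word-boundary operators can be compared side by side. The following is a rough sketch that assumes GNU grep, since \<, \>, \b and \B are extensions that other grep implementations may not accept; the sample file and variable names are hypothetical.

```shell
# Hypothetical sample file; \<, \>, \b and \B assume GNU grep.
f=$(mktemp)
printf 'aaa bbb ooo\ncccc dddd eeee\nfffff bbbbb\n' > "$f"

word_start=$(grep -n '\<ccc' "$f")    # ccc at the start of a word
word_end=$(grep -n 'eee\>' "$f")      # eee at the end of a word
whole_word=$(grep -n '\<bbb\>' "$f")  # bbb as a complete word
inner_bb=$(grep -n '\Bbb' "$f")       # bb that does not start a word
inner_bbb=$(grep -n '\Bbbb' "$f")     # bbb that does not start a word

printf '%s\n' "$word_start" "$word_end" "$whole_word" "$inner_bb" "$inner_bbb"
rm -f "$f"
```

The word bbb on line 1 satisfies \Bbb (its last two letters follow another b) but not \Bbbb, which needs at least four b characters in a row.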
Summary:
^: anchors the beginning of a line; the characters after it must appear at the beginning of the line to match
$: anchors the end of a line; the characters before it must appear at the end of the line to match
^$: matches blank lines. A blank line here means a line that contains only a newline ("enter"); a line of spaces or tabs does not count.
^abc$: matches a line that consists only of abc
\< or \b: matches a word boundary, anchoring the beginning of a word; the characters after it must appear at the start of a word
\> or \b: matches a word boundary, anchoring the end of a word; the characters before it must appear at the end of a word
\B: matches a non-word boundary, the opposite of \b
Regular expressions (2)
Find out which lines in a text contain a given run of consecutive letters, such as image.
For example, find the lines that contain image in a YAML file.
[root@zhaocheng files]# grep -n "image" coredns.yaml
111: image: zhaocheng172/coredns:1.2.2
112: imagePullPolicy: IfNotPresent
If a text contains many words but you only want the lines in which the same letter appears twice in a row, use \{2\}.
[root@zhaocheng ~]# grep -n "b\{2\}" test3
4:bb
6:bbb
[root@zhaocheng ~]# grep -n "a\{2\}" test3
1:aa
3:aaa aa aa
8:aaiip
9:aallo aahuy
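As the output above shows, b\{2\} also matches bbb, because two consecutive b characters occur inside it. A small sketch of how anchors narrow this down, using a hypothetical sample file and assuming a POSIX shell with grep and mktemp:

```shell
# Hypothetical sample file, not the article's test3.
f=$(mktemp)
printf 'aa\nab\nbb\nbbb\nabab\n' > "$f"

at_least_two=$(grep -n 'b\{2\}' "$f")    # bb anywhere in the line
exactly_two=$(grep -n '^b\{2\}$' "$f")   # the whole line is exactly bb
count_aa=$(grep -c 'a\{2\}' "$f")        # number of lines containing aa

printf '%s\n' "$at_least_two" "$exactly_two" "$count_aa"
rm -f "$f"
```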
* is the repetition symbol we use most often.
In wildcards, * represents any characters of any length.
In a regular expression, however, * means that the preceding character appears any number of times in a row (including 0 times).
For example, a*p means that a can appear any number of times, but it must be followed by p.
[root@zhaocheng ~]# grep -n "a*p" test3
7:appaly aoopa
8:aaiip
Matching o* means o repeated any number of times, including zero, so every line matches, even the lines that contain no o at all.
[root@zhaocheng ~]# grep -n --color "o*" test2
1:aaa#bbb#ooo
2:cccc#dddd#eeee
3:fffff#ggggg#hhhhh
4:kkkkk#pppppp#ssssss
In wildcards, * represents characters of any length; to get that meaning in a regular expression, use .* instead.
[root@zhaocheng ~]# grep -n --color "o.*" test1
1:today
2:aaa bbb ooo
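The difference between o*, oo* and o.* is easier to see on a small input. This sketch uses a hypothetical sample file and assumes a grep that supports -o (GNU and BSD grep do), which prints only the matched part of each line.

```shell
# Hypothetical sample file; -o is assumed to be available (GNU/BSD grep).
f=$(mktemp)
printf 'aaa#bbb#ooo\ncccc#dddd\ntoday\n' > "$f"

zero_or_more=$(grep -c 'o*' "$f")   # o* also matches zero o's: every line counts
one_or_more=$(grep -n 'oo*' "$f")   # at least one o
rest_of_line=$(grep -o 'o.*' "$f")  # from the first o to the end of the line

printf '%s\n' "$zero_or_more" "$one_or_more" "$rest_of_line"
rm -f "$f"
```

Writing oo* is the basic-regex way to require at least one o, which avoids the "matches everything" surprise of a bare o*.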
In a regular expression, . represents any single character, so .. matches any two characters.
[root@zhaocheng ~]# grep -n "y." test
6:sync:x:5:0:sync:/sbin:/bin/sync
[root@zhaocheng ~]# grep -n "y.." test
6:sync:x:5:0:sync:/sbin:/bin/sync
[root@zhaocheng ~]# grep -n "y..." test
6:sync:x:5:0:sync:/sbin:/bin/sync
Regular expressions (3) Common symbols
[[:alpha:]] matches any single letter, so the command below prints every line that contains a letter.
[root@zhaocheng ~]# grep "[[:alpha:]]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%
With a in front, the pattern matches a followed by one letter (one occurrence is the default).
[root@zhaocheng ~]# grep "a[[:alpha:]]" test4
afghj9gh
abcd
aBDc
abdD
The following matches a followed by three letters.
[root@zhaocheng ~]# grep "a[[:alpha:]]\{3\}" test4
afghj9gh
abcd
aBDc
abdD
Match a followed by two letters.
[root@zhaocheng ~]# grep "a[[:alpha:]]\{2\}" test4
afghj9gh
abcd
aBDc
abdD
If the letters after a must all be lowercase, use [[:lower:]], which represents any lowercase letter.
[root@zhaocheng ~]# grep "a[[:lower:]]" test4
afghj9gh
abcd
abdD
[root@zhaocheng ~]# grep "a[[:lower:]]\{2\}" test4
afghj9gh
abcd
abdD
You can also use [[:upper:]] for any uppercase letter.
[root@zhaocheng ~]# grep "a[[:upper:]]\{1\}" test4
aBDc
[root@zhaocheng ~]# grep "a[[:upper:]]\{2\}" test4
aBDc
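The character classes combine naturally with the \{n\} repetition count. A minimal sketch on a hypothetical file that resembles the sample data above; the variable names are illustrative only.

```shell
# Hypothetical sample file modelled on the examples above.
f=$(mktemp)
printf 'a\na9o\nafghj9gh\nabcd\naBDc\nabdD\na124\n' > "$f"

three_letters=$(grep 'a[[:alpha:]]\{3\}' "$f")  # a followed by three letters
two_lower=$(grep 'a[[:lower:]]\{2\}' "$f")      # a followed by two lowercase letters
two_upper=$(grep 'a[[:upper:]]\{2\}' "$f")      # a followed by two uppercase letters
one_digit=$(grep 'a[[:digit:]]' "$f")           # a followed by a digit

printf '%s\n' "$three_letters" "$two_lower" "$two_upper" "$one_digit"
rm -f "$f"
```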
Common symbols
[[:alpha:]] represents any uppercase or lowercase letter
[[:lower:]] represents any lowercase letter
[[:upper:]] represents any uppercase letter
[[:digit:]] represents any single digit between 0 and 9 (including 0 and 9)
[[:alnum:]] represents any digit or letter
[[:space:]] represents any white space character, including spaces and tabs
[[:punct:]] represents any punctuation mark
Besides [[:lower:]], the range [a-z] can also represent any lowercase letter; [[:lower:]] and [a-z] behave the same here.
[root@zhaocheng ~]# grep "[a-z]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%
[root@zhaocheng ~]# grep "[[:lower:]]" test4
a
a9o
afghj9gh
abcd
aBDc
abdD
a124
a1a4
a%
Likewise, for uppercase letters, [A-Z] and [[:upper:]] are the same.
[root@zhaocheng ~]# grep "[A-Z]" test4
aBDc
abdD
[root@zhaocheng ~]# grep "[[:upper:]]" test4
aBDc
abdD
Filter a followed by 2 letters in two ways, with a bracket range and with a character class.
[[:lower:]] lowercase
[[:upper:]] uppercase
[root@zhaocheng ~]# grep "a[a-z]\{2\}" test4
afghj9gh
abcd
apooo
aiuhh
abdD
[root@zhaocheng ~]# grep "a[[:lower:]]\{2\}" test4
afghj9gh
abcd
apooo
aiuhh
abdD
[root@zhaocheng ~]# grep "a[A-Z]\{2\}" test4
aBDc
[root@zhaocheng ~]# grep "a[[:upper:]]\{2\}" test4
aBDc
Another class is [[:alpha:]], which has the same meaning as [a-zA-Z]: any letter.
[root@zhaocheng ~]# grep "[[:alpha:]]" test4
a
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a%
[root@zhaocheng ~]# grep "[a-zA-Z]" test4
a
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a%
Similarly, [0-9] and [[:digit:]] are equivalent, and both represent any single digit between 0 and 9.
[root@zhaocheng ~]# grep "[[:digit:]]" test4
a9o
afghj9gh
a124
a1a4
[root@zhaocheng ~]# grep "[0-9]" test4
a9o
afghj9gh
a124
a1a4
Instead of a range such as [a-z], the brackets can also list individual characters; here b must be followed by a or d.
[root@zhaocheng ~]# grep "b[ad]" test5
ba
bd
The brackets can also list special characters. [ ] matches any single character in the specified set.
[root@zhaocheng ~]# grep "b[cP@*&]" test5
bc
bP
b&
b*
b@
To filter for characters other than these, negate the set with ^.
[root@zhaocheng ~]# grep "b[^cP@*&]" test5
ba
bd
bf
bg
Use [^A-Z] or [^a-z] to exclude every character in a range.
[root@zhaocheng ~]# grep "b[^A-Z]" test5
ba
bc
bd
bf
bg
b&
b*
b@
[root@zhaocheng ~]# grep "b[^a-z]" test5
bP
b&
b*
b@
The same principle
[^a-z] matches any single character that is not a lowercase letter
[^A-Z] matches any single character that is not an uppercase letter
[^a-zA-Z] matches any single character that is not a letter, such as a digit or a symbol
[^a-zA-Z0-9] matches any single character that is neither a letter nor a digit, such as a symbol
We saw earlier that [a-z] and [[:lower:]] are equivalent, so they are also equivalent when negated with ^.
[root@zhaocheng ~]# grep "b[^[:lower:]]" test5
bP
b&
b*
b@
That is, [^0-9] and [^[:digit:]] are equivalent.
[^a-z] and [^[:lower:]] are equivalent.
[^A-Z] and [^[:upper:]] are equivalent.
[^a-zA-Z] and [^[:alpha:]] are equivalent.
[^a-zA-Z0-9] and [^[:alnum:]] are equivalent.
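Bracket sets and their negations can be checked against each other on a small input. This is a sketch on a hypothetical file, assuming a POSIX shell with grep and mktemp. Inside brackets, * and & are ordinary characters.

```shell
# Hypothetical sample file modelled on the b-prefixed examples above.
f=$(mktemp)
printf 'ba\nbc\nbP\nb&\nb*\nb3\n' > "$f"

in_set=$(grep 'b[cP@*&]' "$f")           # b followed by one of c P @ * &
not_in_set=$(grep 'b[^cP@*&]' "$f")      # b followed by anything else
not_lower=$(grep 'b[^[:lower:]]' "$f")   # b followed by a non-lowercase character
not_alnum=$(grep 'b[^[:alnum:]]' "$f")   # b followed by a symbol

printf '%s\n' "$in_set" "$not_in_set" "$not_lower" "$not_alnum"
rm -f "$f"
```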
In addition to [0-9] and [[:digit:]], you can also use \d to represent a digit. With grep this requires the -P option (Perl-compatible regular expressions).
[root@zhaocheng ~]# grep -P "b\d" test5
b3
b4
b5
\D matches any single non-digit character.
[root@zhaocheng ~]# grep -P "b\D" test5
ba
bc
bd
bf
bg
bP
b&
b*
b@
\d represents any single digit from 0 to 9
\D represents any single non-digit character
\t matches a single horizontal tab (equivalent to the tab key)
\s matches a single white space character, including spaces, tabs, etc.
\S matches a single non-white space character
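Because not every grep is built with -P support, a script may want to fall back to the POSIX character classes. The following sketch probes for -P first; the probe, the sample file and the variable names are illustrative assumptions, not something taken from the article.

```shell
# Hypothetical sample file; the third line contains a space after b.
f=$(mktemp)
printf 'ba\nb3\nb 4\nb&\n' > "$f"

# Probe: does this grep accept -P and understand \d?
if printf '1\n' | grep -P '\d' >/dev/null 2>&1; then
    digits=$(grep -P 'b\d' "$f")
    nondigits=$(grep -P 'b\D' "$f")
    spaces=$(grep -P 'b\s' "$f")
else
    # Portable equivalents using POSIX character classes.
    digits=$(grep 'b[[:digit:]]' "$f")
    nondigits=$(grep 'b[^[:digit:]]' "$f")
    spaces=$(grep 'b[[:space:]]' "$f")
fi

printf '%s\n' "$digits" "$nondigits" "$spaces"
rm -f "$f"
```

Both branches are meant to select the same lines, which is the equivalence between the shorthand forms and the bracket classes described above.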
Regular expressions (4) Escape characters
The commonly used symbol \ is the escape character.
We used . earlier. In a regular expression it matches any single character, so if the text itself contains a literal dot, a bare . will match other characters as well. To match the dot itself, put the escape character \ in front of it.
[root@zhaocheng ~]# grep "a.." test4
a9o
afghj9gh
abcd
apooo
aiuhh
aBDc
abdD
a124
a1a4
a..
[root@zhaocheng ~]# grep "a\.\." test4
a..
If you want to match the backslash itself, put the pattern in single quotation marks and escape the backslash: '\\' matches a single \.
[root@zhaocheng ~]# grep 'a\\' test4
a\\
[root@zhaocheng ~]# grep 'a\' test4
a\\
[root@zhaocheng ~]# grep 'a\' test4
a\
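Escaping can be verified on a file that contains literal dots and backslashes. This is a minimal sketch with a hypothetical sample file; printf '%s\n' is used so that the backslashes are written and printed literally, and single quotes keep the shell from touching them.

```shell
# Hypothetical sample file: abc, a.., a\ and a\\ (one and two backslashes).
f=$(mktemp)
printf '%s\n' 'abc' 'a..' 'a\' 'a\\' > "$f"

any_two=$(grep 'a..' "$f")        # a followed by any two characters
two_dots=$(grep 'a\.\.' "$f")     # a followed by two literal dots
one_slash=$(grep 'a\\' "$f")      # a followed by at least one backslash
two_slashes=$(grep 'a\\\\' "$f")  # a followed by two backslashes

printf '%s\n' "$any_two" "$two_dots" "$one_slash" "$two_slashes"
rm -f "$f"
```

Each literal backslash in the text needs two backslashes in the pattern, so matching two of them takes four.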
Summary of regular expressions
# Common symbols #
. represents any single character.
* indicates that the preceding character appears any number of times in a row, including 0.
.* represents any characters of any length, with the same meaning as * in wildcards.
\ is the escape character; combined with a symbol that is special in regular expressions, it represents the symbol itself.
[ ] matches any single character within the specified range.
[^ ] matches any single character outside the specified range.
# single character matching #
[[:alpha:]] represents any uppercase or lowercase letter.
[[:lower:]] represents any lowercase letter.
[[:upper:]] represents any uppercase letter.
[[:digit:]] represents any single digit between 0 and 9 (inclusive).
[[:alnum:]] represents any digit or letter.
[[:space:]] represents any white space character, including spaces, tabs, and so on.
[[:punct:]] represents any punctuation mark.
[^[:alpha:]] represents a single non-alphabetic character.
[^[:lower:]] represents a single character that is not a lowercase letter.
[^[:upper:]] represents a single character that is not an uppercase letter.
[^[:digit:]] represents a single non-digit character.
[^[:alnum:]] represents a single character that is neither a digit nor a letter.
[^[:space:]] represents a single non-white space character.
[^[:punct:]] represents a single non-punctuation character.
[0-9] is equivalent to [[:digit:]].
[a-z] is equivalent to [[:lower:]].
[A-Z] is equivalent to [[:upper:]].
[a-zA-Z] is equivalent to [[:alpha:]].
[a-zA-Z0-9] is equivalent to [[:alnum:]].
[^0-9] is equivalent to [^[:digit:]].
[^a-z] is equivalent to [^[:lower:]].
[^A-Z] is equivalent to [^[:upper:]].
[^a-zA-Z] is equivalent to [^[:alpha:]].
[^a-zA-Z0-9] is equivalent to [^[:alnum:]].
The shorthand forms below are not recognized by all regular expression parsers.
\d represents any single digit from 0 to 9.
\D represents any single non-digit character.
\t matches a single horizontal tab (equivalent to the tab key).
\s matches a single white space character, including spaces, tabs, and so on.
\S matches a single non-white space character.
# repetition matching #
\? means that the character before it is matched 0 or 1 times.
\+ means that the character before it is matched at least once, with no upper limit on the number of consecutive occurrences.
\{n\} means that the preceding character appears exactly n times in a row.
\{x,y\} means that the preceding character appears at least x times and at most y times in a row.
\{,n\} means that the preceding character appears at most n times in a row, and at least 0 times.
\{n,\} means that the preceding character appears at least n times in a row.
# position and boundary matching #
^: anchors the beginning of the line; whatever follows it must appear at the beginning of the line to match.
$: anchors the end of the line; whatever precedes it must appear at the end of the line to match.
^$: matches a blank line. A blank line here means a line that contains only a newline ("enter"); a line of spaces or tabs does not count.
^abc$: matches a line that consists only of abc.
\< or \b: matches a word boundary, anchoring the beginning of a word; the characters after it must appear at the start of a word.
\> or \b: matches a word boundary, anchoring the end of a word; the characters before it must appear at the end of a word.
\B: matches a non-word boundary, the opposite of \b.
# grouping and back-references #
\( \) represents a group; its contents are treated as a whole, and groups can be nested.
\(ab\) means that ab is treated as a whole.
\1 refers to the text matched by the first group in the expression.
\2 refers to the text matched by the second group in the expression.
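Grouping and back-references are listed here without a worked case, so the following is a small sketch in basic regular expression syntax, where groups are written \( \). The sample file and variable names are hypothetical.

```shell
# Hypothetical sample file for grouping and back-references.
f=$(mktemp)
printf 'abab\nabcd\nhello hello\nhello world\n' > "$f"

group_twice=$(grep '\(ab\)\{2\}' "$f")     # the group ab repeated twice
same_word=$(grep '\(hello\) \1' "$f")      # \1 repeats whatever group 1 matched
double_letter=$(grep '\([a-z]\)\1' "$f")   # any lowercase letter followed by itself

printf '%s\n' "$group_twice" "$same_word" "$double_letter"
rm -f "$f"
```

A back-reference repeats the text that the group actually matched, not the pattern, which is why \([a-z]\)\1 finds ll but not ab.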
[root@zhaocheng ~]# cat shengri.txt
small card 19901119
Xiao Hong 19920105
Xiao Li 19930211
Xiao she 19940325
Xiao Hei 19950418
Match the line whose date starts with 1993:
[root@zhaocheng ~]# grep "\b1993[0-9]\{4\}\b" shengri.txt
Xiao Li 19930211
\< and \> anchor the beginning and the end of a word in the same way as \b.
[root@zhaocheng ~]# grep "\" shengri.txt
Xiao Hong 19920105
[root@zhaocheng ~]# grep "\" shengri.txt
Xiao Hong 19920105
Regular expressions (6) Extended regular expressions
Whether in basic regular expressions or extended regular expressions, some common symbols have the same meaning.
. represents any single character.
* indicates that the preceding character appears any number of times in a row, including 0.
.* represents any characters of any length, with the same meaning as * in wildcards.
\ is the escape character; combined with a symbol that is special in regular expressions, it represents the symbol itself.
[ ] matches any single character within the specified range.
[^ ] matches any single character outside the specified range.
[[:alpha:]] represents any uppercase or lowercase letter.
[[:lower:]] represents any lowercase letter.
[[:upper:]] represents any uppercase letter.
[[:digit:]] represents any single digit between 0 and 9 (inclusive).
[[:alnum:]] represents any digit or letter.
[[:space:]] represents any white space character, including spaces, tabs, and so on.
[[:punct:]] represents any punctuation mark.
[^[:alpha:]] represents a single non-alphabetic character.
[^[:lower:]] represents a single character that is not a lowercase letter.
[^[:upper:]] represents a single character that is not an uppercase letter.
[^[:digit:]] represents a single non-digit character.
[^[:alnum:]] represents a single character that is neither a digit nor a letter.
[^[:space:]] represents a single non-white space character.
[^[:punct:]] represents a single non-punctuation character.
[0-9] is equivalent to [[:digit:]].
[a-z] is equivalent to [[:lower:]].
[A-Z] is equivalent to [[:upper:]].
[a-zA-Z] is equivalent to [[:alpha:]].
[a-zA-Z0-9] is equivalent to [[:alnum:]].
[^0-9] is equivalent to [^[:digit:]].
[^a-z] is equivalent to [^[:lower:]].
[^A-Z] is equivalent to [^[:upper:]].
[^a-zA-Z] is equivalent to [^[:alpha:]].
[^a-zA-Z0-9] is equivalent to [^[:alnum:]].
^: anchors the beginning of the line; whatever follows it must appear at the beginning of the line to match.
$: anchors the end of the line; whatever precedes it must appear at the end of the line to match.
^$: matches a blank line. A blank line here means a line that contains only a newline ("enter"); a line of spaces or tabs does not count.
^abc$: matches a line that consists only of abc.
\< or \b: matches a word boundary, anchoring the beginning of a word; the characters after it must appear at the start of a word.
\> or \b: matches a word boundary, anchoring the end of a word; the characters before it must appear at the end of a word.
\B: matches a non-word boundary, the opposite of \b.
By default, the grep command supports only basic regular expressions. To make grep use extended regular expressions, add the -E option (or use egrep). About 70% of the symbols are common to both. See the effect:
[root@zhaocheng ~]# grep "b[a-z]" test5
ba
bc
bd
[root@zhaocheng ~]# egrep "b[a-z]" test5
ba
bc
bd
[root@zhaocheng ~]# grep -E "b[a-z]" test5
ba
bc
bd
The other 30% differs slightly from basic regular expressions, but is easier to read.
In basic regular expressions, \{n\} means that the preceding character appears n times in a row.
In extended regular expressions, {n} means that the preceding character appears n times in a row.
In basic regular expressions, \( \) indicates grouping, and \(ab\) indicates that ab is treated as a whole.
In extended regular expressions, ( ) indicates grouping, and (ab) indicates that ab is treated as a whole.
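The two syntaxes can be compared directly: the same match is written with backslashes in basic mode and without them under -E. A sketch on a hypothetical file; the last command shows that in basic mode, as GNU grep behaves, an unescaped { is an ordinary character.

```shell
# Hypothetical sample file; the last line contains literal braces.
f=$(mktemp)
printf '%s\n' 'aa' 'a' 'abab' 'a{2}' > "$f"

bre_count=$(grep 'a\{2\}' "$f")        # basic: interval needs backslashes
ere_count=$(grep -E 'a{2}' "$f")       # extended: no backslashes
bre_literal=$(grep 'a{2}' "$f")        # basic: { and } are literal characters
bre_group=$(grep '\(ab\)\{2\}' "$f")   # basic: group repeated twice
ere_group=$(grep -E '(ab){2}' "$f")    # extended: same match

printf '%s\n' "$bre_count" "$ere_count" "$bre_literal" "$bre_group" "$ere_group"
rm -f "$f"
```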
In extended regular expressions:
( ) indicates grouping
(ab) means that ab is treated as a whole
\1 refers to the text matched by the first group in the expression
\2 refers to the text matched by the second group in the expression
? means that the character before it is matched 0 or 1 times
+ means that the character before it is matched at least once, with no upper limit on the number of consecutive occurrences
{n} means that the preceding character appears exactly n times in a row
{x,y} means that the preceding character appears at least x times and at most y times in a row
{,n} means that the preceding character appears at most n times in a row, and at least 0 times
{n,} means that the preceding character appears at least n times in a row
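The extended quantifiers listed above can be tried with grep -E. This is a minimal sketch; the word list and variable names are made up for illustration.

```shell
# Hypothetical sample file for the extended quantifiers.
f=$(mktemp)
printf 'color\ncolour\ncolouur\ncolr\n' > "$f"

optional=$(grep -E 'colou?r' "$f")        # u appears 0 or 1 times
at_least_one=$(grep -E 'colou+r' "$f")    # u appears 1 or more times
two_or_more=$(grep -E 'colou{2,}r' "$f")  # u appears at least twice
one_or_two=$(grep -E 'colou{1,2}r' "$f")  # u appears once or twice

printf '%s\n' "$optional" "$at_least_one" "$two_or_more" "$one_or_two"
rm -f "$f"
```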
Extended regular expressions also have a commonly used symbol that basic regular expressions lack: |.
It means "or".
[root@zhaocheng ~]# cat test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
dubbo.edu
spring cloud.net
helm.org
Find the lines that end in net; a pattern of the form xxx$ matches lines that end with xxx.
[root@zhaocheng ~]# grep "net$" test6
spring cloud.net
[root@zhaocheng ~]# egrep "net$" test6
spring cloud.net
[root@zhaocheng ~]# grep -E "net$" test6
spring cloud.net
For example, you can use | to find the lines that end in com or net, with either egrep or grep -E. The parentheses ( ) make the alternatives inside them count as a whole.
[root@zhaocheng ~]# egrep "(com|net)$" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net
[root@zhaocheng ~]# grep -E "(com|net)$" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net
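It can also be written as follows, but this is less exact than (com|net)$: without $, the match is no longer anchored to the end of the line, so com or net anywhere in the line will match.
[root@zhaocheng ~]# grep -E "com|net" test6
kubernetes.com
jenkins.com
rabbitmq.com
zookpeer.com
spring boot.com
spring cloud.net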
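On the article's test6 file both forms happen to print the same lines, so the difference only shows up with other data. The sketch below uses a hypothetical file that includes lines containing net or com in the middle; the file contents and variable names are assumptions for illustration.

```shell
# Hypothetical sample file; network.org and compose.org contain net/com mid-line.
f=$(mktemp)
printf 'kubernetes.com\ndubbo.edu\nspring cloud.net\nnetwork.org\ncompose.org\n' > "$f"

anchored=$(grep -E '(com|net)$' "$f")   # com or net at the end of the line
unanchored=$(grep -E 'com|net' "$f")    # com or net anywhere in the line
half=$(grep -E 'com|net$' "$f")         # com anywhere, or net at the end

printf '%s\n' "$anchored" "$unanchored" "$half"
rm -f "$f"
```

Alternation binds more loosely than the anchor, so without parentheses the $ applies only to net.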
Summary of common extended expressions
Common symbols
. represents any single character.
* indicates that the preceding character appears any number of times in a row, including 0.
.* represents any characters of any length, with the same meaning as * in wildcards.
\ is the escape character; combined with a symbol that is special in regular expressions, it represents the symbol itself.
| means "or".
[ ] matches any single character within the specified range.
[^ ] matches any single character outside the specified range.
Single character matching
[[:alpha:]] represents any uppercase or lowercase letter.
[[:lower:]] represents any lowercase letter.
[[:upper:]] represents any uppercase letter.
[[:digit:]] represents any single digit between 0 and 9 (inclusive).
[[:alnum:]] represents any digit or letter.
[[:space:]] represents any white space character, including spaces, tabs, and so on.
[[:punct:]] represents any punctuation mark.
[^[:alpha:]] represents a single non-alphabetic character.
[^[:lower:]] represents a single character that is not a lowercase letter.
[^[:upper:]] represents a single character that is not an uppercase letter.
[^[:digit:]] represents a single non-digit character.
[^[:alnum:]] represents a single character that is neither a digit nor a letter.
[^[:space:]] represents a single non-white space character.
[^[:punct:]] represents a single non-punctuation character.
[0-9] is equivalent to [[:digit:]].
[a-z] is equivalent to [[:lower:]].
[A-Z] is equivalent to [[:upper:]].
[a-zA-Z] is equivalent to [[:alpha:]].
[a-zA-Z0-9] is equivalent to [[:alnum:]].
[^0-9] is equivalent to [^[:digit:]].
[^a-z] is equivalent to [^[:lower:]].
[^A-Z] is equivalent to [^[:upper:]].
[^a-zA-Z] is equivalent to [^[:alpha:]].
[^a-zA-Z0-9] is equivalent to [^[:alnum:]].
Repetition matching
? means that the character before it is matched 0 or 1 times.
+ means that the character before it is matched at least once, with no upper limit on the number of consecutive occurrences.
{n} means that the preceding character appears exactly n times in a row.
{x,y} means that the preceding character appears at least x times and at most y times in a row.
{,n} means that the preceding character appears at most n times in a row, and at least 0 times.
{n,} means that the preceding character appears at least n times in a row.
Position and boundary matching
^: anchors the beginning of the line; whatever follows it must appear at the beginning of the line to match.
$: anchors the end of the line; whatever precedes it must appear at the end of the line to match.
^$: matches a blank line. A blank line here means a line that contains only a newline ("enter"); a line of spaces or tabs does not count.
^abc$: matches a line that consists only of abc.
\< or \b: matches a word boundary, anchoring the beginning of a word; the characters after it must appear at the start of a word.
\> or \b: matches a word boundary, anchoring the end of a word; the characters before it must appear at the end of a word.
\B: matches a non-word boundary, the opposite of \b.
Grouping and back-references
( ) represents a group; its contents are treated as a whole, and groups can be nested.
(ab) means that ab is treated as a whole.
\1 refers to the text matched by the first group in the expression.
\2 refers to the text matched by the second group in the expression.