
The grep Family and the Application of Regular Expressions

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Last week we covered some very important material, and our teacher repeatedly told us to write a blog post about it first. So let's talk about the grep family and how regular expression metacharacters are used in text-processing tools. First, what is the grep family?

The grep family: grep, egrep, fgrep

The most important one is grep.

grep: Global search Regular Expression and Print out the line

That is, it searches the input globally with a regular expression and prints the lines that match.

To understand this sentence, it is best to look first at the regular expression metacharacters. In what follows, PATTERN stands for an expression built from them.

Character matching class:

.: matches any single character

Example: .abcd matches any single character followed by abcd, such as 1abcd.

[]: matches any single character in the specified range

For example, combined with a character class, [[:lower:]] matches any single lowercase letter.

[^]: matches any single character outside the specified range

This is the opposite of the previous one: for example, [^[:lower:]] matches any single character that is not a lowercase letter.

All of the following character classes can be placed inside square brackets to match a single character:

[:lower:]: all lowercase letters

[:upper:]: all uppercase letters

[:alpha:]: all letters

[:digit:]: all decimal digits

[:space:]: all whitespace characters

[:alnum:]: all letters and digits

[:punct:]: all punctuation characters

[:blank:]: space and tab

[:xdigit:]: all hexadecimal digits

a-z: all lowercase letters

A-Z: all uppercase letters

0-9: all decimal digits

Note: a-z here means the lowercase letters only, which is no longer the meaning mentioned earlier.
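The character-matching metacharacters above can be tried on a few sample strings. This is a minimal sketch, assuming GNU grep; the sample data and the variable names are made up for illustration.

```shell
# Hypothetical sample input: one candidate string per line.
sample='1abcd
xabcd
abcd
Abc9
7'

# . matches any single character, so ".abcd" needs one character before abcd.
dot_matches=$(printf '%s\n' "$sample" | grep '.abcd')

# [[:digit:]] matches any single digit (the class sits inside a bracket expression).
digit_matches=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# [^[:lower:]] matches any single character that is NOT a lowercase letter.
nonlower_matches=$(printf '%s\n' "$sample" | grep '[^[:lower:]]')

printf 'dot:\n%s\n\ndigit:\n%s\n\nnon-lower:\n%s\n' \
  "$dot_matches" "$digit_matches" "$nonlower_matches"
```

Note that the plain line abcd is not matched by .abcd, because the dot must consume one character of its own.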

Repetition matching: specifies how many times the character immediately before the operator may appear

*: the character in front of it can appear any number of times (0, once, multiple times)

For example, a*bc means that a can appear any number of times before bc, it can be 0 times, it can be 1 time, or it can be multiple times

\?: the character in front of it is optional (0 or 1 times)

Example: a\?bc means that a appears 0 or 1 times before bc. The backslash is needed because in a basic regular expression a bare ? is an ordinary character; writing \? is what gives it the repetition meaning. The same applies to the backslashes in the operators below.

\+: the character before it appears at least once (1 or more times)

Example: a\+bc means that a appears at least once before bc.

\{m\}: the character before it must appear exactly m times

Example: a\{m\}bc means that a must appear exactly m times before bc.

\{m,n\}: the character before it appears at least m times and at most n times

Example: a\{m,n\}bc means that a appears at least m times and at most n times before bc.

\{,n\}: the character before it appears at most n times

Example: a\{,n\}bc means that a appears at most n times before bc.

\{m,\}: the character before it appears at least m times, with no upper limit

Example: a\{m,\}bc means that a appears at least m times before bc.

In a regular expression, any character of any length is written as: .*

Example: .*abc matches abc preceded by any characters of any length (including none).
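The repetition operators are easier to compare side by side. The following is a sketch assuming GNU grep (which accepts \?, \+ and \{,n\} in basic regular expressions); the sample lines and variable names are invented. The ^ and $ anchors, described in the next section, pin each pattern to the whole line so that extra leading a characters cannot be skipped, and -c prints the number of matching lines.

```shell
# Hypothetical sample input.
sample='bc
abc
aabc
aaabc
xyz'

count() { printf '%s\n' "$sample" | grep -c "$1"; }

star=$(count 'a*bc')             # 4: bc, abc, aabc, aaabc (unanchored)
optional=$(count '^a\?bc$')      # 2: bc, abc
plus=$(count '^a\+bc$')          # 3: abc, aabc, aaabc
exact=$(count '^a\{2\}bc$')      # 1: aabc
range=$(count '^a\{1,2\}bc$')    # 2: abc, aabc
at_most=$(count '^a\{,2\}bc$')   # 3: bc, abc, aabc
at_least=$(count '^a\{2,\}bc$')  # 2: aabc, aaabc
any_prefix=$(count '^.*abc$')    # 3: abc, aabc, aaabc

echo "$star $optional $plus $exact $range $at_most $at_least $any_prefix"
```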

Position anchors:

Line anchoring:

Start-of-line anchor: ^

End-of-line anchor: $

Word anchoring:

Start-of-word anchor: \< or \b

End-of-word anchor: \> or \b

\b: matches at either edge of a word; this is the older style of anchoring and is not recommended

Line anchoring and word anchoring are explained in more detail in later examples.

In a regular expression, a word is a continuous string of non-special characters (for GNU grep: letters, digits, and underscores).
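To see how the anchors differ, the sketch below counts matches for the word root in several positions. It assumes GNU grep, where \< and \> are available; the sample text and variable names are hypothetical.

```shell
# Hypothetical sample input.
sample='root:x:0:0
chroot jail
the root user
rooted tree
root'

count() { printf '%s\n' "$sample" | grep -c "$1"; }

line_start=$(count '^root')     # 3: lines that begin with root
line_end=$(count 'root$')       # 1: only the line that ends with root
word_start=$(count '\<root')    # 4: root starts a word (chroot is excluded)
word_end=$(count 'root\>')      # 4: root ends a word (rooted is excluded)
whole_word=$(count '\<root\>')  # 3: root stands alone as a word

echo "$line_start $line_end $word_start $word_end $whole_word"
```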

Grouping and back-references:

\(PATTERN\): treats all characters matched by PATTERN as an indivisible whole

Grouping is easy to understand: the string of characters inside the parentheses is matched as a single unit.

The regular expression engine has a series of built-in variables that hold the text matched by each group so that it can be referenced later. In order, they are \1, \2, \3, ...

PATTERN1\(PATTERN2\)PATTERN3\(PATTERN4\(PATTERN5\)\)

\1: PATTERN2

\2: PATTERN4

\3: PATTERN5

\1: the characters matched by the PATTERN in the first set of parentheses, counting opening parentheses from left to right

\2: the characters matched by the PATTERN in the second set of parentheses

\3: the characters matched by the PATTERN in the third set of parentheses

The simplest explanation of references: if the text matched by an earlier group has to appear again later in the expression, you can stand in for it with a single symbol instead of typing it again. The first group is represented by \1, the second by \2, and so on. A back-reference matches the exact text that the group matched, not just anything that fits the group's pattern.

Example: find the user accounts in /etc/passwd whose UID and GID are the same

First, analyze the problem. A user's UID and GID are strings of digits, so they can be matched with a digit character class. Each usually consists of several digits, so \+ is needed. Finally, the GID must be the same as the UID, so we only need to write the UID as a group and then reference it directly where the GID should be. The command therefore takes the following form:

grep '\(\<[[:digit:]]\+\>\).*\1' /etc/passwd
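A pattern built only from a digit group, .* and \1 matches any number that is repeated later on the line, which can produce false positives (for example UID 10 with GID 100). The sketch below, which assumes GNU grep and uses invented passwd-style sample lines instead of the real /etc/passwd, compares that loose form with a variant anchored to the third and fourth colon-separated fields.

```shell
# Hypothetical passwd-style data; only root and alice have UID equal to GID.
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:100:Bob:/home/bob:/bin/sh
carol:x:10:100:Carol:/home/carol:/bin/sh'

# Loose form: any whole number that reappears anywhere later on the line.
loose_count=$(printf '%s\n' "$passwd_sample" |
  grep -c '\(\<[[:digit:]]\+\>\).*\1')

# Stricter form: skip the name and password fields, capture the UID,
# then require the very next field (the GID) to be the same text.
same_ids=$(printf '%s\n' "$passwd_sample" |
  grep '^[^:]*:[^:]*:\([[:digit:]]\+\):\1:')

echo "loose matches: $loose_count"
printf '%s\n' "$same_ids"
```

On this sample the loose form also reports carol, because \1 (10) matches the first two digits of 100, while the field-anchored form reports only root and alice.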

Or:

\|

Note: the or operator treats the whole string on each side of it as a unit

A\|american: A or american

What does treating each side as a unit mean? In the example above, A\|american stands for A or american; it does not mean "A or a, followed by merican". To get the latter, put the alternatives in a group: \(A\|a\)merican.
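The scope of the or operator shows up clearly when only the matched text is printed with -o. This is a small sketch assuming GNU grep (where \| is available in basic regular expressions); the input line and variable names are made up.

```shell
# Hypothetical input line.
line='A american American'

# Without grouping, the alternatives are the whole strings "A" and "american".
ungrouped=$(printf '%s\n' "$line" | grep -o 'A\|american')

# With grouping, only the first letter varies: "American" or "american".
grouped=$(printf '%s\n' "$line" | grep -o '\(A\|a\)merican')

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

In the ungrouped case the word American contributes only its leading A, since the lowercase alternative american does not match it.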

Let's look at another example:

Find the integers with a value between 100 and 255 in the output of ifconfig.

First digit: 1, 2, or 2

Second digit: 0-9, 0-4, or 5

Third digit: 0-9, 0-9, or 0-5

Read column by column, this gives three cases: 1[0-9][0-9] (100-199), 2[0-4][0-9] (200-249), and 25[0-5] (250-255).

The difficulty of this problem lies not in writing the PATTERN but in analyzing the characteristics of the integers. Once the analysis is clear, the problem is easy to solve.

ifconfig | grep '\<\(1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\>'
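The range analysis can be checked without a network interface by feeding grep some ifconfig-like text. The sketch below assumes GNU grep; the sample text is invented, and -o is used so that each matching integer is printed on its own line instead of the whole line.

```shell
# Hypothetical text resembling ifconfig output.
sample='inet 192.168.1.10 netmask 255.255.255.0
mtu 1500 metric 99 txqueuelen 1000 id 256 n 100'

# Three alternatives: 100-199, 200-249, 250-255. The word anchors \< and \>
# stop the pattern from matching part of a longer number such as 1500.
in_range=$(printf '%s\n' "$sample" |
  grep -o '\<\(1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\>')

printf '%s\n' "$in_range"
```

The values 1, 10, 0, 99, 256, 1000 and 1500 are all rejected, either by the digit ranges or by the word anchors.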

Having covered the regular expression metacharacters, let's move on to the specific usage of the grep family.

grep [OPTIONS] PATTERN [FILE...]

PATTERN: the filter condition, which consists of regular expression metacharacters and text characters with no special meaning

Regular expression metacharacters:

Characters that the regular expression engine interprets as having a special meaning

PCRE: the regular expression engine of the Perl language

Basic regular expressions: BRE

Extended regular expressions: ERE

grep supports only basic regular expressions by default

egrep supports extended regular expressions by default

fgrep does not use the regular expression engine by default

Text characters:

Characters that carry only their literal meaning

Common options:

-i, --ignore-case: ignores the case of text characters

-v, --invert-match: inverted match; the result is the lines that PATTERN does not match

-c, --count: counts the lines that match PATTERN and prints the number

-o, --only-matching: displays only the part of each line that PATTERN matches, one match per output line

-q, --quiet, --silent: quiet mode; prints no matching results, so that only the exit status is used, for logical tests

--color[=WHEN], --colour[=WHEN]: highlights the characters that match PATTERN in a special color

--color=auto

-E, --extended-regexp: extended regular expressions; grep -E is equivalent to egrep

-F, --fixed-strings: treats PATTERN as fixed strings rather than regular expressions; grep -F is equivalent to fgrep

-G, --basic-regexp: basic regular expressions; egrep -G is equivalent to grep

-P, --perl-regexp: uses the PCRE engine

-A NUM, --after-context=NUM: also displays the NUM lines after each line that matches PATTERN

-B NUM, --before-context=NUM: also displays the NUM lines before each line that matches PATTERN

-C NUM, -NUM, --context=NUM: also displays the NUM lines before and after each line that matches PATTERN
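Several of these options can be exercised on a five-line sample. This is a sketch assuming GNU grep; the sample lines and variable names are hypothetical.

```shell
# Hypothetical sample input.
sample='Alpha
beta
ALPHA beta
gamma
delta'

run() { printf '%s\n' "$sample" | grep "$@"; }

ignore_case_count=$(run -i -c 'alpha')   # 2: Alpha and ALPHA beta
inverted=$(run -v 'beta')                # lines without beta
beta_count=$(run -c 'beta')              # 2
only_matching=$(run -o '[[:upper:]]\+')  # just the uppercase runs
after=$(run -A 1 'gamma')                # gamma plus the next line
before=$(run -B 1 'gamma')               # the previous line plus gamma

# -q prints nothing; only the exit status says whether a match exists.
if run -q 'gamma'; then found=yes; else found=no; fi

echo "$ignore_case_count $beta_count $found"
```

The -q form is the one typically used inside if or && chains in scripts, since it avoids cluttering the output.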

In fact, learning grep is like learning any other command: memorize the usage, then practice to consolidate it. After that, egrep and fgrep are easy to understand. The difference between egrep and grep is that with egrep you do not have to backslash-escape metacharacters such as ?, +, {}, (), and |.

egrep:

egrep [OPTIONS] PATTERN [FILE...]

Extended regular expression metacharacters:

Character matching:

.

[]

[^]

Repetition matching:

*

?

+

{m}

{m,n}

{m,}

{,n}

Position anchors:

^

$

\<, \>, \b

Grouping and back-references:

()

\1, \2, \3, ...

Or:

|
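The same pattern can be written in both syntaxes to confirm that only the escaping differs. The sketch below assumes GNU grep and uses grep -E rather than the egrep command, since recent GNU grep releases print an obsolescence warning for egrep; the sample words and variable names are invented.

```shell
# Hypothetical sample input.
sample='bc
abc
aabc
color
colour'

# Basic syntax: \( \), \{ \}, \? and \| all need backslashes.
bre=$(printf '%s\n' "$sample" | grep '^\(a\{1,2\}bc\|colou\?r\)$')

# Extended syntax: the same operators are written without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '^(a{1,2}bc|colou?r)$')

printf 'BRE:\n%s\n\nERE:\n%s\n' "$bre" "$ere"
```

Both commands select abc, aabc, color and colour, and reject bc.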

fgrep: all characters in PATTERN are treated as text characters

With fgrep, PATTERN is interpreted as plain text rather than as a regular expression, which can speed up the search.
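The difference is visible whenever the search string contains a metacharacter. This is a minimal sketch assuming GNU grep, using grep -F as the equivalent of fgrep; the sample lines and variable names are hypothetical.

```shell
# Hypothetical sample input.
sample='a.c
abc
1+1=2
11=2'

# As a regular expression, the dot matches any character: a.c and abc.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'a.c')

# As a fixed string, the dot is literal: only a.c.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -c 'a.c')

# Fixed strings need no escaping even for characters like +.
plus_line=$(printf '%s\n' "$sample" | grep -F '1+1')

echo "$regex_hits $fixed_hits $plus_line"
```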
