Regular expressions and the three musketeers of text processing (grep, sed, awk): commands explained in detail


2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Blog outline:

I. Regular expressions

(1) Definition of regular expressions

(2) Uses of regular expressions

1. Basic regular expressions

(1) grep command tool

2. Extended regular expressions

II. Text editing processors

1. grep command tool

2. sed command tool

3. awk command tool

I. Regular expressions

(1) Definition of regular expressions

A regular expression (often abbreviated to regex, regexp, or RE in code) is a single string that describes and matches a set of strings conforming to a syntactic rule. Put simply, it is a way of matching strings: by using some special symbols, you can quickly find, delete, or replace particular strings.

A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, acting as a template that is compared against the searched string. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and other symbols. Metacharacters are characters with a special meaning in regular expressions; they can specify how the preceding character (the character before the metacharacter) may occur in the target text.

Regular expressions are commonly used in scripts and text editors, and many text processors and programming languages support them. Examples are the text-processing tools often used on Linux systems: grep, egrep, sed, and awk. Regular expressions provide very powerful text matching and make it possible to process large volumes of text quickly and efficiently.

(2) the use of regular expressions

Regular expressions are very important for system administrators. A running system generates a large amount of information; some of it is very important and some is only warnings. An administrator who reads all of this data directly cannot quickly locate the important items, such as "user account login failed" or "service startup failure". Regular expressions make it possible to extract the problematic information quickly, which makes operation and maintenance work simpler and more convenient.

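The system locale (language setting) has a strong influence on regular expressions, because it determines how characters are ordered and therefore what a range such as [a-z] covers.

For example, the character ordering under zh_TW.big5 and under C is as follows:

At LANG=C: 0 1 2 3 4 ... A B C D ... Z a b c d ... z

At LANG=zh_TW: 0 1 2 3 4 ... a A b B c C d D ... z Z

To avoid matching the wrong letters and digits because of this ordering, there are some special symbols we need to know:

[Figure: special symbols for matching letters and digits independently of the locale]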
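The figure itself is not available here, so the following is only a sketch under an assumption: that the special symbols in question are the POSIX character classes such as [[:upper:]] and [[:digit:]]. The sample words are made up. It compares a range evaluated in the C locale with the class-based form.

```shell
# Made-up sample input: one lowercase line, one uppercase line, one digit line.
sample=$(printf 'apple\nBANANA\n12345\n')

# In the C locale, the range [A-Z] covers only the uppercase ASCII letters.
c_range=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[A-Z]')

# POSIX character classes (assumed to be what the figure listed) state the
# intent directly and do not depend on how the locale orders its letters.
upper_class=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')
digit_class=$(printf '%s\n' "$sample" | grep '^[[:digit:]]')

printf 'C-locale range: %s\n' "$c_range"
printf '[[:upper:]]:    %s\n' "$upper_class"
printf '[[:digit:]]:    %s\n' "$digit_class"
```

Setting LC_ALL=C for a single command, as done for the first grep, is a common way to get predictable range behaviour without changing the locale of the whole session.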

Many software packages now also support regular expressions. On the Internet, spam email causes network congestion; if such messages are filtered out in advance on the server side, clients avoid a great deal of unnecessary bandwidth consumption.

As a Linux system administrator, mastering regular expressions is one of the necessary conditions.

1. Basic regular expression

Regular expressions are divided into basic regular expressions (BRE) and extended regular expressions (ERE) according to how strictly the pattern syntax is defined. Basic regular expressions are the most fundamental part of the commonly used syntax. Among the common text-processing tools in Linux systems, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions. To use basic regular expressions well, you must first understand the meaning of the metacharacters they contain.

(1) grep command tool

1) Basic regular expression examples:

[root@localhost ~]# grep -n 'the' test.txt  // find lines containing "the"
[root@localhost ~]# grep -vn 'the' test.txt  // find lines that do not contain "the"
[root@localhost ~]# grep -in 'the' test.txt  // find lines containing "the", case-insensitive
[root@localhost ~]# grep -n 'sh[io]rt' test.txt  // find strings that start with "sh" and end with "rt", with "i" or "o" in the middle
[root@localhost ~]# grep -n '[^w]oo' test.txt  // find "oo" that is not preceded by "w"
[root@localhost ~]# grep -n '[^a-z]oo' test.txt  // find "oo" that is not preceded by a lowercase letter
[root@localhost ~]# grep -n '^the' test.txt  // find lines that begin with "the" ("^" anchors the start of the line)
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt  // find lines that do not begin with a letter ("[^]" negates the set)
[root@localhost ~]# grep -n '\.$' a.txt  // find lines that end with "."
// "$" anchors the end of the line. Because "." is a metacharacter, it must be escaped with "\" to be treated as an ordinary character
[root@localhost ~]# grep -n 'w..d' test.txt  // find lines with exactly two characters between "w" and "d" ("." matches any single character)
[root@localhost ~]# grep -n 'ooo*' test.txt  // find strings containing at least two "o" ("*" repeats the preceding character zero or more times)
[root@localhost ~]# grep -n 'woo*d' test.txt  // find strings that start with "w", end with "d", and contain at least one "o" in between
[root@localhost ~]# grep -n 'w.*d' test.txt  // find strings that start with "w" and end with "d", with any characters in between (".*" matches anything)
[root@localhost ~]# grep -n 'o\{2\}' test.txt  // \{n\} matches n times: find lines containing two consecutive "o" ("{}" must be escaped with "\" in basic regular expressions)
[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt  // find strings that start with "w", end with "d", and contain 2 to 5 "o" (\{n,m\} matches at least n and at most m times)
[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt  // find strings that start with "w", end with "d", and contain 2 or more "o" (\{n,\} matches at least n times)

2) Summary of common metacharacters in basic regular expressions, as shown in the figure:
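The basic metacharacters used in the grep examples can be tried on a small file. The following is a minimal sketch; the sample text is made up and stands in for test.txt, whose contents the article does not show.

```shell
# Hypothetical sample data standing in for test.txt.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
the short wood
a shirt, not a shart
Wooooood is good.
food
EOF

# Bracket expression: matches "shirt" or "short", but not "shart".
bracket=$(grep -n 'sh[io]rt' "$tmp")

# Escaped dot plus end anchor: lines that end with a literal ".".
dot_end=$(grep -n '\.$' "$tmp")

# Interval: lowercase "w", then 2 to 5 "o", then "d".
interval=$(grep -n 'wo\{2,5\}d' "$tmp")

# Negated bracket: "oo" preceded by any character other than lowercase "w".
negated=$(grep -n '[^w]oo' "$tmp")

printf '%s\n' "$bracket" "$dot_end" "$interval" "$negated"
rm -f "$tmp"
```

Note that the match is case-sensitive: the uppercase "W" in the third line counts as "not w" for the negated bracket, and it does not satisfy the interval pattern.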

2. Extended regular expressions

Basic regular expressions are usually sufficient, but extended regular expressions offer a wider set of metacharacters that can simplify a command.

Among the common text-processing tools in Linux systems, egrep and awk support extended regular expressions, and the egrep command is used in much the same way as grep.

1) Summary of common metacharacters in extended regular expressions
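The summary table is not available here, so the following sketch relies on standard extended-regular-expression metacharacters rather than on the article's own list: "+", "?", "|", and "()". It uses grep -E, which is equivalent to egrep, and the sample words are made up.

```shell
# Made-up sample words, one per line.
sample=$(printf 'wd\nwod\nwood\nbest\nxyzxyz\n')

# "+" repeats the preceding character one or more times.
plus=$(printf '%s\n' "$sample" | grep -E 'wo+d')

# "?" makes the preceding character optional (zero or one time).
optional=$(printf '%s\n' "$sample" | grep -E 'wo?d')

# "|" matches either alternative.
either=$(printf '%s\n' "$sample" | grep -E 'wd|best')

# "()" groups characters so that "+" applies to the whole group.
group=$(printf '%s\n' "$sample" | grep -E '^(xyz)+$')

# The basic-regular-expression equivalent of 'wo+d' needs an escaped interval.
bre_plus=$(printf '%s\n' "$sample" | grep 'wo\{1,\}d')

printf '%s\n' "$plus" "$optional" "$either" "$group"
```

The last command shows the simplification that extended syntax offers: 'wo+d' and 'wo\{1,\}d' select the same lines.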

II. Text editing processors

1. grep command tool

grep was already covered in the section on basic regular expressions, so it is not repeated here.

2. sed command tool

sed is a powerful yet simple text parsing and conversion tool. It reads text, edits the content according to the specified conditions, and finally outputs either all lines or only the lines that were processed. sed can carry out quite complex text processing without user interaction and is widely used in shell scripts to automate processing tasks.

The workflow of sed consists of three steps:

Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer, also called the pattern space.

Execute: by default, all sed commands are executed in order on the pattern space. Unless a line address is specified, the commands are applied to every line in turn.

Display: the modified content is sent to the output stream. After the data is sent, the pattern space is emptied.
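The read, execute, and display steps can be observed with a two-line input. This is a small sketch with made-up data; it shows that the display step happens automatically unless "-n" suppresses it.

```shell
# Made-up two-line input.
input=$(printf 'one\ntwo\n')

# No commands: each line is read and then displayed automatically.
auto=$(printf '%s\n' "$input" | sed '')

# "p" prints the pattern space during the execute step, and the display
# step prints it again, so every line appears twice.
doubled=$(printf '%s\n' "$input" | sed 'p')

# "-n" suppresses the automatic display, so only "p" produces output.
once=$(printf '%s\n' "$input" | sed -n 'p')

# With a line address, the command runs only for that line.
second=$(printf '%s\n' "$input" | sed -n '2p')

printf '%s\n' "$doubled"
```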

Note: this process is repeated until all of the contents of the file have been processed.

1) Syntax and related parameters of the sed command:

Common sed command options, as shown in the figure:

Common operation parameters, for example for making changes on particular lines, include:

2) examples of sed command usage

Note that the following commands do not change the file itself. To modify the file in place, you must use the "-i" option.

(1) Output text that meets the conditions

[root@localhost ~]# sed -n 'p' test.txt  // output all content, equivalent to "cat test.txt"
[root@localhost ~]# sed -n '3p' test.txt  // output line 3
[root@localhost ~]# sed -n '3,5p' test.txt  // output lines 3-5
[root@localhost ~]# sed -n 'p;n' test.txt  // output all odd lines; "n" reads the next line of input
[root@localhost ~]# sed -n 'n;p' test.txt  // output all even lines; "n" reads the next line of input
[root@localhost ~]# sed -n '1,5{p;n}' test.txt  // output the odd lines among lines 1-5 (lines 1, 3, 5)
[root@localhost ~]# sed -n '10,${n;p}' test.txt  // from line 10 to the end of the file, output every second line of that range (lines 11, 13, ..., blank lines included)
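The line-selection commands are easier to follow with visible data. The following sketch uses a made-up five-line input in place of test.txt.

```shell
# Made-up five-line input standing in for test.txt.
lines=$(printf '%s\n' one two three four five)

# A single address and an address range.
third=$(printf '%s\n' "$lines" | sed -n '3p')
middle=$(printf '%s\n' "$lines" | sed -n '2,4p')

# "p;n": print the current line, then read the next one without printing it.
odd=$(printf '%s\n' "$lines" | sed -n 'p;n')

# "n;p": skip the current line, read the next one, and print that.
even=$(printf '%s\n' "$lines" | sed -n 'n;p')

printf 'odd:  %s\n' "$(printf '%s' "$odd" | tr '\n' ' ')"
printf 'even: %s\n' "$(printf '%s' "$even" | tr '\n' ' ')"
```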

Examples of using sed with regular expressions

When combined with regular expressions, the sed command format is slightly different: the regular expression is enclosed in "/".

[root@localhost ~]# sed -n '/the/p' test.txt  // output lines containing "the"
[root@localhost ~]# sed -n '4,/the/p' test.txt  // output from line 4 to the first line containing "the"
[root@localhost ~]# sed -n '/the/=' test.txt  // output the line numbers of lines containing "the" (the equal sign "=" outputs the line number)
[root@localhost ~]# sed -n '/^PI/p' test.txt  // output lines that begin with "PI"
[root@localhost ~]# sed -n '/\<wood\>/p' test.txt  // output lines containing the word "wood"; \< and \> represent word boundaries

(2) Delete eligible text

The nl command numbers the lines of a file.

[root@localhost ~]# nl test.txt | sed '3d'  // delete line 3
[root@localhost ~]# nl test.txt | sed '3,5d'  // delete lines 3-5
[root@localhost ~]# nl test.txt | sed '/cross/d'  // delete lines containing "cross"; the original line 8 is deleted
[root@localhost ~]# nl test.txt | sed '/cross/!d'  // delete lines that do not contain "cross"
[root@localhost ~]# sed '/\.$/d' test.txt  // delete lines ending with "."
[root@localhost ~]# sed '/^$/d' test.txt  // delete all blank lines
[root@localhost ~]# sed -e '/^$/{n;/^$/d}' test.txt  // collapse consecutive blank lines, leaving one

(3) Replace eligible text

The commands used for replacement with sed include s (string substitution), c (whole line / block replacement), and y (character conversion).

[root@localhost ~]# sed 's/the/THE/' test.txt  // replace the first "the" in each line with "THE"
[root@localhost ~]# sed 's/l/L/3' test.txt  // replace the third "l" in each line with "L"
[root@localhost ~]# sed 's/the/THE/g' test.txt  // replace all "the" in the file with "THE"
[root@localhost ~]# sed 's/o//g' test.txt  // delete all "o" in the file
[root@localhost ~]# sed 's/^/#/' test.txt  // insert "#" at the beginning of each line
[root@localhost ~]# sed '/the/s/^/#/' test.txt  // insert "#" at the beginning of each line containing "the"
[root@localhost ~]# sed 's/$/EOF/' test.txt  // insert the string "EOF" at the end of each line
[root@localhost ~]# sed '3,5s/the/THE/g' test.txt  // replace all "the" in lines 3-5 with "THE"
[root@localhost ~]# sed '/the/s/o/O/g' test.txt  // replace "o" with "O" in all lines containing "the"
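To make the effect of the substitution flags and addresses concrete, here is a small sketch that runs several of the forms above on made-up one-line inputs and keeps the results for comparison.

```shell
# Made-up sample sentence.
text='the cat and the hat'

# Without a flag, only the first match on each line is replaced.
first=$(printf '%s\n' "$text" | sed 's/the/THE/')

# The "g" flag replaces every match on the line.
all=$(printf '%s\n' "$text" | sed 's/the/THE/g')

# A numeric flag replaces only that occurrence (here the third "l").
third_l=$(printf 'hello world level\n' | sed 's/l/L/3')

# "^" and "$" match positions, so these insert text at the line start or end.
prefixed=$(printf '%s\n' "$text" | sed 's/^/#/')
suffixed=$(printf '%s\n' "$text" | sed 's/$/EOF/')

# A regex address restricts the substitution to matching lines.
addressed=$(printf 'the fox\na dog\n' | sed '/the/s/o/O/g')

printf '%s\n' "$first" "$all" "$third_l" "$prefixed" "$suffixed" "$addressed"
```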

With the "-i" option, as in the commands below, sed modifies the contents of the file directly and the change takes effect immediately!

[root@localhost ~]# sed -i '1c 1111' a.txt  // replace the first line of the file with "1111"
[root@localhost ~]# sed -i '1a 1111' a.txt  // insert a line after the first line, with the content "1111"
[root@localhost ~]# sed -i '1i 2222' a.txt  // insert a line before the first line, with the content "2222"
[root@localhost ~]# sed -i '1d' a.txt  // delete the first line
[root@localhost ~]# sed -n '1p' a.txt  // print the first line
[root@localhost ~]# sed -i '1s/2222/3333/' a.txt  // replace "2222" in the first line with "3333"

(4) Migrate eligible text

The commands used to migrate text with sed include:

H: copy to the clipboard (hold space); g, G: overwrite / append the data from the clipboard to the specified line; w: save to a file; r: read the specified file; a: append the specified content.

[root@localhost ~]# sed '/the/{H;d};$G' test.txt  // move the lines containing "the" to the end of the file; ";" separates multiple operations
[root@localhost ~]# sed '1,5{H;d};17G' test.txt  // move lines 1-5 to after line 17
[root@localhost ~]# sed '/the/w out.file' test.txt  // save the lines containing "the" to the file out.file
[root@localhost ~]# sed '/the/r /etc/hostname' test.txt  // add the contents of /etc/hostname after each line containing "the"
[root@localhost ~]# sed '3aNEW' test.txt  // insert a new line after line 3, with the content "NEW"
[root@localhost ~]# sed '/the/aNEW' test.txt  // insert a new line after each line containing "the", with the content "NEW"
[root@localhost ~]# sed '3aNEW1\nNEW2' test.txt  // insert multiple lines after line 3; the "\n" in the middle indicates a new line

(5) Edit the file using a script

With a sed script, the editing instructions are stored in a file (one instruction per line) and invoked with the "-f" option.

[root@localhost ~]# sed '1,5{H;d};17G' test.txt  // move lines 1-5 to after line 17
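The move operation can be checked on a short file. The sketch below is illustrative only: it uses a made-up six-line file and smaller line numbers (lines 1-2 moved after line 4), assumes GNU sed for the "{H;d}" form, and compares the inline command with the same instructions stored in a script file and run with "-f".

```shell
# Hypothetical six-line sample standing in for test.txt.
data=$(mktemp)
script=$(mktemp)
printf 'l%s\n' 1 2 3 4 5 6 > "$data"

# Inline form: H appends lines 1-2 to the hold space, d removes them from
# the output, and G appends the hold space after line 4.
inline=$(sed '1,2{H;d};4G' "$data")

# Script form: the same instructions, one per line, invoked with -f.
cat > "$script" <<'EOF'
1,2H
1,2d
4G
EOF
scripted=$(sed -f "$script" "$data")

printf '%s\n' "$inline"
rm -f "$data" "$script"
```

Both forms give the same output. An empty line appears before the moved lines, because H always adds a newline in front of the text it appends, even when the hold space is still empty.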

The same operation written as a script file:

[root@localhost ~]# vim 1.list
1,5H
1,5d
17G
[root@localhost ~]# sed -f 1.list test.txt

(6) Example of sed directly manipulating a file

Write a script to adjust the vsftpd service configuration: disable anonymous users, but allow local users to log in (with write access).

[root@localhost ~]# vim local_only_ftp.sh
#!/bin/bash
S="/usr/share/doc/vsftpd-3.0.2/EXAMPLE/INTERNET_SITE/vsftpd.conf"
C="/etc/vsftpd/vsftpd.conf"
# specify the sample file path and the configuration file path
[ ! -e "$C.bak" ] && cp "$C" "$C.bak"
# back up the original configuration file: check whether (configuration file).bak exists and, if it does not, copy it with cp
sed -e '/^anonymous_enable/s/YES/NO/g' "$S" > "$C"
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' "$C"
grep "listen" "$C" || sed -i '$alisten=YES' "$C"
# adjust the settings based on the sample configuration, overwriting the existing file
systemctl restart vsftpd
systemctl enable vsftpd
# restart the ftp service and set it to start at boot

3. awk command tool

In Linux/UNIX system, awk is a powerful editing tool, which reads input text line by line, searches according to the specified matching pattern, formats and outputs or filters the content that meets the requirements, and can achieve quite complex text operations without interaction. It is widely used in Shell scripts to complete a variety of automatic configuration tasks.

1) Overview of awk command

The result of an awk program can be printed with the print function. awk commands can use the logical operators && and ||.

Simple arithmetic is also available: +, -, *, /, %, and ^ represent addition, subtraction, multiplication, division, remainder, and exponentiation, respectively.
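A short sketch of these operators follows. The numbers and the five-line input are arbitrary, and NR (the current line number) is used only to have something to compare.

```shell
# Arithmetic in a BEGIN block, which runs without reading any input.
arith=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }')

# "&&": both conditions must hold (lines 2 to 4).
both=$(printf '%s\n' 1 2 3 4 5 | awk '(NR >= 2) && (NR <= 4) { print }')

# "||": either condition is enough (first or last of the five lines).
either=$(printf '%s\n' 1 2 3 4 5 | awk '(NR == 1) || (NR == 5) { print }')

printf '%s\n' "$arith"
```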

awk reads from an input file or standard input and, like sed, reads line by line. The difference is that awk treats each line of a text file as a record and each part (column) of a line as a field of that record. To work with these fields, awk borrows a notation similar to positional parameters in the shell: $1, $2, $3, ... refer to the columns in order, and $0 refers to the whole line. Fields are separated by a delimiter; the default delimiter is whitespace (spaces or tabs), and a different one can be specified with "-F delimiter".
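Field splitting can be sketched with a couple of passwd-style lines. The two records below are made up for illustration and are not read from a real /etc/passwd.

```shell
# Made-up records in /etc/passwd format (fields separated by ":").
records=$(printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n')

# -F: sets the delimiter; $1 is the user name and $7 is the login shell.
shells=$(printf '%s\n' "$records" | awk -F: '{ print $1, $7 }')

# $0 is the whole record, unchanged.
whole=$(printf '%s\n' "$records" | awk -F: 'NR == 1 { print $0 }')

# With the default delimiter, runs of spaces and tabs separate fields.
columns=$(printf 'a  b\tc\n' | awk '{ print $2, $3 }')

printf '%s\n' "$shells"
```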

The awk command processes the / etc/passwd file, as shown in the figure:

Awk contains several special built-in variables, such as:
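The list of variables is not available here. As a sketch, and assuming it covered commonly used standard awk built-ins, the example below shows NR (current record number), NF (number of fields in the current record), and FS (input field separator) on made-up input.

```shell
# NR counts records, NF counts fields, and $NF is the last field.
report=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')

# FS can be set in a BEGIN block instead of using -F on the command line.
csv=$(printf 'x,y,z\n' | awk 'BEGIN { FS = "," } { print NF, $2 }')

printf '%s\n' "$report" "$csv"
```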

2) awk command usage examples

(1) Output text by line

[root@localhost ~]# awk '{print}' test.txt  // output all content, equivalent to "cat test.txt"
[root@localhost ~]# awk '{print $0}' test.txt  // output all content, equivalent to "cat test.txt"
[root@localhost ~]# awk 'NR==1,NR==3{print}' test.txt  // output lines 1-3
[root@localhost ~]# awk '(NR>=1)&&(NR
