Common commands of text processing tool awk

2025-07-04 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

awk is actually a programming language: it supports conditionals, arrays, loops and other features, so we can think of it as a scripting language interpreter.

Together with grep and sed, it is known as one of the "Three Musketeers" of Linux.

Each has its own specialty.

grep is best suited to simply finding or matching text.

sed is best suited to editing the matched text.

awk is best suited to formatting text and to more complex text processing.

I. Foundation of AWK

The basic usage of awk

Format

awk 'action' filename

If you don't specify any options, use awk with an action only.

The format looks like this.

awk '{print}' file

Output the whole file directly, which is equivalent to the cat command in shell

Example

[root@zhaocheng ~]# awk '{print}' echo.sh
#!/bin/bash
echo "shucai"\b"niunai"

Take the third column of the output of free -m. We can run free -m first and then use a pipe to pass its output to awk, which picks out the third column. No delimiter is specified here, so awk splits on whitespace by default.

[root@zhaocheng ~]# free -m | awk '{print $3}'
free
1150
0

You can also take several columns at once, separated by commas. For example, again from free -m, take the second, third and fourth columns.

[root@zhaocheng ~]# free -m | awk '{print $2,$3,$4}'
used free shared
1838 1150 116
0 0 0
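The column selection above can be tried without free -m. The following is a minimal sketch that uses a made-up three-line table (the names and numbers are hypothetical, not real free output) to show that runs of spaces are treated as a single delimiter by default.

```shell
# Hypothetical sample text standing in for command output.
sample='name   used  free
alpha  10    20
beta   30    40'

# Default splitting: any run of spaces or tabs separates two fields.
third=$(printf '%s\n' "$sample" | awk '{print $3}')
pair=$(printf '%s\n' "$sample" | awk '{print $1, $3}')

echo "$third"
echo "$pair"
```

The comma between $1 and $3 makes awk print the two fields separated by a single space, whatever the spacing in the input was.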

In addition, awk can add strings to the text it prints. Take a file in the format of /etc/passwd: it looks as if it has columns, but without -F awk splits on whitespace by default, so ":" has to be given as the delimiter with -F.

Format

'{print $1, "hello"}'

'{print $1, $2, "hello"}'

'{print $1, "net", $2, "hello"}'

[root@zhaocheng ~]# awk -F ":" '{print $1,$2,"hello"}' test
root x hello
bin x hello
daemon x hello
adm x hello
lp x hello
sync x hello
shutdown x hello
halt x hello
[root@zhaocheng ~]# awk -F ":" '{print $1,"net",$2,"hello"}' test
root net x hello
bin net x hello
daemon net x hello
adm net x hello
lp net x hello
sync net x hello
shutdown net x hello
halt net x hello

The format is '{print $1}'. Do not put quotation marks around $1 inside the braces. With double quotes, awk prints $1 as ordinary characters. With single quotes, the shell ends the awk program early and substitutes its own, empty, $1, so awk runs {print } and prints every line unchanged.

[root@zhaocheng ~]# awk -F ":" '{print "$1"}' test
$1
$1
$1
$1
$1
$1
$1
$1
[root@zhaocheng ~]# awk -F ":" '{print '$1'}' test
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
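To see the quoting rule in isolation, here is a small sketch on two hypothetical passwd-style lines (they are illustrative data, not the article's test file). It contrasts a field reference with the same characters placed inside double quotes.

```shell
# Hypothetical passwd-style lines used only for illustration.
lines='root:x:0:0
bin:x:1:1'

# $1 outside quotes is the first field; text inside double quotes is literal.
field=$(printf '%s\n' "$lines" | awk -F ":" '{print $1, "hello"}')
literal=$(printf '%s\n' "$lines" | awk -F ":" '{print "$1"}')

echo "$field"
echo "$literal"
```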

For example, printing a whole file is equivalent to the cat command. awk offers two ways to do this: awk '{print $0}' or awk '{print}'. Both print every line.

[root@zhaocheng ~]# echo "Learn awk to improve your Linux skills" | awk '{print}'
Learn awk to improve your Linux skills
[root@zhaocheng ~]# echo "Learn awk to improve your Linux skills" | awk '{print $0}'
Learn awk to improve your Linux skills

$0 is the entire line, while $NF is the last column of the current line after splitting (NF is a built-in variable holding the number of columns, so $NF refers to the column with that number)

The penultimate column of each line can be written as $(NF-1)

[root@zhaocheng ~]# awk -F: '{print $(NF-1)}' test
/root
/bin
/sbin
/var/adm
/var/spool/lpd
/sbin
/sbin
/sbin
[root@zhaocheng ~]# awk -F: '{print $NF}' test
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync
/sbin/shutdown
/sbin/halt
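As a quick check of how NF, $NF and $(NF-1) relate, the sketch below runs all three on a single line taken from the test file shown in this article. The variable names are only for illustration.

```shell
# One line from the article's test file.
line='halt:x:7:0:halt:/sbin:/sbin/halt'

count=$(printf '%s\n' "$line" | awk -F: '{print NF}')        # number of fields
last=$(printf '%s\n' "$line" | awk -F: '{print $NF}')        # last field
penult=$(printf '%s\n' "$line" | awk -F: '{print $(NF-1)}')  # field before the last

echo "$count $last $penult"
```

NF without $ is a number, while $NF uses that number as a field index, which is why the parentheses in $(NF-1) are needed.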

awk has two special patterns: BEGIN and END

The BEGIN pattern specifies the actions to perform before any text is processed

The END pattern specifies the actions to perform after all lines have been processed

As the BEGIN example shows, even though we specify an input file after the program, awk only prints the strings given in the action and does not process the file. That is how the BEGIN pattern works.

[root@zhaocheng ~]# awk 'BEGIN {print "one", "two"}' test
one two
[root@zhaocheng ~]# cat test
root:$1$dDTFylQ3$.vTZKpm7mrra9WMsxvBfW. ...
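The END pattern is described above but not demonstrated. The following is a minimal sketch, on hypothetical three-line input, of a program that combines BEGIN, a per-line action and END. The strings "start" and "lines:" are placeholders chosen for this example.

```shell
# Hypothetical input used only for illustration.
input='aaa bbb
ccc ddd
eee fff'

report=$(printf '%s\n' "$input" | awk '
  BEGIN { print "start" }        # runs once, before the first line is read
        { print $1 }             # runs for every input line
  END   { print "lines:", NR }   # runs once, after the last line
')

echo "$report"
```

Inside END, NR still holds the number of the last line read, so it can serve as a line count.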

II. AWK delimiters

awk delimiters can be specified in several ways.

-F specifies the delimiter. To get the second and third columns, all three of the following forms give the same result.

[root@zhaocheng ~]# awk -F: '{print $2,$3}' test
x 0
x 1
x 2
x 3
x 4
x 5
x 6
x 7
[root@zhaocheng ~]# awk -F ":" '{print $2,$3}' test
x 0
x 1
x 2
x 3
x 4
x 5
x 6
x 7
[root@zhaocheng ~]# awk -F ':' '{print $2,$3}' test
x 0
x 1
x 2
x 3
x 4
x 5
x 6
x 7

In addition to the -F option, you can specify awk's input delimiter by setting an internal variable. The built-in variable FS holds the input delimiter, and setting a variable on the command line requires the -v option. The value can be quoted with either "" or ''.

[root@zhaocheng ~]# awk -v FS=':' '{print $2,$3}' test
x 0
x 1
x 2
x 3
x 4
x 5
x 6
x 7
[root@zhaocheng ~]# awk -v FS=":" '{print $2,$3}' test
x 0
x 1
x 2
x 3
x 4
x 5
x 6
x 7
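The equivalence of these forms can be checked on a single line. The sketch below uses one line from the test file. The fourth form, which assigns FS inside a BEGIN block, is not used in this article and is added here as an assumption about what readers may also meet in scripts.

```shell
line='sync:x:5:0:sync:/sbin:/bin/sync'

a=$(printf '%s\n' "$line" | awk -F: '{print $2, $3}')
b=$(printf '%s\n' "$line" | awk -F ":" '{print $2, $3}')
c=$(printf '%s\n' "$line" | awk -v FS=":" '{print $2, $3}')
# Not from the article: FS can also be assigned before any input is read.
d=$(printf '%s\n' "$line" | awk 'BEGIN {FS=":"} {print $2, $3}')

echo "$a | $b | $c | $d"
```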

The syntax of awk is as follows

awk [options] 'Pattern {Action}' file

-F is an option, generally used to specify the delimiter. -v is also an option, used to set the value of a variable.

awk delimiters are divided into the input delimiter and the output delimiter.

Input delimiter: the character, such as : or #, on which awk splits each input line into columns.

Output delimiter: the character awk puts between columns when printing. Even when # is used as the input delimiter, the output is separated by a space, because a space is the default output delimiter.

We can also set the output delimiter with awk's built-in variable OFS. As with any variable set on the command line, this requires the -v option. By default OFS is a single space.

[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh
kkkkk pppppp ssssss
[root@zhaocheng ~]# awk -v OFS="*" '{print $2,$3}' test1
bbb*ooo
dddd*eeee
ggggg*hhhhh
pppppp*ssssss

If the columns in the file are separated by a symbol, set FS=":" as well. The input delimiter and the output delimiter can be specified at the same time.

[root@zhaocheng ~]# cat test
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
[root@zhaocheng ~]# awk -v FS=":" -v OFS="*" '{print $2,$3}' test
x*0
x*1
x*2
x*3
x*4
x*5
x*6
x*7

The examples above used the output delimiter, which replaces the space printed between columns. If you want to join the columns directly with nothing between them, write $1 $2 without a comma. With a comma, the columns are printed with the output delimiter between them. Without a comma, they are concatenated.

[root@zhaocheng ~]# awk '{print $1 $2}' test1
aaabbb
ccccdddd
fffffggggg
kkkkkpppppp
[root@zhaocheng ~]# awk '{print $1$2}' test1
aaabbb
ccccdddd
fffffggggg
kkkkkpppppp
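The difference between the comma and plain juxtaposition is easiest to see when OFS is set to something visible. This is a small sketch on one hypothetical line that sets both FS and OFS.

```shell
# Hypothetical #-separated line.
line='aaa#bbb#ooo'

# With a comma, OFS is printed between the fields.
with_comma=$(printf '%s\n' "$line" | awk -v FS="#" -v OFS="*" '{print $1, $2}')
# Without a comma, the fields are concatenated and OFS is not used.
no_comma=$(printf '%s\n' "$line" | awk -v FS="#" -v OFS="*" '{print $1 $2}')

echo "$with_comma"
echo "$no_comma"
```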

III. AWK variables

In awk, variables are divided into built-in variables and custom variables. The input delimiter FS and the output delimiter OFS are both built-in variables.

Common built-in variables of awk

FS: input field delimiter, defaults to whitespace

OFS: output field delimiter, defaults to a space

RS: input record separator (the input line break), the character that ends a line on input

ORS: output record separator (the output line break), printed at the end of each line in place of the newline

NF: the number of fields in the current line (that is, how many columns the current line is split into)

NR: line number, the number of the line of text currently being processed

FNR: the line number counted separately for each file

FILENAME: current file name

ARGC: the number of arguments on the command line

ARGV: an array that holds the parameters given by the command line

Both FS and OFS have been used above. FS is the input delimiter and OFS is the output delimiter. FS defaults to whitespace and OFS to a space. For example, -v FS=":" sets the input delimiter to a colon. If OFS is not set, the output columns are separated by spaces. To change that, specify -v OFS with the string you want, and it replaces the space in the output.

Built-in variables NR, NF

NR, mentioned above, is the line number of the line currently being processed, so it also tells you how many lines have been read so far.

NF is the number of columns in a line, split on whitespace by default.

There are four lines here, and NR prints the line number of each.

[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh
kkkkk pppppp ssssss
[root@zhaocheng ~]# awk '{print NR}' test1
1
2
3
4

There are three columns in each line, so NF reports 3 for every line.

[root@zhaocheng ~]# awk '{print NR,NF}' test1
1 3
2 3
3 3
4 3

Number the lines of the test file, printing each line after its line number

You can use awk '{print NR,$0}'

[root@zhaocheng ~]# awk '{print NR,$0}' test
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
6 sync:x:5:0:sync:/sbin:/bin/sync
7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8 halt:x:7:0:halt:/sbin:/sbin/halt

Or use cat -n to number the lines.

[root@zhaocheng ~]# awk '{print}' test | cat -n
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
     3  daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4  adm:x:3:4:adm:/var/adm:/sbin/nologin
     5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
     6  sync:x:5:0:sync:/sbin:/bin/sync
     7  shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
     8  halt:x:7:0:halt:/sbin:/sbin/halt
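NF is more interesting when the lines differ in length. Here is a minimal sketch on hypothetical input with 3, 2 and 1 columns. It also reads NR in an END block to get the total number of lines, a use not shown in this article.

```shell
# Hypothetical input whose lines have different numbers of columns.
text='aaa bbb ooo
cccc dddd
fffff'

counts=$(printf '%s\n' "$text" | awk '{print NR, NF}')   # line number, column count
total=$(printf '%s\n' "$text" | awk 'END {print NR}')    # NR after the last line

echo "$counts"
echo "$total"
```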

In bash, variables are referenced with $. In awk, $ selects a column instead: $1 is the first column and $2 is the second. So in awk there is no need to add $ in front of a variable, whether it is built-in or custom.

Built-in variable FNR

FNR is a built-in variable that records line numbers per file when several files are processed. If several files are given, NR keeps counting straight through all of them, while FNR starts again from 1 for each file.

[root@zhaocheng ~]# awk '{print NR,$0}' test test1
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
6 sync:x:5:0:sync:/sbin:/bin/sync
7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8 halt:x:7:0:halt:/sbin:/sbin/halt
9 aaa bbb ooo
10 cccc dddd eeee
11 fffff ggggg hhhhh
12 kkkkk pppppp ssssss
[root@zhaocheng ~]# awk '{print FNR,$0}' test test1
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
6 sync:x:5:0:sync:/sbin:/bin/sync
7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8 halt:x:7:0:halt:/sbin:/sbin/halt
1 aaa bbb ooo
2 cccc dddd eeee
3 fffff ggggg hhhhh
4 kkkkk pppppp ssssss

Built-in variable RS

RS is the input line delimiter. If it is not specified, the default line delimiter is the newline we are used to. Here it is set to a space, so every space starts a new line as far as awk is concerned.

[root@zhaocheng ~]# awk -v RS=" " '{print NR,$0}' test1
1 aaa
2 bbb
3 ooo
cccc
4 dddd
5 eeee
fffff
6 ggggg
7 hhhhh
kkkkk
8 pppppp
9 ssssss

Built-in variable ORS

ORS is the counterpart of RS for output. Here the output line delimiter is set to ++++, which is printed at the end of each line in place of the newline.

[root@zhaocheng ~]# awk -v ORS="++++" '{print NR,$0}' test1
1 aaa bbb ooo++++2 cccc dddd eeee++++3 fffff ggggg hhhhh++++4 kkkkk pppppp ssssss++++[root@zhaocheng ~]#
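RS and ORS can be tried on much shorter input. The following sketch uses hypothetical one-line and three-line strings. The first command splits a single line into records at each space, and the second joins three lines with a + after each one.

```shell
# RS=" ": each space ends a record, so one line becomes three records.
split=$(printf 'aaa bbb ccc' | awk -v RS=" " '{print NR, $0}')

# ORS="+": a plus sign is printed after each record instead of a newline.
joined=$(printf 'a\nb\nc\n' | awk -v ORS="+" '{print $0}')

echo "$split"
echo "$joined"
```

Because ORS replaces the newline, the second result is a single line that ends with a trailing +.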

Built-in variable FILENAME

FILENAME is a built-in variable that holds the name of the file currently being read. Here several files are given, FNR numbers the lines of each file, and FILENAME prints the name of the file each line came from.

[root@zhaocheng ~]# awk '{print FILENAME,FNR,$0}' test1 test
test1 1 aaa bbb ooo
test1 2 cccc dddd eeee
test1 3 fffff ggggg hhhhh
test1 4 kkkkk pppppp ssssss
test 1 root:x:0:0:root:/root:/bin/bash
test 2 bin:x:1:1:bin:/bin:/sbin/nologin
test 3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
test 4 adm:x:3:4:adm:/var/adm:/sbin/nologin
test 5 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
test 6 sync:x:5:0:sync:/sbin:/bin/sync
test 7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
test 8 halt:x:7:0:halt:/sbin:/sbin/halt
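NR, FNR and FILENAME can be compared side by side with two small files. The sketch below creates them in a temporary directory. The file names first and second and their contents are hypothetical, chosen only for this example.

```shell
# Create two small throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/first"
printf 'b1\nb2\nb3\n' > "$dir/second"

# NR counts across both files; FNR restarts at 1 for the second file.
numbers=$(cd "$dir" && awk '{print FILENAME, NR, FNR}' first second)

echo "$numbers"
rm -r "$dir"
```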

Built-in variables ARGV and ARGC

The ARGV built-in variable is an array that holds the arguments given on the command line

[root@zhaocheng ~]# awk 'BEGIN {print "aaa"}' test test1
aaa
[root@zhaocheng ~]# awk 'BEGIN {print "aaa", ARGV[1]}' test test1
aaa test
[root@zhaocheng ~]# awk 'BEGIN {print "aaa", ARGV[2]}' test test1
aaa test1
[root@zhaocheng ~]# awk 'BEGIN {print "aaa", ARGV[1], ARGV[2]}' test test1
aaa test test1

From the above, we can see that ARGV is an array: ARGV[1] is the file test, ARGV[2] is the file test1, and ARGV[0] is the command awk itself. The program text in quotes is not stored in ARGV.

[root@zhaocheng ~]# awk 'BEGIN {print "aaa", ARGV[0], ARGV[1], ARGV[2]}' test test1
aaa awk test test1

ARGC is the total number of elements in ARGV. Here it is 3: awk, test and test1.

[root@zhaocheng ~]# awk 'BEGIN {print "aaa", ARGV[0], ARGV[1], ARGV[2], ARGC}' test test1
aaa awk test test1 3
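ARGC and ARGV are often used together in a loop. This is a minimal sketch that walks over the arguments starting from index 1. The arguments one and two are hypothetical names, and because the program has only a BEGIN block, awk never tries to open them.

```shell
# List every command-line argument after the command name, then the count.
args=$(awk 'BEGIN {
  for (i = 1; i < ARGC; i++)   # ARGV[0] is the command name, so start at 1
    print i, ARGV[i]
  print "ARGC:", ARGC
}' one two)

echo "$args"
```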

Custom variables

A custom variable is a variable defined by the user. It can be set with -v or assigned inside the program.

[root@zhaocheng ~]# awk -v www="1234" 'BEGIN {print www}'
1234
[root@zhaocheng ~]# awk 'BEGIN {www="1234"; print www}'
1234

Define multiple variables

[root@zhaocheng ~]# awk 'BEGIN {www="1234"; baidu="4567"; print www, baidu}'
1234 4567
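A common use of -v is to hand a shell variable to awk. The sketch below assumes a shell variable named greeting, which is hypothetical, and also shows that a value passed with -v can be used in arithmetic.

```shell
# A shell variable (hypothetical name) passed into awk with -v.
greeting="hello"

out=$(awk -v msg="$greeting" -v n=3 'BEGIN { total = n + 1; print msg, total }')

echo "$out"
```

Inside the awk program the variables are written as msg and n, without $, in line with the rule stated earlier.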

IV. AWK format

Comparing print and printf: print outputs the content of $1 followed by a newline, while printf does not output a newline, so by default all the text ends up on one line.

[root@zhaocheng ~]# awk '{print $1}' test1
aaa
cccc
fffff
kkkkk
[root@zhaocheng ~]# awk '{printf $1}' test1
aaaccccfffffkkkkk[root@zhaocheng ~]#

With printf's format specifiers you can control the output yourself, including the newline.

[root@zhaocheng ~]# awk '{printf "%s\n", $1}' test1
aaa
cccc
fffff
kkkkk

When using printf in awk, you need to separate the format from the column with a comma. To break the line when printing a string, put \n in the format, for example "%s\n". The shell's own printf command takes the same format, but its arguments are separated by spaces.

[root@zhaocheng ~]# printf "helloya\n"
helloya
[root@zhaocheng ~]# printf "%s\n" helloya
helloya

Be careful when using the printf action in awk

1) Text output by the printf action does not end with a line break. If you need one, add the escape \n after the corresponding format specifier.

2) When using the printf action, the format and the text to be formatted need to be separated by a comma

3) When using the printf action, the format specifiers in the format must correspond one to one with the pieces of text to be formatted

You can use format specifiers to format each column in the text

Use awk with printf to format text separated by spaces. No input delimiter needs to be specified, because whitespace is the default.

[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh
kkkkk pppppp ssssss
[root@zhaocheng ~]# awk '{printf "%s\n", $1}' test1
aaa
cccc
fffff
kkkkk
[root@zhaocheng ~]# awk '{printf "first column %s\n", $1}' test1
first column aaa
first column cccc
first column fffff
first column kkkkk
[root@zhaocheng ~]# awk '{printf "first column %s second column %s\n", $1,$2}' test1
first column aaa second column bbb
first column cccc second column dddd
first column fffff second column ggggg
first column kkkkk second column pppppp
[root@zhaocheng ~]# awk '{printf "second column %s\n", $2}' test1
second column bbb
second column dddd
second column ggggg
second column pppppp
[root@zhaocheng ~]# awk '{printf "third column %s\n", $3}' test1
third column ooo
third column eeee
third column hhhhh
third column ssssss
[root@zhaocheng ~]# awk '{printf "first column %s third column %s\n", $1,$3}' test1
first column aaa third column ooo
first column cccc third column eeee
first column fffff third column hhhhh
first column kkkkk third column ssssss
[root@zhaocheng ~]# awk '{printf "first column %s second column %s third column %s\n", $1,$2,$3}' test1
first column aaa second column bbb third column ooo
first column cccc second column dddd third column eeee
first column fffff second column ggggg third column hhhhh
first column kkkkk second column pppppp third column ssssss
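Format specifiers can also carry a width, which this article does not cover. As an assumption about what a reader may want next, the sketch below pads the first column on the right with %-6s and the second on the left with %6s. The | characters are only there to make the padding visible.

```shell
# Hypothetical two-column input.
rows='aaa bbb
cccc dddd'

# %-6s left-justifies in 6 characters, %6s right-justifies in 6 characters.
table=$(printf '%s\n' "$rows" | awk '{printf "%-6s|%6s|\n", $1, $2}')

echo "$table"
```

Each specifier still consumes exactly one argument after the comma, as required by rule 3 above.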

Formatting text that uses # as the delimiter

[root@zhaocheng ~]# cat test2
aaa#bbb#ooo
cccc#dddd#eeee
fffff#ggggg#hhhhh
kkkkk#pppppp#ssssss

Use -v FS to set the input delimiter to #

[root@zhaocheng ~]# awk -v FS="#" '{print $1,$2}' test2
aaa bbb
cccc dddd
fffff ggggg
kkkkk pppppp

Use -v OFS to set the output delimiter

[root@zhaocheng ~]# awk -v FS="#" -v OFS="####" '{print $1,$2}' test2
aaa####bbb
cccc####dddd
fffff####ggggg
kkkkk####pppppp

Format the output using printf

[root@zhaocheng ~]# awk -v FS="#" '{printf "first column %s\n", $1}' test2
first column aaa
first column cccc
first column fffff
first column kkkkk
[root@zhaocheng ~]# awk -v FS="#" '{printf "first column %s second column %s\n", $1,$2}' test2
first column aaa second column bbb
first column cccc second column dddd
first column fffff second column ggggg
first column kkkkk second column pppppp

BEGIN, which runs before any text is processed, can be combined with printf to print a header line

[root@zhaocheng ~]# awk 'BEGIN {printf "%1s\t%s\n", "user name", "user ID"}' test1
user name    user ID
[root@zhaocheng ~]# awk 'BEGIN {printf "%1s\t%s\n", "user name", "user ID"} {printf "%1s\t%s\n", $1,$2}' test1
user name    user ID
aaa    bbb
cccc    dddd
fffff    ggggg
kkkkk    pppppp

Format text with # as the delimiter

[root@zhaocheng ~]# awk -v FS="#" 'BEGIN {printf "%1s\t%s\n", "user name", "user ID"} {printf "%1s\t%s\n", $1,$2}' test2
user name    user ID
aaa    bbb
cccc    dddd
fffff    ggggg
kkkkk    pppppp

V. AWK patterns

The syntax for using awk:

awk [options] 'Pattern {Action}' file1 file2 ...

awk -v FS=":" 'BEGIN {print/printf "%1s\t%s\n", "xxx", "xxxx"} {xxx}' file

For options, we have used the -F option and the -v option.

For Action, we have used print and printf.

For Pattern, we have used the BEGIN pattern and the END pattern.

By now we are familiar with one property of awk: it finishes processing the current line before it handles the next one. If we do not specify any condition, awk processes every line of the text, line by line. If we specify a condition, only the lines that meet it are processed, and lines that do not meet it are skipped. This is the pattern in awk.

In fact, when awk processes text line by line, it uses the pattern as a condition to decide whether the current line qualifies. If the line matches the pattern, it is processed. If it does not match, it is not processed.

Earlier we used NF to count the number of columns in each line. The pattern NF==4 selects the lines that have exactly four columns, and the action then prints them.

[root@zhaocheng ~]# awk 'NF==4 {print}' test1
fffff ggggg hhhhh bbbbb
[root@zhaocheng ~]# awk 'NF==5 {print}' test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm

This is a relational expression in awk, here using ==. When the expression is true for a line, the line matches the pattern and the corresponding action is performed on it.
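The relational patterns can be compared on one small input. The sketch below reuses the four lines of test1 as they appear in this section and prints only the first column of each matching line, so the results are easy to read.

```shell
text='aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh bbbbb
kkkkk pppppp ssssss xxxxxx mmmmmmmm'

four=$(printf '%s\n' "$text" | awk 'NF == 4 {print $1}')    # exactly 4 columns
more=$(printf '%s\n' "$text" | awk 'NF > 4 {print $1}')     # more than 4 columns
fewer=$(printf '%s\n' "$text" | awk 'NF < 4 {print $1}')    # fewer than 4 columns

echo "$four"
echo "$more"
echo "$fewer"
```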

Example

[root@zhaocheng ~]# cat test1
aaa bbb ooo
cccc dddd eeee
fffff ggggg hhhhh bbbbb
kkkkk pppppp ssssss xxxxxx mmmmmmmm

Rows with more than 4 columns

[root@zhaocheng ~]# awk 'NF>4 {print $0}' test1
kkkkk pppppp ssssss xxxxxx mmmmmmmm

Rows with 4 or more columns

[root@zhaocheng ~]# awk 'NF>=4 {print $0}' test1
fffff ggggg hhhhh bbbbb
kkkkk pppppp ssssss xxxxxx mmmmmmmm

Rows with 4 or fewer columns

[root@zhaocheng ~]# awk 'NF
