Good Programmer big data learning route: AWK explained in detail

2025-12-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This installment of the Good Programmer big data learning route explains awk in detail. Awk is a powerful text analysis tool. Where grep is good at searching and sed at editing, awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then processes those fields.

Awk scans a file or string and extracts information from it according to specified rules; once the information has been extracted, other text operations can be performed on it. A complete awk script is usually used to format the information in a text file.

Typically, awk treats one line of a file as its unit of processing. Awk receives each line of the file and then executes the appropriate commands to process the text.
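As a minimal sketch of this line-by-line processing, the following feeds two made-up lines to awk on standard input and prints the first and third whitespace-separated fields of each. The sample data and the variable name out are assumptions chosen for illustration.

```shell
# Two sample lines (invented for this sketch) are piped to awk.
# awk handles one line at a time and splits it on whitespace into $1, $2, ...
out=$(printf 'alice 30 london\nbob 25 paris\n' |
  awk '{ print "name=" $1, "city=" $3 }')
echo "$out"
```

The action inside the braces runs once per input line, so two input lines give two output lines.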

Invoking awk

There are three ways to call awk:

1. Command-line mode

awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed. In awk, each item on a line that is delimited by the field separator is called a field. When no -F option is given, the default field separator is whitespace (spaces or tabs).
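To illustrate the effect of -F, the sketch below counts the fields in one passwd-style line with and without a colon separator. The line itself is made up, and NF (awk's built-in count of fields in the current record) is not introduced in the text above.

```shell
# A single passwd-style line, invented for this sketch.
line='root:x:0:0:root:/root:/bin/bash'

# Default separator: the line contains no whitespace, so it is one field.
default_count=$(echo "$line" | awk '{ print NF }')

# With -F ':' the same line splits into seven fields.
colon_count=$(echo "$line" | awk -F ':' '{ print NF }')
first_field=$(echo "$line" | awk -F ':' '{ print $1 }')

echo "$default_count $colon_count $first_field"
```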

2. Shell script mode

Put all the awk commands in a file whose first line names the awk interpreter, and make the file executable; the program is then run by typing the name of the script.

This works like the first line of a shell script: #!/bin/sh

which is replaced by: #!/bin/awk -f (the path to awk may differ between systems)

3. Put all the awk commands in a separate file and call: awk -f awk-script-file input-file(s)

Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
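The two file-based modes can be sketched as follows. All file names here (sample.txt, names.awk, names) are hypothetical, and the awk path for the interpreter line is looked up with command -v rather than assumed, because it varies between systems.

```shell
# Work in a throwaway directory; every file name below is hypothetical.
dir=$(mktemp -d)
printf '100 Thomas\n200 Jason\n' > "$dir/sample.txt"

# Mode 3: keep the program in a file and load it with -f.
printf '{ print $2 }\n' > "$dir/names.awk"
via_f=$(awk -f "$dir/names.awk" "$dir/sample.txt")

# Mode 2: make the program itself executable. The interpreter line
# carries the -f flag so that awk reads the rest of the file as its program.
printf '#!%s -f\n{ print $2 }\n' "$(command -v awk)" > "$dir/names"
chmod +x "$dir/names"
via_script=$("$dir/names" "$dir/sample.txt")

echo "$via_f"
echo "$via_script"
rm -rf "$dir"
```

Both calls run the same one-line program, so they should print the same two names.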

Awk syntax

1. Awk command format

(1) awk [-F field-separator] 'commands' input-file(s)

(2) awk -f awk-script-file input-file(s)

Sample file:

cat employee.txt

100 Thomas Manager Sales 5000

200 Jason Developer Technology 5500

300 Sanjay Sysadmin Technology 7000

400 Nisha Manager Marketing 9500

500 Randy DBA Technology 6000

2. Awk operations

1. Print every line of the file:

awk '{print $0}' ./employee.txt

2. Print the first field of /etc/passwd

awk -F ":" '{print $1}' /etc/passwd

3. Print the entire contents of the file

awk '{print $0}' employee.txt

4. Extract the first column from the file

awk '{print $1}' employee.txt

awk -F ' ' '{print $1}' employee.txt
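The column extraction above can be tried without touching the working directory. The sketch below copies the first three rows of the sample file into a temporary file (the temporary path and the variable names are assumptions) and pulls out the first column, then the name and job title.

```shell
# A shortened copy of the sample file, written to a temporary path.
emp=$(mktemp)
cat > "$emp" <<'EOF'
100 Thomas Manager Sales 5000
200 Jason Developer Technology 5500
300 Sanjay Sysadmin Technology 7000
EOF

ids=$(awk '{ print $1 }' "$emp")          # first column only
people=$(awk '{ print $2, $3 }' "$emp")   # name and job title

echo "$ids"
echo "$people"
rm -f "$emp"
```

A comma between the items of print joins them with a single space in the output.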

5. List all user names and their login shells

awk -F ':' '{print $1, $7}' /etc/passwd

When the delimiter is some other symbol, such as a comma:

a,b,c,d

a1,b1,c1,d1

awk -F ',' '{print $1, $2}' filename
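A short sketch of the comma case, plus one with several delimiter characters at once. The second example is an assumption about what "multiple symbols" means here: it relies on awk treating a -F value longer than one character as a regular expression, so a bracket expression such as [,;] matches either character.

```shell
# Comma-separated input, as in the two sample lines above.
comma_out=$(printf 'a,b,c,d\na1,b1,c1,d1\n' | awk -F ',' '{ print $1, $2 }')

# Mixed delimiters (made-up input): split on either a comma or a semicolon.
mixed_out=$(printf 'a,b;c\n' | awk -F '[,;]' '{ print $3 }')

echo "$comma_out"
echo "$mixed_out"
```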

6. Print the line whose user name is root

awk -F ':' '$1 == "root" {print $0}' /etc/passwd

awk -F ':' '$1 == "keke" {print $1}' /etc/passwd

Note: $1 == "root" and $1 == "keke" are both conditions; the action in braces runs only for lines where the condition is true.
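Because the contents of /etc/passwd differ from machine to machine, the sketch below applies the same kind of condition to three made-up passwd-style entries held in a shell variable. The entries and variable names are assumptions; field 7 of a passwd entry holds the login shell.

```shell
# Three invented passwd-style entries, one per line.
passwd_sample='root:x:0:0:root:/root:/bin/bash
keke:x:1000:1000:Keke:/home/keke:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# The action runs only on lines where the condition before it holds.
root_line=$(printf '%s\n' "$passwd_sample" |
  awk -F ':' '$1 == "root" { print $0 }')
keke_shell=$(printf '%s\n' "$passwd_sample" |
  awk -F ':' '$1 == "keke" { print $1, $7 }')

echo "$root_line"
echo "$keke_shell"
```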

The awk workflow is as follows: read in one record, delimited by the '\n' newline character, then split the record into fields at the specified field separator and populate the fields. $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs).
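The record and field handling described here can be made visible with awk's built-in counters. In the sketch below, NR (the number of the current record) and NF (the number of fields in it) are standard awk variables that the text does not mention; the input lines are made up.

```shell
# Each input line is one record; awk fills in $0, $1..$NF, NR and NF for it.
out=$(printf 'one two three\nfour five\n' |
  awk '{ print "record " NR ": " NF " fields, first=" $1 ", whole=[" $0 "]" }')
echo "$out"
```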

7. Add a header to the output

awk -F ":" 'BEGIN {print "name\tshell\n--------"}

{print $1 "\t" $7}' /etc/passwd

8. Add a header and a footer to the output

awk -F: 'BEGIN {print "name\tshell\n--------"} {print $1 "\t" $7}

END {print "end-of-report"}' /etc/passwd

awk -F ":" 'BEGIN {print "--BEGIN--"}

$1 == "root" {print $1}

END {print "--END--"}' /etc/passwd

awk -F ":" 'BEGIN {print "--BEGIN--"} {if ($1 == "root") print $1}

END {print "--END--"}' /etc/passwd

The full awk workflow is as follows: first execute the BEGIN block, then start reading the file. Read in one record, delimited by the \n newline character, split it into fields at the specified field separator and populate the fields ($0 represents the whole record, $1 the first field, and $n the nth field), then execute the action of every pattern that matches. Then read the next record, and so on until all records have been read; finally, execute the END block.
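This BEGIN, per-record, END order can be sketched with a small report that prints a header, one line per record, and a total at the end. The three input rows are shortened from the sample file, and the variable total and the column layout are assumptions made for illustration.

```shell
# BEGIN runs before any input, the middle block once per record,
# and END after the last record has been processed.
report=$(printf '100 Thomas 5000\n200 Jason 5500\n300 Sanjay 7000\n' |
  awk 'BEGIN { print "name\tsalary" }
       { total += $3; print $2 "\t" $3 }
       END { print "total\t" total }')
echo "$report"
```

The running sum lives in an ordinary awk variable, which starts out empty and is treated as zero in arithmetic, so no initialization in BEGIN is needed.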

The difference between awk and MapReduce

1. Awk is mainly used to process files on a single machine.

2. MapReduce runs on a distributed file system and can operate on large amounts of data. The disadvantage is that programming it is more complex than writing awk. However, with the support of the framework, the author of a MapReduce program only needs to write the business logic.
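For a rough feel of the comparison, a group-and-count job of the kind often written for MapReduce can be sketched in awk on a single machine. This is only a loose analogy, not a MapReduce program: the input rows are made up, and the associative array count plays the role of the grouping step that a framework would distribute across nodes.

```shell
# Count people per department on one machine.
# The per-record block keys each record by department (the "map" side);
# the END block emits one total per key (the "reduce" side).
counts=$(printf 'Thomas Sales\nJason Technology\nSanjay Technology\nNisha Marketing\n' |
  awk '{ count[$2]++ }
       END { for (d in count) print d, count[d] }' |
  sort)
echo "$counts"
```

The output is piped through sort because awk does not guarantee the order in which a for-in loop visits array keys.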
