2025-01-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Good Programmer big data learning path: a detailed explanation of awk. awk is a powerful text analysis tool. Where grep is for searching and sed is for editing, awk is especially strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then processes the resulting fields.
awk scans a file or string and extracts information according to specified rules; once the information has been extracted, other text operations can be performed on it. A complete awk script is usually used to format the information in a text file.
Typically, awk's unit of processing is one line of a file: it receives each line in turn and executes the corresponding commands on it.
Awk operation
There are three ways to call awk
1. Command line mode
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed. In awk, each item on a line that is delimited by the field separator is called a field. When -F is not given, fields are separated by whitespace (spaces and tabs) by default.
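As a minimal sketch of command line mode, the following compares splitting with and without -F. The file name sample.txt and its contents are made up for illustration, and NF (the built-in field count) is used only to make the splitting visible.

```shell
# Hypothetical sample data; the file name and contents are for illustration only.
tmpdir=$(mktemp -d)
printf 'alice:x:1000\nbob:x:1001\n' > "$tmpdir/sample.txt"

# With -F ':' each line is split on colons, so $1 is the text before the first colon.
names=$(awk -F ':' '{print $1}' "$tmpdir/sample.txt")
printf '%s\n' "$names"

# Without -F the default whitespace splitting applies; these lines contain no
# whitespace, so each line is a single field and NF is 1.
field_counts=$(awk '{print NF}' "$tmpdir/sample.txt")
printf '%s\n' "$field_counts"

rm -rf "$tmpdir"
```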
2. Shell script mode
Put all the awk commands into a file and make that file executable, with the awk interpreter named on the first line of the script; the script is then run by typing its name.
This corresponds to the first line of a shell script: #!/bin/sh
which is replaced by: #!/bin/awk -f
(The -f option is needed so that awk reads the program from the script file itself; the path to awk may differ between systems.)
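A small sketch of script mode, under a few assumptions: the script name names.awk and the two-line input are hypothetical, and the interpreter path is looked up with command -v because awk is not installed at the same location on every system.

```shell
tmpdir=$(mktemp -d)
printf '100 Thomas Manager\n200 Jason Developer\n' > "$tmpdir/employee.txt"

# Build the script. The first line names the awk interpreter with -f;
# the path is looked up because it differs between systems.
awk_path=$(command -v awk)
printf '#!%s -f\n{ print $2 }\n' "$awk_path" > "$tmpdir/names.awk"
chmod +x "$tmpdir/names.awk"

# The script is now called by its name, with the input file as its argument.
script_out=$("$tmpdir/names.awk" "$tmpdir/employee.txt")
printf '%s\n' "$script_out"

rm -rf "$tmpdir"
```

Inside the script, awk treats the interpreter line as a comment, so only the { print $2 } rule runs.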
3. Put all the awk commands into a separate file and call: awk -f awk-script-file input-file(s)
Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
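The -f form can be sketched as follows; the script name id_name.awk and the shortened input file are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
printf '100 Thomas Manager\n200 Jason Developer\n' > "$tmpdir/employee.txt"

# Hypothetical script file: print the ID and the name of every record.
printf '{ print $1, $2 }\n' > "$tmpdir/id_name.awk"

# -f tells awk to read the program from the file instead of the command line.
f_out=$(awk -f "$tmpdir/id_name.awk" "$tmpdir/employee.txt")
printf '%s\n' "$f_out"

rm -rf "$tmpdir"
```

Unlike script mode, the script file here needs neither an interpreter line nor execute permission.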
Awk syntax
1. awk command format
(1) awk [-F field-separator] 'command' input-file(s)
(2) awk -f awk-script-file input-file(s)
Sample file:
cat employee.txt
100 Thomas Manager Sales 5000
200 Jason Developer Technology 5500
300 Sanjay Sysadmin Technology 7000
400 Nisha Manager Marketing 9500
500 Randy DBA Technology 6000
2. awk operations
1. Output each line of the file:
awk '{print $0}' ./employee.txt
2. Output the first field of /etc/passwd
awk -F ":" '{print $1}' /etc/passwd
3. Print the entire contents of the file
awk '{print $0}' employee.txt
4. Extract the first column from the file
awk '{print $1}' employee.txt
Or
awk -F ' ' '{print $1}' employee.txt
5. List all user names and their login shells (the shell is the seventh field of /etc/passwd)
awk -F ':' '{print $1,$7}' /etc/passwd
When the delimiter is some other symbol, such as a comma:
a,b,c,d
a1,b1,c1,d1
awk -F ',' '{print $1,$2}' filename
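The comma case can be tried on the two sample lines as below. The second half is an assumed variant, not something the article shows: if a line mixes several delimiter characters, awk also accepts a regular expression as the separator. The file name letters.csv and the x,y;z line are made up.

```shell
tmpdir=$(mktemp -d)
printf 'a,b,c,d\na1,b1,c1,d1\n' > "$tmpdir/letters.csv"

# A single-character separator: split on commas and print the first two fields.
comma_out=$(awk -F ',' '{print $1,$2}' "$tmpdir/letters.csv")
printf '%s\n' "$comma_out"

# Assumed variant: a line that mixes two delimiter characters (made-up data).
# A separator longer than one character is treated as a regular expression,
# so the bracket expression [,;] splits on either a comma or a semicolon.
mixed_out=$(printf 'x,y;z\n' | awk -F '[,;]' '{print $1,$2,$3}')
printf '%s\n' "$mixed_out"

rm -rf "$tmpdir"
```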
6. Print the line whose user name is root
awk -F ':' '$1 == "root" {print $0}' /etc/passwd
Or
awk -F ':' '$1 == "keke" {print $1}' /etc/passwd
Note: $1 == "root" and $1 == "keke" are both conditions (patterns); the action in braces runs only for lines where the condition is true.
The awk workflow is as follows: read in one record, delimited by the '\n' newline character, then split the record into fields by the specified field separator and populate the fields: $0 represents the whole record, $1 the first field, and $n the nth field. The default field separators are the space and the tab.
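To make the record and field splitting visible, here is a small sketch that runs against a made-up stand-in for /etc/passwd (the file passwd.sample and both accounts in it are assumptions), so the output does not depend on the host. NR and NF are awk's built-in record number and field count.

```shell
tmpdir=$(mktemp -d)
# Made-up stand-in for /etc/passwd so the output does not depend on the host.
cat > "$tmpdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
keke:x:1000:1000:Keke:/home/keke:/bin/sh
EOF

# For every record print its number (NR), its field count (NF) and the first field.
fields_out=$(awk -F ':' '{print NR, NF, $1}' "$tmpdir/passwd.sample")
printf '%s\n' "$fields_out"

# The condition $1 == "root" selects records; $0 is the whole matching record.
root_line=$(awk -F ':' '$1 == "root" {print $0}' "$tmpdir/passwd.sample")
printf '%s\n' "$root_line"

rm -rf "$tmpdir"
```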
7. Add a header to the output
awk -F ":" 'BEGIN {print "name\tshell\n--------"}
{print $1 "\t" $7}' /etc/passwd
8. Add a header and a footer to the output
awk -F: 'BEGIN {print "name\tshell\n--------"} {print $1 "\t" $7}
END {print "end-of-report"}' /etc/passwd
awk -F ":" 'BEGIN {print "--BEGIN--"}
$1 == "root" {print $1}
END {print "--END--"}' /etc/passwd
awk -F ":" 'BEGIN {print "--BEGIN--"} {if ($1 == "root") print $1}
END {print "--END--"}' /etc/passwd
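The header and footer commands can be checked against a fixed input. This sketch uses a made-up two-account file (passwd.sample is an assumption) in place of /etc/passwd, and prints field 7 because that is where the passwd format stores the login shell.

```shell
tmpdir=$(mktemp -d)
# Made-up stand-in for /etc/passwd.
cat > "$tmpdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
keke:x:1000:1000:Keke:/home/keke:/bin/sh
EOF

# BEGIN prints the header once, the middle block runs for each record,
# and END prints the footer once. Field 7 of a passwd-style line is the shell.
report=$(awk -F ':' '
BEGIN { print "name\tshell\n--------" }
      { print $1 "\t" $7 }
END   { print "end-of-report" }' "$tmpdir/passwd.sample")
printf '%s\n' "$report"

rm -rf "$tmpdir"
```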
The complete awk workflow is as follows: first the BEGIN block is executed; then the file is read one record at a time, with records delimited by the \n newline character. Each record is split into fields by the specified field separator and the fields are populated ($0 represents the whole record, $1 the first field, and $n the nth field), after which awk executes the action for every pattern that matches. It then reads the next record, and so on until all records have been read. Finally, the END block is executed.
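The order of execution described above can be observed with a short sketch. The three input records and the variable name total are made up; the point is only that BEGIN runs before any input, the middle block runs once per record, and END runs after the last record with the accumulated state still available.

```shell
# Three made-up records: a name and a number per line.
workflow_out=$(printf 'a 10\nb 20\nc 30\n' | awk '
BEGIN { print "start"; total = 0 }            # runs once, before any input is read
      { total += $2; print NR ": " $1 }        # runs once for each record
END   { print "records=" NR, "total=" total }  # runs once, after the last record
')
printf '%s\n' "$workflow_out"
```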
The difference between awk and MapReduce
1. awk is mainly used to process files on a single machine.
2. MapReduce runs on distributed file systems and can process large amounts of data. Its disadvantage is that programming is more complex than in awk. However, with the framework's support, someone writing a MapReduce program only needs to take care of the business logic.
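For a rough sense of the comparison, the following is a single-machine analogue of a map-and-reduce style count written in awk. It is not MapReduce itself and is not from the original article, only a sketch: it takes the first four columns of the sample employee file, treats the department as the key, and counts the records per key.

```shell
tmpdir=$(mktemp -d)
# First four columns of the sample employee file: id, name, title, department.
cat > "$tmpdir/employee.txt" <<'EOF'
100 Thomas Manager Sales
200 Jason Developer Technology
300 Sanjay Sysadmin Technology
400 Nisha Manager Marketing
500 Randy DBA Technology
EOF

# "Map": use the department ($4) as the key. "Reduce": count per key in an array.
# Array traversal order is unspecified in awk, so the result is sorted afterwards.
dept_counts=$(awk '{ count[$4]++ } END { for (d in count) print d, count[d] }' \
    "$tmpdir/employee.txt" | sort)
printf '%s\n' "$dept_counts"

rm -rf "$tmpdir"
```

Everything here runs in one process over one local file, which is the case where awk fits well; spreading the same count across many machines is what the MapReduce framework provides.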