2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains in detail how to use the awk command in Linux. I find it very practical, so I am sharing it here as a reference. I hope you get something out of it.
Brief introduction
awk is a powerful text-analysis tool. Where grep is for searching and sed is for editing, awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then processes those fields.
There are three different versions of awk: awk, nawk and gawk. Unless otherwise noted, awk here means gawk, the GNU version of AWK.
awk takes its name from the initials of its creators, Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally define as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.
Usage
awk 'pattern { action }' filenames
However complex the operation, the syntax is always the same: pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) are not always required, but they group the series of instructions that belongs to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
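As a minimal sketch of the pattern-and-action form described above (the sample input and the variable name matched are invented for illustration), the following prints the second field of every line that matches the regular expression /eta/:

```shell
# Invented sample input: three lines with two whitespace-separated fields.
# /eta/ is the pattern; { print $2 } is the action run for matching lines.
matched=$(printf 'alpha 1\nbeta 2\nzeta 3\n' | awk '/eta/ { print $2 }')
echo "$matched"
```

Leaving out the pattern runs the action on every line; leaving out the action prints every matching line unchanged.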
The most basic function of the awk language is to browse and extract information from files or strings according to specified rules. Once awk has extracted the information, other text operations can follow. A complete awk script is usually used to format the information in a text file.
awk normally treats one line of a file as its unit of processing: it receives each line in turn and runs the appropriate commands on it.
Call awk
There are three ways to call awk
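The three ways are not spelled out here. The usual three are an inline program on the command line, a program file loaded with -f, and an executable script whose first line names the awk interpreter. The sketch below assumes that reading; the directory, the file names and the sample data are all hypothetical.

```shell
# Work in a throwaway directory; every file name here is hypothetical.
workdir=$(mktemp -d)
printf 'a b\nc d\n' > "$workdir/data.txt"

# 1. Inline program given directly on the command line.
way1=$(awk '{ print $2 }' "$workdir/data.txt")

# 2. Program stored in a file and loaded with -f.
printf '{ print $2 }\n' > "$workdir/second.awk"
way2=$(awk -f "$workdir/second.awk" "$workdir/data.txt")

# 3. Executable script whose first line names the awk interpreter.
#    The interpreter path differs between systems, so it is looked up here.
printf '#!%s -f\n{ print $2 }\n' "$(command -v awk)" > "$workdir/second.sh"
chmod +x "$workdir/second.sh"
way3=$("$workdir/second.sh" "$workdir/data.txt")

echo "$way1"; echo "$way2"; echo "$way3"
rm -rf "$workdir"
```

All three invocations run the same one-line program and should print the second field of each line.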
Description:
awk is designed for data streams and can operate on both columns and rows, whereas sed is more about matching, replacing and deleting.
awk has many built-in features, such as arrays and functions. Flexibility is awk's biggest advantage.
The structure of awk
awk '
BEGIN { print "start" }
pattern { commands }
END { print "end" }' file
The line breaks are only for readability; the command can be written on a single line.
An awk script usually consists of three parts:
1. A BEGIN statement block
2. A general statement block, which can use pattern matching
3. An END statement block
Any of these parts can be omitted. The script is usually enclosed in single or double quotation marks.
For example:
awk 'BEGIN { i=0 } { i++ } END { print i }' filename
Working principle
The awk command works as follows:
1. Execute the statements in the BEGIN { commands } block.
2. Read a line from the file or stdin and execute pattern { commands }; repeat until all input has been read.
3. Finally, execute the END { commands } block.
Again, any of these blocks can be omitted.
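The three steps above can be sketched as follows. The two-line input and the variable name order are made up for illustration; the printed text only makes the execution order visible.

```shell
# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line (NR still holds the line count there).
order=$(printf 'x\ny\n' | awk '
BEGIN { print "begin" }
      { print "line " NR ": " $0 }
END   { print "end after " NR " lines" }')
echo "$order"
```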
And awk can do much more than that.
Getting-started examples:
The code is as follows:
echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1,var2,var3; }'
Prints: v1 v2 v3
Explanation: the comma separates the output items, which are printed with a space between them.
echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1 "-" var2 "-" var3; }'
Prints: v1-v2-v3
Explanation: strings written next to each other are concatenated, so the double-quoted "-" acts as a connector.
With any other unquoted symbol between them, v1, v2 and v3 are not output normally.
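To make the difference between the comma and plain concatenation concrete, here is a small sketch. The variable names are illustrative, and OFS is the built-in output field separator that the comma inserts.

```shell
# A comma in print inserts OFS (a single space by default).
with_comma=$(echo | awk '{ a="v1"; b="v2"; print a, b }')
# Changing OFS changes what the comma inserts.
with_ofs=$(echo | awk 'BEGIN { OFS="-" } { a="v1"; b="v2"; print a, b }')
# Without a comma the two strings are simply concatenated.
joined=$(echo | awk '{ a="v1"; b="v2"; print a b }')
echo "$with_comma / $with_ofs / $joined"
```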
Interpreting --help (the help documentation is very large and complex: the official manual runs to 410 pages of PDF, which you might not believe if I only told you so.)
Usage: awk [POSIX or GNU style options] -f scriptfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:                  GNU long options:
-f scriptfile                   --file=scriptfile
-F fs                           --field-separator=fs
    Specifies the input field separator; fs is a string or a regular expression.
-v var=val                      --assign=var=val
    Assigns the external value val to the variable var.
-m[fr] val
-O                              --optimize
    Enables some optimizations on the internal representation of the program.
-W compat                       --compat
    Runs awk in compatibility mode: gawk behaves exactly like standard awk, and all gawk extensions are ignored.
-W copyleft                     --copyleft
    Prints short copyright information.
-W copyright                    --copyright
    Prints a short version of the General Public License and then exits.
-W dump-variables[=file]        --dump-variables[=file]
    Prints a sorted list of global variables, their types, and their final values.
-W exec=file                    --exec=file
    Similar to -f, but different in two ways (I will upload the relevant document later; it is too long).
-W gen-po                       --gen-po
    (too much content)
-W help                         --help
    Prints the help text.
-W lint[=fatal]                 --lint[=fatal]
    Warns about constructs that are dubious or not portable to other awk implementations.
-W lint-old                     --lint-old
    Warns about constructs that are not portable to traditional Unix awk.
-W non-decimal-data             --non-decimal-data
    Enables automatic interpretation of octal and hexadecimal values in input data.
-W profile[=file]               --profile[=file]
    Enables profiling of the awk program.
-W posix                        --posix
    Operates in strict POSIX mode.
-W re-interval                  --re-interval
    Allows interval expressions in regular expressions.
-W source=program-text          --source=program-text
-W traditional                  --traditional
    Uses the regular expression matching of traditional Unix awk.
-W usage                        --usage
-W use-lc-numeric               --use-lc-numeric
    Forces use of the locale's decimal point character when parsing numeric input data.
-W version                      --version
To report bugs, see the "Bugs" node in "gawk.info", which is the section "Reporting Problems and Bugs" in the printed version.
Note: gawk is the GNU version of awk. On Ubuntu you need to install gawk first, even just to see this help text.
We will not go through all of it this time. To keep things informative and fun, let's start with some basics:
Some special variables:
NR: the number of records read so far; during execution it corresponds to the current line number
NF: the number of fields in the current line
$0: the full text of the current line
$1: the text content of the first field
$2: the text content of the second field
Example:
Example 1.
The code is as follows:
# The trailing \ continues the command on the next line.
echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | \
awk '{
print "Line no:" NR ", No of fields:" NF, "$0=" $0, "$1=" $1, "$2=" $2, "$3=" $3
}'
Note: $1 prints the first field, $NF prints the last field, and $(NF-1) prints the second-to-last field.
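A short sketch of the note above, using invented input whose lines have different numbers of fields, so that $NF and $(NF-1) point at different columns on each line:

```shell
# For each line print: line number, field count, first field,
# second-to-last field, and last field.
fields=$(printf 'line1 f2 f3\nline2 f4 f5 f6\n' |
    awk '{ print NR, NF, $1, $(NF-1), $NF }')
echo "$fields"
```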
Example 2.
seq 5 | awk 'BEGIN { sum=0; print "Summation:" } { print $1 "+"; sum+=$1 } END { print "="; print sum }'
This example uses the basic format.
sum is initialized in BEGIN, and "Summation:" is printed.
The middle block prints the first column and adds it to sum.
sum is printed in END.
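The same accumulate-then-report idea can be reduced to its core. This sketch assumes the numbers arrive one per line; the name total is illustrative. It also prints the average by dividing by NR in the END block.

```shell
# An uninitialized awk variable starts as 0, so the BEGIN block is optional.
total=$(printf '1\n2\n3\n4\n5\n' | awk '{ sum += $1 } END { print sum, sum / NR }')
echo "$total"
```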
Example 3. External variables with -v
The code is as follows:
$ VAR=10000
$ echo | awk -v VARIABLE=$VAR '{ print VARIABLE }'
There is another, flexible way to pass several external variables to awk, for example:
The code is as follows:
$ var1="value1" var2="value2"
$ echo | awk '{ print v1,v2 }' v1=$var1 v2=$var2
If the input comes from a file:
awk '{ print v1,v2 }' v1=$var1 v2=$var2 filename
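One practical difference between the two ways of passing variables is when the value becomes visible. The sketch below, with illustrative variable names, shows that a -v assignment is already set in BEGIN, while an assignment given after the program is only set once input processing starts.

```shell
var="10000"
# -v: the value is available even inside BEGIN.
in_begin=$(awk -v v="$var" 'BEGIN { print "[" v "]" }')
# Assignment after the program: empty in BEGIN, set for the main block.
late=$(echo | awk 'BEGIN { print "[" v "]" } { print "[" v "]" }' v="$var")
echo "$in_begin"
echo "$late"
```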
Example 4
$ awk 'NR < 5'        # lines whose line number is less than 5
$ awk 'NR==1,NR==4'   # lines with line numbers from 1 to 4
$ awk '/linux/'       # lines containing the pattern linux (patterns can be regular expressions)
$ awk '!/linux/'      # lines not containing the pattern linux
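These four pattern forms can be tried on a small made-up input. Everything in the sample (the five lines and the variable names) is invented for illustration.

```shell
sample=$(printf 'linux one\nunix two\nlinux three\nbsd four\nmac five\n')
# Comparison on the line number: a pattern with no action prints the line.
first_two=$(printf '%s\n' "$sample" | awk 'NR < 3')
# Range pattern: from the line where NR==2 through the line where NR==4.
range=$(printf '%s\n' "$sample" | awk 'NR==2,NR==4 { print $2 }')
# Regular expression match and its negation.
has_linux=$(printf '%s\n' "$sample" | awk '/linux/ { print $2 }')
no_linux=$(printf '%s\n' "$sample" | awk '!/linux/ { print $2 }')
printf '%s\n' "$first_two" "$range" "$has_linux" "$no_linux"
```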
That is all for this installment; I will try to give a more complete picture of awk over two pages.
awk supplement
Earlier we covered the basics of awk. I was pleasantly surprised to find a detailed article on awk that gave me ideas for writing. I cannot reprint all of it, so I have rewritten parts of it in my own way.
Built-in variables and some string functions
Built-in variables (some translations call them special variables or environment variables; the official term is built-in variables)
Variable: Description
$n: the nth field of the current record; fields are separated by FS.
$0: the complete input record.
ARGC: the number of command-line arguments.
ARGIND: the position of the current file on the command line (starting at 0).
ARGV: an array containing the command-line arguments.
BINMODE: on non-POSIX systems, specifies that all I/O uses binary mode.
CONVFMT: numeric conversion format (default is %.6g).
ENVIRON: associative array of environment variables.
ERRNO: description of the last system error.
FIELDWIDTHS: list of field widths (separated by spaces).
FILENAME: the current file name.
FNR: same as NR, but relative to the current file.
FPAT: a regular expression (as a string) that tells gawk to create fields from the text that matches the regular expression.
FS: field separator (default is any whitespace).
IGNORECASE: if true, matching ignores case.
LINT: when this variable is true (non-zero or non-empty), gawk behaves as if the "--lint" command-line option were given.
NF: the number of fields in the current record.
NR: the number of the current record.
OFMT: the output format of numbers (default is %.6g).
OFS: output field separator (default is a space).
ORS: output record separator (default is a newline character).
PROCINFO: the elements of this array provide access to information about the running awk program.
RLENGTH: the length of the string matched by the match function.
RS: record separator (default is a newline character).
RT: set each time a record is read.
RSTART: the first position of the string matched by the match function.
SUBSEP: array subscript separator (default is \034).
TEXTDOMAIN: used for the internationalization of programs.
Entries highlighted in blue in the original table are newly added built-in variables.
A simple example:
1. sed 1q /etc/passwd | awk 'BEGIN { FS=":" } { print $1 }'
Prints the first field of the first line of the password file, using a colon as the separator. FS is set in BEGIN so that it already applies when the first line is split.
2.
The code is as follows:
awk 'END { print FILENAME }' awk.txt
Prints the file name, FILENAME
3. seq 100 | awk 'NR==4,NR==6'
Prints lines 4 to 6
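As a sketch of how FS and OFS work together, the following uses a made-up line in /etc/passwd format instead of reading the real file. The variable names are illustrative.

```shell
# A made-up record in /etc/passwd format.
line='root:x:0:0:root:/root:/bin/bash'
# FS set in BEGIN applies to every record, including the first.
in_time=$(printf '%s\n' "$line" | awk 'BEGIN { FS=":" } { print $1 }')
# -F on the command line is shorthand for the same thing.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')
# OFS controls what the comma in print puts between output fields.
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS=":"; OFS="|" } { print $1, $NF }')
echo "$in_time $with_flag $with_ofs"
```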
Next, a few of awk's built-in string functions; only some of them are covered here.
length(string):
Returns the length of the string
index(string, search_string):
Returns the position where search_string appears in string
split(string, array, delimiter):
Splits string at the delimiter into a list of strings and stores the list in array
substr(string, start, length):
Returns the substring of string that begins at character offset start and is length characters long
sub(regex, replacement_str, string):
Replaces the first match of the regular expression in string with replacement_str
gsub(regex, replacement_str, string):
Similar to sub(), but replaces every match of the regular expression
match(string, regex):
Checks whether the regular expression matches the string. If it matches, a non-zero value is returned; otherwise 0 is returned. match() sets two related special variables, RSTART and RLENGTH. RSTART contains the position where the matched text starts, and RLENGTH contains the length of the matched text.
For example:
1. $ awk '{ sub(/test/, "mytest"); print }' testfile
Matches against the whole record; only the first match is replaced.
2. $ awk '{ sub(/test/, "mytest", $1); print }' testfile
Matches against the first field of each record; only the first match is replaced.
3. $ awk '{ print index("mytest", "test") }' testfile
Returns the position of test in mytest; the result is 3.
4. $ awk '{ print length("test") }'
Returns the length of the string test.
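The string functions above can be exercised together in a single BEGIN block, which needs no input file. The sample string and the variable names are invented for illustration.

```shell
result=$(awk 'BEGIN {
    s = "this is a test of tests"
    print length(s)                   # number of characters in s
    print index("mytest", "test")     # position of "test" inside "mytest"
    print substr(s, 11, 4)            # 4 characters starting at position 11
    n = split("a:b:c", parts, ":")    # n is the number of pieces
    print n, parts[1], parts[3]
    if (match(s, /te+st/)) print RSTART, RLENGTH
    t = s; sub(/test/, "TEST", t); print t        # first match only
    u = s; c = gsub(/test/, "TEST", u); print c, u  # all matches, c counts them
}')
echo "$result"
```

sub() and gsub() modify their third argument in place, which is why the sketch copies s into t and u first.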
Supplement II to awk
This section may be a bit rough, as time was short.
1. Built-in functions
Note a common syntax convention: [a] means that a is optional.
Numeric functions (Numeric Functions)
Function name: Description
atan2(y, x): returns the arctangent of y/x in radians
cos(x): returns the cosine of x
exp(x): returns the exponential of x
int(x): returns the nearest integer to x, truncated toward 0
log(x): returns the natural logarithm of x
rand(): returns a random number
sin(x): returns the sine of x
sqrt(x): returns the square root of x
srand([x]): sets the starting point (seed) for generating random numbers
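A few of the numeric functions in action, as a minimal sketch. The seed value 42 and the variable names are arbitrary choices made for illustration.

```shell
nums=$(awk 'BEGIN {
    print int(3.9), int(-3.9)          # truncation toward 0
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)      # atan2(0, -1) is pi
    # Seeding twice with the same value repeats the same sequence.
    srand(42); a = rand()
    srand(42); b = rand()
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
echo "$nums"
```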
String manipulation functions (String-Manipulation Functions)
Note: functions marked (gawk) are unique to gawk; other awk implementations do not have them.
Function name: Description
asort(source [, dest [, how]]) (gawk): sorts the array and returns the number of array elements (there is more to it)
asorti(source [, dest [, how]]) (gawk): like asort(), with slight differences
gensub(regexp, replacement, how [, target]) (gawk): searches target for matches of the regular expression regexp and returns the string with the replacements made
gsub(regexp, replacement [, target]): replaces every match of the regular expression with replacement
index(in, find): returns the position of find in the string in
length([string]): returns the number of characters in string
match(string, regexp [, array]): checks whether the regular expression matches the string
patsplit(string, array [, fieldpat [, seps]]) (gawk): splits string into pieces defined by fieldpat and stores them in array; the separator strings are stored in the seps array
split(string, array [, fieldsep [, seps]]): splits string at the separator into a list of strings and stores the list in array
sprintf(format, expression1, ...): returns the formatted string instead of printing it
strtonum(str) (gawk): converts a string to a number
sub(regexp, replacement [, target]): replaces the first match of the regular expression with replacement
substr(string, start [, length]): returns the part of string given by the start position and length
tolower(string): converts to lowercase
toupper(string): converts to uppercase
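A minimal sketch of a few functions from this table that do not depend on gawk. The input strings and the variable names are made up for illustration.

```shell
formatted=$(awk 'BEGIN {
    name = "Awk"
    print toupper(name), tolower(name)
    # sprintf returns the formatted string instead of printing it.
    line = sprintf("%-5s|%5.2f|%03d", name, 3.14159, 7)
    print line
    # split returns the number of pieces; adding 0 forces numeric output.
    n = split("2025-04-05", d, "-")
    print n, d[1] + 0, d[2] + 0, d[3] + 0
}')
echo "$formatted"
```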
Input/output functions (Input/Output Functions)
Function: Description
close(filename [, how]): closes the file input/output stream
fflush([filename]): flushes any buffered output associated with filename
system(command): executes the operating system command and returns its value to the awk program
Time functions (Time Functions)
Function: Description
mktime(datespec): converts datespec into a timestamp in the same format as the one returned by systime()
strftime([format [, timestamp [, utc-flag]]]): formats the contents of timestamp and returns it as a date string
systime(): returns the system time, accurate to the second
Bit operation functions (Bit-Manipulation Functions)
Function: Description
and(v1, v2): returns the bitwise AND of v1 and v2
compl(val): returns the bitwise complement of val
lshift(val, count): returns the value of val shifted left by count bits
or(v1, v2): returns the bitwise OR of v1 and v2
rshift(val, count): returns the value of val shifted right by count bits
xor(v1, v2): returns the bitwise XOR of v1 and v2
Getting type information (Getting Type Information)
Function: Description
isarray(x): returns true if x is an array; otherwise false
String translation functions (String-Translation Functions)
Function: Description
bindtextdomain(directory [, domain]): sets the directory in which awk searches for the message translations of the text domain domain
dcgettext(string [, domain [, category]]): returns the translation of string in the text domain domain for the locale category category
dcngettext(string1, string2, number [, domain [, category]]): returns the plural form used for number of the translation of string1 and string2 in the text domain domain for the locale category category
The built-in functions also have some advanced features and many examples, which will be added later.
2. Custom functions
The format is as follows:
function name([parameter-list])
{
    body-of-function
}
For example:
function myprint(num)
{
    printf "%6.3g\n", num
}
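To show a custom function being defined and then called, here is a small sketch. myprint is the function from the example above; max is a hypothetical second helper added only to illustrate return values.

```shell
funcs=$(awk '
function myprint(num)
{
    printf "%6.3g\n", num
}
# Hypothetical helper (not from the article) showing a return value.
function max(a, b)
{
    return (a > b) ? a : b
}
BEGIN {
    myprint(3.14159)    # width 6, 3 significant digits
    print max(3, 7)
}')
echo "$funcs"
```

Function definitions sit at the top level of the program, next to the BEGIN, pattern and END blocks, and can be called from any of them.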
awk has a great many features, and this is all I am going to write for now. More may come up later in examples that combine awk with other commands.
This concludes the article on "how to use awk commands in linux". I hope the content above is of some help and lets you learn more. If you think the article is good, please share it so that more people can see it.