How to use awk to accomplish more structured and complex tasks in linux 07/12 Update SLTechnology News&Howtos


2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to use awk to accomplish more structured and complex tasks in Linux.

Program structure of awk

An awk script is made up of blocks enclosed in curly braces ({}). Two special blocks, BEGIN and END, run before the first line of the input stream is read and after the last line has been processed. Between the two, each block has the form:

pattern {action statements}

A block runs for each record in the input buffer that matches its pattern. If a block has no pattern, it runs on every line of the input stream.
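As a minimal sketch (the three input lines are made up), the following snippet runs one awk program containing a BEGIN block, a block with a pattern, a block without a pattern, and an END block, so you can see when each one fires.

```shell
# Made-up sample input: a name and a number on each line.
out=$(printf 'apple 3\nbanana 5\ncherry 7\n' | awk '
  BEGIN  { print "start" }          # runs once, before any input is read
  $2 > 4 { print "big:", $1 }       # runs only on lines whose 2nd field is greater than 4
         { total += $2 }            # no pattern: runs on every line
  END    { print "total:", total }  # runs once, after the last line
')
printf '%s\n' "$out"
```

This prints "start", then "big: banana" and "big: cherry", and finally "total: 15".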

In addition, the following syntax can be used to define functions that can be called from any block in awk.

function name(parameter list) {statements}

Combining pattern-matching blocks with functions lets developers structure awk programs for reuse and readability.
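Here is a small sketch of a user-defined function. The function name max() and the made-up numbers are illustrative assumptions; the point is that the function is defined once at the top level and then called from a block.

```shell
# Made-up input: one number per line. max() is a hypothetical helper.
out=$(printf '4\n9\n2\n' | awk '
  # return the larger of two numbers
  function max(a, b) {
    return a > b ? a : b
  }
  NR == 1 { best = $1 + 0; next }   # the first line initializes the running maximum
  { best = max(best, $1 + 0) }      # later lines are compared through max()
  END { print "max:", best }
')
printf '%s\n' "$out"
```

Note that when calling a user-defined function, awk requires the opening parenthesis to follow the name with no space in between.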

How awk processes text streams

awk reads text from an input file or stream one record (by default, one line) at a time and splits it into fields using a field separator. In awk terminology, the current buffer is a record. Several special variables affect how awk reads and processes files:

FS (field separator). By default, fields are separated by runs of blanks (spaces or tabs), and leading and trailing blanks are ignored.

RS (record separator). By default, this is a newline (\n).

NF (number of fields). When awk parses a record, this variable is set to the number of fields parsed.

$0: the current record.

$1, $2, $3, and so on: the first, second, third, and subsequent fields of the current record.

NR (number of records). The number of records the awk script has parsed so far.

There are many more variables that affect awk behavior, but knowing these is enough to get started.
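To make these variables concrete, here is a short sketch on made-up input. The first command prints NR, NF, the first field, and the last field ($NF) of each record using the default FS; the second changes RS so that records end at semicolons instead of newlines.

```shell
# Made-up input: the second line has leading blanks and a tab.
out=$(printf 'one two three\n  four\t five\n' | awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$out"

# Changing RS: records are now separated by ";" rather than by newlines.
rs_out=$(printf 'a b;c d e;' | awk 'BEGIN { RS = ";" } { print NR, NF }')
printf '%s\n' "$rs_out"
```

The first command prints "1 3 one three" and "2 2 four five" (the leading blanks on line 2 do not create an empty field); the second prints "1 2" and "2 3".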

One-line awk scripts

Interestingly for such a powerful tool, most uses of awk are basic one-line scripts. Perhaps the most common awk program prints selected fields from input lines, such as those of CSV files, log files, and so on. For example, the following one-line script prints a list of user names from /etc/passwd:

awk -F ":" '{print $1}' /etc/passwd

As mentioned above, $1 is the first field of the current record. The -F option sets the FS variable to the colon character (:).

The field separator can also be set in the BEGIN block:

awk 'BEGIN {FS=":"} {print $1}' /etc/passwd
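Because the contents of /etc/passwd vary from system to system, this sketch feeds both one-liners a few made-up passwd-style lines instead and checks that setting FS with -F and setting it in BEGIN give the same result.

```shell
# Made-up passwd-style lines, so the output does not depend on the local system.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# FS set with -F on the command line
a=$(printf '%s\n' "$sample" | awk -F ":" '{print $1}')
# FS set in a BEGIN block
b=$(printf '%s\n' "$sample" | awk 'BEGIN {FS=":"} {print $1}')
printf '%s\n' "$a"
```

Both commands print root, daemon, and alice, one per line.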

By adding a pattern to the block, you can print only the users whose shell is not /sbin/nologin:

awk 'BEGIN {FS=":"} !/\/sbin\/nologin/ {print $1}' /etc/passwd

Advanced awk: Mail merge

Now that you have the basics, try a more structured awk example: creating a mail merge.

A mail merge uses two files. The first (called email_template.txt in this example) contains a template for the email you want to send:

From: Program committee
To: {firstname} {lastname}
Subject: Your presentation proposal

Dear {firstname},

Thank you for your presentation proposal:
{title}

We are pleased to inform you that your proposal has been successful! We
will contact you shortly with further information about the event
schedule.

Thank you,
The Program Committee

The second is a CSV file (named proposals.csv) listing the people you want to send the email to:

firstname,lastname,email,title
Harry,Potter,hpotter@hogwarts.edu,"Defeating your nemesis in 3 easy steps"
Jack,Reacher,reacher@covert.mil,"Hand-to-hand combat for beginners"
Mickey,Mouse,mmouse@disney.com,"Surviving public speaking with a squeaky voice"
Santa,Claus,sclaus@northpole.org,"Efficient list-making"

You need to read the CSV file (skipping the header line), substitute the relevant fields into the template, and write each result to a file named acceptanceN.txt, incrementing N for each line parsed.

Write the awk program in a file called mail_merge.awk. Statements in an awk script are separated by semicolons (;) or newlines. The first task is to set the field separator and a few other variables the script needs. You also need to read and discard the first line of the CSV file; otherwise, a file starting with "Dear firstname" will be created. To do this, use getline and then reset the record counter NR to 0.

BEGIN {
  FS=",";
  template="email_template.txt";
  output="acceptance";
  getline;
  NR=0;
}
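A small sketch of the header-skipping trick on its own, using made-up CSV lines piped to awk: getline in BEGIN consumes the header, and resetting NR makes the first data row number 1, which is what gives the first output file the name acceptance1.txt.

```shell
# Made-up CSV with a header line.
out=$(printf 'firstname,lastname\nHarry,Potter\nJack,Reacher\n' |
  awk 'BEGIN { FS = ","; getline; NR = 0 } { print NR, $1 }')
printf '%s\n' "$out"
```

This prints "1 Harry" and "2 Jack"; the header row never reaches the main block.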

The main block is simple: for each line processed, set a variable for each field (firstname, lastname, email, and title). Then read the template file line by line and use the sub function to replace each placeholder, such as {firstname}, with the value of the corresponding variable. Finally, write the line, with any replacements made, to the output file.

Because each record reads the template file again and writes to a different output file, you need to close both file handles before processing the next record. Closing the template also ensures that the next getline reads it from the first line again.

{
  # read the associated fields from the input file
  firstname=$1;
  lastname=$2;
  email=$3;
  title=$4;
  # set the output file name
  outfile=(output NR ".txt");
  # read a line from the template, replace the placeholders,
  # and print the result to the output file
  # (the braces are escaped so they match literally)
  while ( (getline ln < template) > 0 ) {
    sub(/\{firstname\}/,firstname,ln);
    sub(/\{lastname\}/,lastname,ln);
    sub(/\{email\}/,email,ln);
    sub(/\{title\}/,title,ln);
    print(ln) > outfile;
  }
  # close the template and output file before the next record
  close(outfile);
  close(template);
}
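To see why the close(template) call matters, here is a small sketch with a hypothetical two-line template file, tpl.txt, in a temporary directory. getline keeps its position in a file that stays open, so without close() the second record finds the template already at end of file.

```shell
# Hypothetical two-line template in a scratch directory.
tmp=$(mktemp -d)
printf 'line A\nline B\n' > "$tmp/tpl.txt"

# With close(tpl): every input record reads the whole template.
with_close=$(printf 'r1\nr2\n' | awk -v tpl="$tmp/tpl.txt" '
  { n = 0; while ((getline ln < tpl) > 0) n++; close(tpl); print $1, n }')

# Without close(tpl): the file stays at EOF after the first record.
without_close=$(printf 'r1\nr2\n' | awk -v tpl="$tmp/tpl.txt" '
  { n = 0; while ((getline ln < tpl) > 0) n++; print $1, n }')

printf '%s\n%s\n' "$with_close" "$without_close"
rm -r "$tmp"
```

With close(), both records count 2 template lines; without it, the second record counts 0, which in the mail merge would produce an empty acceptance2.txt.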

You've done it! Run the script on the command line:

awk -f mail_merge.awk proposals.csv

or

awk -f mail_merge.awk < proposals.csv

You will find the generated text files (acceptance1.txt, acceptance2.txt, and so on) in the current directory.
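For an end-to-end check, the shell sketch below recreates the example files in a temporary directory (an assumption; any scratch directory works), writes a condensed version of the mail_merge.awk listings above, runs it, and prints the start of the first result.

```shell
set -e
# Scratch directory: an assumption of this sketch.
work=$(mktemp -d)
cd "$work"

cat > email_template.txt <<'EOF'
From: Program committee
To: {firstname} {lastname}
Subject: Your presentation proposal

Dear {firstname},

Thank you for your presentation proposal:
{title}

We are pleased to inform you that your proposal has been successful! We
will contact you shortly with further information about the event
schedule.

Thank you,
The Program Committee
EOF

cat > proposals.csv <<'EOF'
firstname,lastname,email,title
Harry,Potter,hpotter@hogwarts.edu,"Defeating your nemesis in 3 easy steps"
Jack,Reacher,reacher@covert.mil,"Hand-to-hand combat for beginners"
Mickey,Mouse,mmouse@disney.com,"Surviving public speaking with a squeaky voice"
Santa,Claus,sclaus@northpole.org,"Efficient list-making"
EOF

# Condensed form of the BEGIN block and main block shown above.
cat > mail_merge.awk <<'EOF'
BEGIN { FS = ","; template = "email_template.txt"; output = "acceptance"; getline; NR = 0 }
{
  firstname = $1; lastname = $2; email = $3; title = $4
  outfile = (output NR ".txt")
  while ((getline ln < template) > 0) {
    # braces are escaped so they match literally
    sub(/\{firstname\}/, firstname, ln)
    sub(/\{lastname\}/, lastname, ln)
    sub(/\{email\}/, email, ln)
    sub(/\{title\}/, title, ln)
    print ln > outfile
  }
  close(outfile); close(template)
}
EOF

awk -f mail_merge.awk proposals.csv
ls acceptance*.txt
head -n 5 acceptance1.txt
```

One design limitation worth knowing: with FS set to a plain comma, awk does not understand CSV quoting, so the double quotes around each title end up in the letter, and a title containing a comma would be split into extra fields. Quoted CSV needs more careful parsing (gawk, for example, offers the FPAT variable for this).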

Advanced awk: Word frequency count

One of the most powerful features of awk is the associative array. In most programming languages, array entries are indexed by numbers, but awk arrays are indexed by key strings. For example, you could store an entry from the proposals.csv file in the previous section in a single associative array, like this:

proposer["firstname"]=$1; proposer["lastname"]=$2; proposer["email"]=$3; proposer["title"]=$4
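A minimal sketch of the proposer array in action, assuming a made-up CSV row with no quoted fields: the entries are read back by key, and the in operator tests whether a key exists without creating it.

```shell
# Made-up CSV row (no quoted fields).
out=$(printf 'Harry,Potter,hpotter@hogwarts.edu,Defeating your nemesis in 3 easy steps\n' | awk -F ',' '
{
  proposer["firstname"] = $1; proposer["lastname"] = $2
  proposer["email"] = $3; proposer["title"] = $4
  print proposer["firstname"] " " proposer["lastname"] ": " proposer["title"]
  # key-membership tests: 1 if the key exists, 0 otherwise
  has_title = ("title" in proposer); has_phone = ("phone" in proposer)
  print has_title, has_phone
}')
printf '%s\n' "$out"
```

This prints "Harry Potter: Defeating your nemesis in 3 easy steps" followed by "1 0".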

Associative arrays make text processing much easier. A simple program that uses this concept is a word frequency counter: parse a file, break each line into words (ignoring punctuation), increment a counter for each word, and then output the 20 most frequent words in the text.

First, in a file named wordcount.awk, set the field separator to a regular expression that matches runs of spaces and punctuation:

BEGIN {
  # ignore 1 or more consecutive occurrences of the characters
  # in the character group below
  FS="[ .,:;(){}@!\"'\t]+";
}
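To see how this FS splits a line, the sketch below applies the same regular expression to a made-up sentence and prints each field in brackets. The program goes into a temporary file through a quoted heredoc, so the single quote inside the character class needs no shell escaping; the file name fields.awk is hypothetical.

```shell
tmp=$(mktemp -d)
# fields.awk is a hypothetical helper file for this demonstration.
cat > "$tmp/fields.awk" <<'EOF'
BEGIN { FS = "[ .,:;(){}@!\"'\t]+" }
{
  # wrap each field in brackets so empty fields are visible
  line = ""
  for (i = 1; i <= NF; i++) line = line "[" $i "]"
  print NF ": " line
}
EOF
out=$(printf 'Hello, world!\n' | awk -f "$tmp/fields.awk")
printf '%s\n' "$out"
rm -r "$tmp"
```

This prints "3: [Hello][world][]". The ", " between the words is a single separator run, and the trailing "!" produces the empty last field that the main loop has to skip.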

Next, the main block loops through each field, skips any empty fields (which occur, for example, when a line ends with punctuation), and increments the count for each word on the line:

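{
  for (i = 1; i <= NF; i++) {
    # skip empty fields, e.g. after punctuation at the end of a line
    if ($i != "") {
      # words: array name chosen here for illustration
      words[$i]++;
    }
  }
}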
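Since the listing stops after the main loop, here is one possible complete version, offered as a hedged sketch rather than the article's own code. The END block, the lowercasing with tolower(), and the use of sort and head to pick the top 20 are all assumptions of this sketch, and the sample text is made up.

```shell
tmp=$(mktemp -d)
cat > "$tmp/wordcount.awk" <<'EOF'
BEGIN {
  # ignore 1 or more consecutive occurrences of these characters
  FS = "[ .,:;(){}@!\"'\t]+"
}
{
  for (i = 1; i <= NF; i++)
    if ($i != "")              # skip empty fields at line start/end
      words[tolower($i)]++     # case-folding is an assumption of this sketch
}
END {
  # emit "count word" pairs; awk's for-in order is unspecified
  for (w in words) print words[w], w
}
EOF

# Made-up sample text.
text='The cat sat. The dog sat!
A cat, a dog, the end.'
# Sort by count (descending), then by word, and keep the top 20.
top=$(printf '%s\n' "$text" | awk -f "$tmp/wordcount.awk" | LC_ALL=C sort -k1,1nr -k2,2 | head -n 20)
printf '%s\n' "$top"
rm -r "$tmp"
```

For this input the output is "3 the", then "2 a", "2 cat", "2 dog", "2 sat", and finally "1 end". Sorting outside awk keeps the script portable, because POSIX awk has no built-in way to sort an array.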
