Example Analysis of the AWK Command in Big Data

This article explains, with examples, how the AWK command is used in big data work. It is quite practical, so it is shared here as a reference; I hope you get something out of it.

Given the following nginx log, access.log, use a script to find the top 10 IP addresses by number of accesses.

This question is not difficult, but it exercises several commonly used shell commands: awk, uniq, sort, and head. It is worth knowing for big data development, operations, data warehousing, and so on.

2018-11-20T23:37:40+08:00 119.15.90.30 - "GET /free.php?proxy=out_hp&sort=&page=1 HTTP/1.1" "/free.php" "-" 0.156 3626849gam7213 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)"

2018-11-20T23:37:44+08:00 117.30.95.62 - "GET /partner.php HTTP/1.1" "/partner.php" "-" 0.016 457 6534 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"

2018-11-20T23:37:44+08:00 117.30.95.62 - "GET /css/bootstrap.min.css HTTP/1.1" "/css/bootstrap.min.css" "-" 0.045 398 19402 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"

2018-11-20T23:37:44+08:00 117.30.95.62 - "GET /css/hint.min.css HTTP/1.1" "/css/hint.min.css" "-" 0.000 393 1635 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"

Let's get the answer.

cat access.log | awk '{print $2}' | sort | uniq -c | sort -k1 -nr | head -10
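For reference, the counting can also be done entirely inside awk; here is a minimal equivalent sketch (it assumes, as above, that the IP is the second field of access.log). The array cnt holds one counter per IP, and the END block prints the totals for sorting.

awk '{cnt[$2]++} END {for (ip in cnt) print cnt[ip], ip}' access.log | sort -k1 -nr | head -10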

There are in fact many variants of this question, such as writing it in the language you are most familiar with instead of a shell script, or handling a file that is too large to fit in memory. But those are beside the point; today we will mainly look at some simple uses of awk in everyday work.

awk is actually very powerful; what follows covers only the usages we rely on most often. Its general form is:

awk '[pattern] {action}' filenames
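For instance, the pattern can be a regular expression that selects lines and the action a print statement. A small illustrative example on the access.log above, which prints the second field of every line containing GET:

awk '/GET/ {print $2}' access.log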

Splitting fields

-F specifies the field delimiter; the default is a space or \t (tab). For example, to get the IP address in the second column of the log above, we can write:

awk -F ' ' '{print $2}' access.log

We don't actually have to spell out the space delimiter, since it is the default; it is written here only for demonstration.

There are also special delimiters; for example, the default field delimiter in Hive is 0x01. How do we write that with awk?

awk -F '\001' '{print $1}' abcd.txt
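A quick way to try this out is with a small test file; in this sketch, the contents of abcd.txt are just a made-up example, and \001 in printf stands for the literal 0x01 byte:

printf 'a\001b\001c\n' > abcd.txt

awk -F '\001' '{print $1}' abcd.txt

The second command should print a.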

Built-in variables: $0 refers to the entire line, while $1, $2, ..., $n refer to the individual fields after the line is split by the delimiter given with -F, with the index starting at 1.

NF is the number of fields in the current line after splitting; for example, print $NF prints the last column.
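Putting these together, the following (again just an illustration on the same access.log) prints the field count, the first field, and the last field of each line:

awk '{print NF, $1, $NF}' access.log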

Sometimes we can use awk to pick out certain fields and splice them into statements we need.

For example, suppose we want to take the IP field from the access.log above, generate some SQL, and insert it into a database.

Awk'{print "insert into mytable (ip) values ('\'" $2 "');"} 'access.log > / tmp/ip.sql

Some people may ask when this scenario comes up. If you have 10,000 or so rows, you could write one SQL statement to insert them all, but inserting too much data in a single statement locks the table for a long time, during which nobody else can write to it, which matters especially in an online production environment. For operations like this, we can split the work into multiple SQL statements and execute them one by one, so that each statement holds the table lock only briefly and the database never becomes unavailable because of a long-held lock.
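One variation on the same idea is to group the IPs into bounded multi-row INSERT statements, so each statement stays small but there are far fewer of them. A minimal sketch (mytable comes from the example above, while the batch size of 1000 and the output path /tmp/ip_batch.sql are arbitrary placeholders):

awk -v n=1000 '
NR % n == 1 { row = "insert into mytable (ip) values (\047" $2 "\047)"; next }
{ row = row ",(\047" $2 "\047)" }
NR % n == 0 { print row ";"; row = "" }
END { if (row != "") print row ";" }
' access.log > /tmp/ip_batch.sql

Here \047 is the octal escape for a single quote, which avoids the shell quoting gymnastics above; each resulting statement inserts at most n rows, so no single insert holds the table lock for long.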

Regular expression matching

Sometimes we only want to print the rows we care about, and we can select them with a regular expression match.

For example, to print the IPs in the access.log above that start with 117, we can do this:

awk '$2 ~ /^117/ {print $2}' access.log
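The match can also be negated with !~; for example, to print the IPs that do not start with 117 (again just an illustration on the same log):

awk '$2 !~ /^117/ {print $2}' access.log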

SQL-like functions

awk can also provide some simple SQL-like functionality; let's touch on it briefly.

For example, suppose we have the student table below (student.txt, whitespace-separated).

id class name
1 1 Zhang San
2 2 Li Si
3 1 Wang Wu
4 3 Zhao Liu

For example, to count the number of students in each class, we can use the following command:

awk '{a[$2]++} END {for (i in a) {print i " number: " a[i]}}' student.txt
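Other simple SQL-like queries follow the same shape; for instance, something like select count(*) from student where class = 1 becomes (just another small illustration on the same student.txt):

awk '$2 == 1 {c++} END {print c}' student.txt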

This is the end of the article "Example Analysis of the AWK Command in Big Data". I hope the content above has been helpful and that you have learned something from it; if you think the article is good, please share it so more people can see it.
