Using awk+sort+uniq for text Analysis

2025-02-28 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

Reproduction of this original work is permitted. When reprinting, please indicate the original source, the author's information, and this statement in the form of a hyperlink; otherwise, legal liability may be pursued. http://wanyuetian.blog.51cto.com/3984643/1716971

1. The uniq command

uniq - report or omit repeated lines

Description: uniq checks a specified ASCII file or standard input and reports or removes repeated lines in the text. It is commonly used for system troubleshooting and log analysis.

Command format:

uniq [OPTION]... [File1 [File2]]

uniq removes duplicate lines from the sorted text file File1 and writes the result to standard output or to File2. It is often used as a filter, combined with pipes.

Before using the uniq command, make sure the input file has already been sorted with sort, because uniq only compares adjacent lines. Run with no options, uniq collapses each run of repeated lines into a single line.

Common parameters:

-c, --count: prefix each output line with the number of times it occurred

2. Hands-on practice

Test data:

[root@web01 ~]# cat uniq.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.9

A. Run uniq directly on the file with no options; only adjacent identical lines are deduplicated:

[root@web01 ~]# uniq uniq.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.8
10.0.0.9

B. Use sort to make the duplicate lines adjacent, then fully deduplicate with uniq (sort's -u option also deduplicates completely on its own):

[root@web01 ~]# sort uniq.txt
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.8
10.0.0.9
10.0.0.9
[root@web01 ~]# sort -u uniq.txt
10.0.0.7
10.0.0.8
10.0.0.9
[root@web01 ~]# sort uniq.txt | uniq
10.0.0.7
10.0.0.8
10.0.0.9

C. Combine sort with uniq -c to count occurrences after deduplication:

[root@web01 ~]# sort uniq.txt | uniq -c
      2 10.0.0.7
      3 10.0.0.8
      2 10.0.0.9

3. Enterprise case

Process the contents of the file below: extract the domain names and rank them by how often each one appears (a Baidu and Sohu interview question).

[root@web01 ~]# cat access.log
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html

Answer:

Analysis: this type of problem is among the most common in operations work. Variants include analyzing logs, counting TCP connections in each state, ranking connections per client IP, and so on.
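As a sketch of the TCP-state variant mentioned above: in practice you would pipe `netstat -ant` (or `ss -ant`) into the same awk | sort | uniq -c | sort -rn pattern; the sample file below is fabricated here purely for illustration.

```shell
# Fabricated netstat-style sample data (illustrative only);
# a real run would use: netstat -ant | awk '{print $NF}' | ...
cat > /tmp/net.txt <<'EOF'
tcp 0 0 10.0.0.7:80 10.0.0.100:51522 ESTABLISHED
tcp 0 0 10.0.0.7:80 10.0.0.101:51523 ESTABLISHED
tcp 0 0 10.0.0.7:80 10.0.0.102:51524 TIME_WAIT
tcp 0 0 10.0.0.7:22 10.0.0.103:51525 ESTABLISHED
EOF

# Take the last field (the connection state), then count and rank:
# prints the ESTABLISHED line (count 3) first, then TIME_WAIT (count 1).
awk '{print $NF}' /tmp/net.txt | sort | uniq -c | sort -rn
```

The same pipeline ranks connections per client IP if you print the peer-address field instead of the state field.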

[root@web01 ~]# awk -F'[/]+' '{print $2}' access.log | sort | uniq -c | sort -rn -k1
      3 www.etiantian.org
      2 post.etiantian.org
      1 mp3.etiantian.org
