Counting Character and Word Frequencies on the Linux Command Line

2025-02-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how to count how often words and characters occur in a text file from the Linux command line. The method is simple, fast and practical.

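The first command that comes to mind for counting words and characters in a text file is wc. However, wc only reports totals (lines, words, characters); it does not tell you how often each individual word or character occurs. For that, we chain together tr, fold, grep, sort, uniq and head.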
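
For instance, on a short made-up sentence in which "the" occurs three times, wc -w reports only one total for the whole input. This is a minimal illustration; the sample text is an assumption, not taken from the article.

```shell
#!/bin/sh
# A made-up sample sentence in which "the" occurs three times.
text='the cat saw the dog and the bird'

# wc -w prints a single total (8 words) for the whole input;
# it does not break that total down per word.
printf '%s\n' "$text" | wc -w
```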

Before we can analyze a text file, we need one. For consistency, we create a text file from the output of the man command, as follows.

The code is as follows:

$ man man > man.txt

This command saves the manual page of the man command itself in the file man.txt.

To find the most common words, run the following pipeline on the file we just created.

The code is as follows:

$ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head

Sample Output

7557
262 the
163 to
112 is
112 a
78 of
78 manual
76 and
64 if
63 be

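The pipeline above prints the ten most common words. The first tr puts each word on a line of its own (\012 is the octal code for a newline), the next two convert uppercase to lowercase and strip punctuation, and grep -v discards any line containing something other than the letters a to z. sort groups identical words together, uniq -c counts each group, sort -rn orders the counts from highest to lowest, and head keeps the first ten lines. The first line of the output, which shows a count but no word, counts the empty lines produced by consecutive spaces.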
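
The same idea can be wrapped in a small reusable function. The sketch below is a variant, not the command above verbatim: the function name word_freq is hypothetical, and it splits words with tr -cs '[:alpha:]' '\n', which turns every run of non-letters (spaces, punctuation, digits) into a single newline, so no empty lines are counted. LC_ALL=C makes the ordering of sort independent of the current locale.

```shell
#!/bin/sh
# word_freq FILE [N]: hypothetical helper that prints the N (default 10)
# most frequent words in FILE, lowercased, with their counts.
# tr -cs replaces each run of non-letters with one newline (one word per line);
# grep -v '^$' drops a possible leading empty line.
word_freq() {
    tr -cs '[:alpha:]' '\n' < "$1" |
        tr '[:upper:]' '[:lower:]' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -rn |
        head -n "${2:-10}"
}
```

Calling word_freq man.txt should list the ten most common words of man.txt, although the counts may differ somewhat from the sample output above because this variant splits words differently.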

What about individual characters? The fold command can split text so that each character sits on a line of its own:

The code is as follows:

$ echo 'tecmint team' | fold -w1

Sample Output

t
e
c
m
i
n
t
 
t
e
a
m

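Note: -w1 sets the line width to one column, so fold breaks the input after every character. The space between the two words also ends up on a line of its own, which is why the output has an empty-looking line between "tecmint" and "team".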
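
To confirm that fold -w1 emits exactly one character per line, including the space, you can count the lines and make the space visible. A minimal sketch using only wc and sed; the marker text <space> is an arbitrary choice.

```shell
#!/bin/sh
# 'tecmint team' has 12 characters (7 letters, 1 space, 4 letters),
# so fold -w1 emits 12 lines.
echo 'tecmint team' | fold -w1 | wc -l

# Replace a line consisting of a single space with the marker <space>
# and print line 8, which is where the space sits in the input.
echo 'tecmint team' | fold -w1 | sed 's/^ $/<space>/' | sed -n '8p'
```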

Now let's apply the same idea to the whole file: split man.txt into single characters, count them, and show the ten most frequent.

$ fold -w1 < man.txt | sort | uniq -c | sort -rn | head

Sample Output

8579
2413 e
1987 a
1875 t
1644 i
1553 n
1522 o
1514 s
1224 r
1021 l
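
As a reusable sketch, the character count can be wrapped in a function (the name char_freq is hypothetical). Like the command above it is case-sensitive, so e and E are separate entries; LC_ALL=C pins the sort order so results do not depend on the locale.

```shell
#!/bin/sh
# char_freq FILE [N]: hypothetical helper that prints the N (default 10)
# most frequent characters in FILE. Case is preserved, so 'e' and 'E'
# are counted separately, exactly as in the pipeline above.
char_freq() {
    fold -w1 < "$1" |
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -rn |
        head -n "${2:-10}"
}
```

For example, char_freq man.txt 20 would list the twenty most frequent characters of man.txt.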

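The count above is case-sensitive: e and E are tallied separately. The first line shows a count with no visible character next to it; it counts whitespace (such as spaces), which fold places on lines of its own just like letters. To treat uppercase and lowercase as the same letter, convert everything to uppercase with tr: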

$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | uniq -c | sort -rn | head -20

Sample Output

11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
344 .
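
Why convert case before sorting? uniq -c only merges adjacent identical lines. In the C locale, sort places all uppercase letters before lowercase ones, so if tr runs after sort, the E and e lines can end up in two separate runs. A minimal sketch with a made-up three-line input:

```shell
#!/bin/sh
# Three single-character lines: 'a', 'E' and 'e'.
# In the C locale, sort orders them E, a, e (uppercase first).

# Converting case AFTER sorting gives E, A, E: the two E lines are not
# adjacent, so uniq -c reports E twice, once per run.
printf 'a\nE\ne\n' | LC_ALL=C sort | tr '[:lower:]' '[:upper:]' | uniq -c

# Converting case BEFORE sorting gives A, E, E: uniq -c counts 2 E.
printf 'a\nE\ne\n' | tr '[:lower:]' '[:upper:]' | LC_ALL=C sort | uniq -c
```

In locales such as en_US.UTF-8, sort usually places e and E next to each other, so the other order may happen to work there; converting case first avoids depending on that.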

The output above also counts punctuation marks (the last line, for example, counts periods). Let's remove them with tr -d:

The code is as follows:

$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20

Sample Output

11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
1550
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
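
Note that tr -d '[:punct:]' deletes the punctuation characters but leaves the newline after each of them, so every punctuation mark turns into an empty line, and those empty lines are tallied together as one of the unlabeled entries in the output above. A minimal sketch of an alternative that keeps only letters and digits with grep instead (the function name letter_freq is hypothetical); because each line holds a single character, this also drops spaces and empty lines.

```shell
#!/bin/sh
# letter_freq FILE [N]: hypothetical helper that prints the N (default 10)
# most frequent letters and digits in FILE, case-insensitively.
# grep keeps only lines containing an alphanumeric character, so spaces,
# punctuation and empty lines never reach uniq -c.
letter_freq() {
    fold -w1 < "$1" |
        grep '[[:alnum:]]' |
        tr '[:lower:]' '[:upper:]' |
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -rn |
        head -n "${2:-10}"
}
```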

If the current directory holds several text files (three, in this example), you can count the characters of all of them at once with a wildcard:

The code is as follows:

$ cat *.txt | fold -w1 | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -8

Sample Output

11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
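
Most text filters, fold included, accept file names as arguments, so the cat in front is optional. A minimal sketch that creates three small sample files in a temporary directory (the file names and contents are made up) and counts their letters together:

```shell
#!/bin/sh
# Three small made-up sample files in a fresh temporary directory.
dir=$(mktemp -d)
printf 'abc\n' > "$dir/one.txt"
printf 'aab\n' > "$dir/two.txt"
printf 'a\n'   > "$dir/three.txt"

# fold reads every file matched by the wildcard in turn, as cat would.
counts=$(fold -w1 "$dir"/*.txt |
    tr '[:lower:]' '[:upper:]' |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -rn)
printf '%s\n' "$counts"

# Clean up the temporary files.
rm -r "$dir"
```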

Next, let's list rare words that are at least ten letters long. Here is a simple pipeline:

The code is as follows:

$ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | grep -E '.{10}' | sort | uniq -c | sort -n | head

Sample Output

1 ──
1 an all
1 abc any or all arguments within are optional
1 able see setlocale for precise details
1 ab options delimited by cannot be used together
1 achieved by using the less environment variable
1 a child process returned a nonzero exit status
1 act as if this option was supplied using the name as a filename
1 activate local mode format and display local manual files
1 acute accent

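Note: the regular expression '.{10}' matches any line that contains at least ten characters, so only words of ten or more letters pass the filter; ten dots in a row, '..........', would have the same effect. Because sort -n puts the smallest counts first, head shows the rarest words. (The sample output above lists whole lines rather than single words, which suggests it was generated without the first tr splitting the text at spaces.)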
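
A variation, sketched below, counts in awk instead of with sort | uniq -c and applies the length test to each word itself (awk's length() and associative arrays are standard; the function name rare_long_words is hypothetical). It splits the text with tr -cs '[:alpha:]' '\n', which turns every run of non-letters into one newline, so digits and punctuation never reach awk.

```shell
#!/bin/sh
# rare_long_words FILE [N]: hypothetical helper that prints the N (default 10)
# least frequent words of at least 10 letters, as "count word" lines.
# Ties are broken alphabetically by the second sort key.
rare_long_words() {
    tr -cs '[:alpha:]' '\n' < "$1" |
        tr '[:upper:]' '[:lower:]' |
        awk 'length($0) >= 10 { count[$0]++ }
             END { for (w in count) print count[w], w }' |
        LC_ALL=C sort -k1,1n -k2,2 |
        head -n "${2:-10}"
}
```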

These simple pipelines reveal which words and characters occur most often in an English text.

© 2024 shulou.com SLNews company. All rights reserved.
