2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to count how often words and characters occur in a text file from the Linux command line. The method is simple, fast, and practical.
The first Linux command that comes to mind for counting the words and characters in a text file is wc. However, wc only reports totals; counting how often each word or character appears takes a short pipeline of standard tools.
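As a quick illustration of what wc reports, the following sketch counts the lines, words, and bytes in a small file. The file name sample.txt and its contents are made up for this example.

```shell
# sample.txt is a hypothetical file created only for this illustration.
printf 'the cat and the hat\n' > sample.txt

# -l counts lines, -w counts words, -c counts bytes.
wc -l -w -c sample.txt
# wc prints totals (1 line, 5 words, 20 bytes here), not per-word frequencies.
```

Note that "the" occurs twice in the sample, but wc has no way to show that.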
Before parsing a text file with a script, we need a text file. To keep the examples consistent, we will create one from the output of the man command, as shown below.
The code is as follows:
$ man man > man.txt
The above command saves the manual page of the man command to the file man.txt.
To find the most common words, run the following pipeline on the file we just created.
The code is as follows:
$ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
Sample Output
The code is as follows:
7557
262 the
163 to
112 is
112 a
78 of
78 manual
76 and
64 if
63 be
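The pipeline above prints the ten most frequent words. The first line, a count with no word next to it, is the number of empty lines left over after the text has been split and cleaned.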
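The stages of the word pipeline can be sketched as a small shell function with one comment per stage. The function name word_freq and the sample sentence are assumptions made for this illustration. Unlike the command above, this variant squeezes runs of whitespace with tr -s and keeps only non-empty lowercase words, so no empty-line count shows up.

```shell
# word_freq is a hypothetical helper: reads text on stdin and prints the
# N most frequent words (default 10).
word_freq() {
    tr -s '[:space:]' '\n' |          # one word per line, runs of whitespace squeezed
    tr '[:upper:]' '[:lower:]' |      # fold everything to lowercase
    tr -d '[:punct:]' |               # strip punctuation
    grep -E '^[a-z]+$' |              # keep non-empty, purely alphabetic words
    sort |                            # bring identical words together
    uniq -c |                         # count each run of identical lines
    sort -rn |                        # highest count first
    head -n "${1:-10}"
}

# "the" occurs three times and "dog" twice in this sample sentence.
printf 'The cat saw the dog. The dog ran!\n' | word_freq 2
```

To run it on the file from the article, use: word_freq 10 < man.txt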
What about individual letters? Use the following command.
The code is as follows:
$ echo 'tecmint team' | fold -w1
Sample Output
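t
e
c
m
i
n
t
 
t
e
a
m
Note: -w1 sets the output width to one character, so every character, including the space between the two words, is printed on its own line.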
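To see that -w only controls the line width, here is a small sketch with a width of four instead of one. The input string is made up for this example.

```shell
# Break an 11-character string into lines of at most 4 characters.
printf 'tecmintteam\n' | fold -w 4
# prints:
# tecm
# intt
# eam
```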
Now we will split the text file into single characters, count them, and print the ten most frequent ones.
$ fold -w1 < man.txt | sort | uniq -c | sort -rn | head
Sample Output
The code is as follows:
8579
2413 e
1987 a
1875 t
1644 i
1553 n
1522 o
1514 s
1224 r
1021 l
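The count with no visible character next to it in output like this belongs to whitespace. As a hedged sketch, whitespace can be dropped before the split so that only visible characters are counted. The function name char_freq is an assumption made for this illustration, and the sketch assumes plain ASCII input.

```shell
# char_freq is a hypothetical helper: prints the N most frequent
# non-whitespace characters read from stdin (default 10).
char_freq() {
    tr -d '[:space:]' |     # remove spaces, tabs, and newlines
    fold -w1 |              # one character per line
    sort | uniq -c |        # count identical characters
    sort -rn |              # highest count first
    head -n "${1:-10}"
}

# In "hello world", l occurs three times and o twice.
printf 'hello world\n' | char_freq 2
```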
What about uppercase and lowercase letters? The character count above treats them as different characters. To count them together, convert everything to uppercase first.
$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | uniq -c | sort -rn | head -20
Sample Output
The code is as follows:
11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
344 .
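One detail worth knowing when merging case: uniq -c only merges identical lines that are adjacent, so the position of the case conversion relative to sort can matter. The sketch below uses a made-up four-letter input and forces the C locale for sort, where uppercase letters sort before lowercase ones.

```shell
# sample is a hypothetical helper that prints four letters in mixed case.
sample() { printf 'a\nB\nA\nb\n'; }

# Sort first, convert afterwards: the sorted order is A B a b, which becomes
# A B A B after conversion, so uniq cannot merge anything (four output lines).
sample | LC_ALL=C sort | tr '[:lower:]' '[:upper:]' | uniq -c

# Convert first, sort afterwards: identical letters end up adjacent
# and are merged (two output lines, each with a count of 2).
sample | tr '[:lower:]' '[:upper:]' | LC_ALL=C sort | uniq -c
```

Converting before sorting gives the same result in every locale.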
Look at the output above: punctuation marks are included. Let's remove them with tr -d:
The code is as follows:
$ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20
Sample Output
The code is as follows:
11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
1550
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
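Deleting punctuation after the text has been split leaves an empty line behind for every deleted character, which is one source of count lines with nothing next to them. A possible variation, sketched here under the assumption of plain ASCII input, filters the text down to letters before splitting it. The function name letter_freq is made up for this illustration.

```shell
# letter_freq is a hypothetical helper: prints the N most frequent letters
# read from stdin, ignoring case (default 20).
letter_freq() {
    tr -cd '[:alpha:]' |            # delete everything that is not a letter
    tr '[:lower:]' '[:upper:]' |    # merge upper and lower case
    fold -w1 |                      # one letter per line
    sort | uniq -c |                # count identical letters
    sort -rn |                      # highest count first
    head -n "${1:-20}"
}

# "Hello, World!" contains three L and two O.
printf 'Hello, World!\n' | letter_freq 2
```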
Now suppose we have three text files. Let's run the same pipeline on all of them with the following command.
The code is as follows:
$ cat *.txt | fold -w1 | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -8
Sample Output
The code is as follows:
11636
2504 E
2079 A
2005 T
1729 I
1645 N
1632 S
1580 O
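Because cat simply concatenates its inputs, the counts are totals across all files. The sketch below shows this with two tiny files in a temporary directory. The function name top_letter_in_dir, the file names, and their contents are assumptions made for this illustration.

```shell
# top_letter_in_dir is a hypothetical helper: prints the single most frequent
# letter across all .txt files in the directory given as $1.
top_letter_in_dir() {
    cat "$1"/*.txt |
    fold -w1 |
    tr '[:lower:]' '[:upper:]' |
    sort | uniq -c | sort -rn | head -n 1
}

# Two small sample files: "a" occurs twice in one and once in the other.
dir=$(mktemp -d)
printf 'aab\n' > "$dir/one.txt"
printf 'abc\n' > "$dir/two.txt"

top_letter_in_dir "$dir"    # the combined count for A is 3
```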
Next we will list rare words that are at least ten letters long. Here is a simple script:
The code is as follows:
$ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | grep -E '.{10}' | sort | uniq -c | sort -n | head
Sample Output
The code is as follows:
1 ──
1 an all
1 abc any or all arguments within are optional
1 able see setlocale for precise details
1 ab options delimited by cannot be used together
1 achieved by using the less environment variable
1 a child process returned a nonzero exit status
1 act as if this option was supplied using the name as a filename
1 activate local mode format and display local manual files
1 acute accent
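Note: grep -E '.{10}' keeps only lines with at least ten characters; typing ten dots in a row has the same effect. If the filter runs after uniq -c, the padded count that uniq -c puts in front of each line also counts toward the match, so more dots are needed. The sample output above shows whole lines rather than single words, which is what the pipeline prints when the text has not been split on spaces.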
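As a hedged sketch of the same idea, the length filter can be applied to the words themselves, before they are counted, so the pattern does not have to account for the count prefix. The function name rare_long_words, its parameter, and the sample sentence are assumptions made for this illustration.

```shell
# rare_long_words is a hypothetical helper: prints the least frequent words
# that are at least MIN characters long (MIN defaults to 10).
rare_long_words() {
    min="${1:-10}"
    tr -s '[:space:]' '\n' |        # one word per line
    tr '[:upper:]' '[:lower:]' |    # fold to lowercase
    tr -d '[:punct:]' |             # strip punctuation
    tr -d '0-9' |                   # strip digits
    grep -E "^.{${min},}\$" |       # keep words of MIN or more characters
    sort | uniq -c |                # count identical words
    sort -n |                       # rarest first
    head
}

# "configuration" occurs once and "documentation" twice; the short words are dropped.
printf 'Documentation helps. Read documentation; write configuration.\n' | rare_long_words 10
```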
These simple scripts let us know the most frequent words and characters in English.
At this point, you should have a better understanding of how to count word and character frequencies on the Linux command line. The best way to learn it is to try the commands yourself.