How to write an awk script to count the frequency of letters in a group of words

Updated 2025-03-28. From: SLTechnology News&Howtos


This article explains how to write an awk script that counts the frequency of letters in a group of words. The method is simple, fast, and practical.

The Linux system provides a word list in the /usr/share/dict/words file, so I already have a ready-made word list. Although this file contains many of the words I want, it also contains some I don't. The words I want must not be compound words (that is, they must not contain hyphens or spaces), and they must not be proper nouns (that is, they must not contain uppercase letters). To get this result, I can run grep to select only the lines that consist solely of lowercase letters:

$ grep '^[a-z]*$' /usr/share/dict/words

This regular expression tells grep to match only lines made up entirely of lowercase letters. The characters ^ and $ in the expression represent the beginning and end of the line, respectively. The [a-z] bracket expression matches a single lowercase letter from "a" to "z", and the * that follows repeats it zero or more times.

Here is an example of output:

$ grep '^[a-z]*$' /usr/share/dict/words | head
a
aa
aaa
aah
aahed
aahing
aahs
aal
aalii
aaliis

Yes, these are all valid words. For example, "aahed" is the past tense of "aah", an exclamation of relaxation, and "aalii" is a bushy tropical shrub.
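To see what the lowercase-only filter keeps and what it rejects, here is a minimal sketch that runs the same pattern over a small made-up list instead of the real words file. The function name filter_words and the sample words are assumptions for illustration only. Setting LC_ALL=C is also an assumption on my part: it is a precaution so that the a-z range is interpreted as plain ASCII regardless of the system locale.

```shell
# filter_words: hypothetical helper that keeps only lines made of lowercase a-z.
# LC_ALL=C is a precaution so the a-z range means plain ASCII in every locale.
filter_words() {
    LC_ALL=C grep '^[a-z]*$'
}

# Made-up sample input: a proper noun, a hyphenated word, and a two-word entry
# should all be rejected; only the plain lowercase words should pass through.
printf '%s\n' 'apple' 'Apple' 'well-known' 'ice cream' 'zebra' | filter_words
```

Because * also matches zero characters, an empty line would pass this filter as well; replacing * with \{1,\} is one way to exclude empty lines if the input might contain them.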

Now I just need to write a gawk script that counts the number of times each letter appears in these words and then prints the relative frequency of each letter.

Letter count

One way to count letters using gawk is to iterate through each character in each line of input, and then count each letter between "a" and "z". The substr function returns a substring of a given length, which can be a single character or a longer string. For example, the following sample code fetches every character c in the input:

{ len = length($0); for (i = 1; i <= len; i++) { c = substr($0, i, 1) } }
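The character loop described above can be extended into a complete counter. The following is a minimal sketch, not necessarily the script the original author wrote: the function name count_letters, the array name count, the variable total, and the output layout (letter, count, percentage) are all assumptions. It uses index against a literal alphabet string to test for a lowercase letter, which avoids depending on how the locale orders characters.

```shell
# count_letters: hypothetical helper that reads words on standard input and
# prints one line per letter seen: letter, count, percentage of all letters.
count_letters() {
    awk '
    {
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)            # one character at position i
            # count only lowercase a-z; index returns 0 when c is not found
            if (index("abcdefghijklmnopqrstuvwxyz", c) > 0) {
                count[c]++
                total++
            }
        }
    }
    END {
        # for-in order is unspecified in awk, so the output is sorted afterwards
        for (c in count)
            printf "%s %d %.2f\n", c, count[c], 100 * count[c] / total
    }' | LC_ALL=C sort
}

# Made-up two-word sample: 4 of the 6 letters are "a"
printf '%s\n' 'aah' 'aal' | count_letters
```

To apply it to the filtered word list, the output of the grep command shown earlier can be piped into count_letters in place of the printf sample.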
