
Tips for working with text on the Linux command line

2025-03-31 Update From: SLTechnology News&Howtos


This article explains some tips for working with text on the Linux command line. The methods introduced here are simple, fast and practical, so interested readers may want to follow along.

Regular expression

Translated technical vocabulary has no shortage of baffling terms, such as "handle", "socket" and "robustness", and "regular expression" belongs on that list. When I first met regular expressions, the name confused me. Only after digging deeper did I realize that a regular expression is simply "a string that expresses a rule, or pattern".

Few technologies pay back so much for so little learning effort, and regular expressions are one of them. Unfortunately, many people are so put off by the cipher-like syntax that they never get through the door.

Why learn regular expressions? First, they are not hard to apply in practice: once you understand a handful of metacharacters and some uncomplicated syntax, you gain powerful text-manipulation ability. Second, regular expressions often provide the simplest and most efficient solution (sometimes the only one) for dealing with text. In complex situations, if you don't know regular expressions, you may simply be stuck.

Regular expressions are easy to start with but hard to master, and this article does not attempt to teach them in depth.

Text retrieval

The grep command can accomplish simple text search tasks.

First prepare a text material and save the grep help page as a text file:

The code is as follows:

> man grep | col -b > grephelp.txt

Next, I want to retrieve all the lines of text in the grephelp.txt file that contain the word "find":

The code is as follows:

> grep "find" grephelp.txt

To find all occurrences of the word `patricia' in a file:

To find all occurrences of the pattern `.Pp' at the beginning of a line:

To find all lines in a file which do not contain the words `foo' or

I want the matching text to be displayed in a different color. Add the --color option; the default color is red.

The code is as follows:

> grep --color "find" grephelp.txt

I want to display the file name and line number in the matching results. The -H option displays the file name, and the -n option displays the line number:

The code is as follows:

> grep -H -n --color "find" grephelp.txt

grephelp.txt:252: To find all occurrences of the word `patricia' in a file:

grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:

grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or

Often we need to see the context around a matching line, and -A and -B are your friends here. -A n displays the matching line and the n lines after it; -B n displays the matching line and the n lines before it. Now let's display two additional lines before and after each matching line:

The code is as follows:

> grep -A 2 -B 2 -H -n --color "find" grephelp.txt

grephelp.txt-250-

grephelp.txt-251-EXAMPLES

grephelp.txt:252: To find all occurrences of the word `patricia' in a file:

grephelp.txt-253-

grephelp.txt-254- $ grep 'patricia' myfile

--

--

grephelp.txt-254- $ grep 'patricia' myfile

grephelp.txt-255-

grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:

grephelp.txt-257-

grephelp.txt-258- $ grep '^\.Pp' myfile

--

--

grephelp.txt-263- match any character.

grephelp.txt-264-

grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or

grephelp.txt-266- `bar':

grephelp.txt-267-
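When the same amount of context is wanted on both sides, grep also accepts -C n as a shorthand for -A n -B n. The following is only a small sketch: the temporary file and its contents are made up for illustration and are not part of grephelp.txt.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'find me' 'gamma' 'delta' > "$sample"

# -C 1 prints one line of context before and after each match,
# which is the same as writing -A 1 -B 1.
with_c=$(grep -C 1 'find' "$sample")
with_ab=$(grep -A 1 -B 1 'find' "$sample")

printf '%s\n' "$with_c"
rm -f "$sample"
```

Both variables should hold the same three lines, so the choice between the two spellings is a matter of taste.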

What if I need to find all the lines that do not contain "find"? That's simple: use the -v option.
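As a quick illustration of -v, here is a minimal sketch on a throwaway file. The file contents are invented for the example, and the -c option (count matching lines) is an addition not discussed above.

```shell
# Sample input (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'how to find a word' 'nothing here' 'find again' 'last line' > "$sample"

# -v inverts the match: print only the lines that do NOT contain "find".
without_find=$(grep -v 'find' "$sample")

# -c counts selected lines instead of printing them; combined with -v
# it counts the lines that do not match.
count_without=$(grep -c -v 'find' "$sample")

printf '%s\n' "$without_find"
echo "lines without a match: $count_without"
rm -f "$sample"
```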

grep has two variants, egrep and fgrep. egrep supports extended regular expressions (EREs), which makes it more expressive than plain grep, which by default supports only basic regular expressions (BREs). fgrep is the fastest of the three because it matches fixed strings and does not interpret regular expressions at all. The same behavior is available as grep -E and grep -F.
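The difference between the three matching modes is easiest to see side by side. This sketch uses the grep -E and grep -F spellings of egrep and fgrep; the sample lines are invented for the example.

```shell
# Sample input (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'foo' 'bar' > "$sample"

# Extended regex (egrep / grep -E): | is alternation without a backslash.
ere_hits=$(grep -E 'foo|bar' "$sample")

# Basic regex (plain grep): the dot matches any character, so both
# "a.c" and "abc" are selected.
bre_hits=$(grep 'a.c' "$sample")

# Fixed string (fgrep / grep -F): the dot is an ordinary character,
# so only the literal "a.c" is selected.
fixed_hits=$(grep -F 'a.c' "$sample")

printf 'ERE: %s\nBRE: %s\nfixed: %s\n' "$ere_hits" "$bre_hits" "$fixed_hits"
rm -f "$sample"
```

Fixed-string matching is also the safer choice when the search text comes from user input and may contain metacharacters.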

Text substitution

The tr command handles simple character conversion. For example, tr can convert the whole grephelp.txt file to uppercase:

The code is as follows:

> cat grephelp.txt | tr '[:lower:]' '[:upper:]'

In short, tr's job is to convert each character in the first set to the corresponding character in the second set. Common character classes are as follows:

[:alnum:]: letters and digits

[:alpha:]: letters

[:cntrl:]: control characters

[:digit:]: digits

[:graph:]: graphic characters

[:lower:]: lowercase letters

[:print:]: printable characters

[:punct:]: punctuation

[:space:]: whitespace characters

[:upper:]: uppercase letters

[:xdigit:]: hexadecimal digits
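The character classes above can be combined with tr's other modes. The sketch below assumes the standard -d (delete) and -s (squeeze repeats) options, which are not covered in the text; the input strings are made up for illustration.

```shell
# Translate: map every lowercase letter to its uppercase counterpart.
upper=$(printf 'hello, world\n' | tr '[:lower:]' '[:upper:]')

# Delete: -d removes every character that belongs to the set.
no_digits=$(printf 'room 101, floor 3\n' | tr -d '[:digit:]')

# Squeeze: -s collapses each run of a repeated character from the set
# into a single occurrence.
squeezed=$(printf 'too    many   spaces\n' | tr -s '[:space:]')

printf '%s\n' "$upper" "$no_digits" "$squeezed"
```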

The tr command is useful only in a narrow range of cases. For more flexible pattern substitution there is sed (the stream editor).

Replace all "find" text in the file with "search":

The code is as follows:

> sed "s/find/search/g" grephelp.txt

In this command, s means "substitute", /find/search/ means replace "find" with "search", and g means replace every match on a line rather than only the first. By default sed prints the result to standard output. We can redirect the result to a new file, or use the -i option to write it straight back to the original file (risky, be careful; BSD sed on Mac OS X and FreeBSD requires a backup suffix after -i, for example -i '' for none):

The code is as follows:

> sed -i "s/find/search/g" grephelp.txt
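Since writing back to the original file is risky, it may help to see two more cautious variants. This is a sketch under stated assumptions: the file names notes.txt and notes.new.txt are hypothetical, and the attached-suffix form -i.bak is used because both GNU sed and BSD sed accept it.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
printf '%s\n' 'find the file' 'nothing to see' > "$work/notes.txt"

# Option 1: leave the original alone and redirect the result to a new file.
sed 's/find/search/g' "$work/notes.txt" > "$work/notes.new.txt"

# Option 2: edit in place but keep a backup copy. The suffix is attached
# directly to -i (no space), giving notes.txt.bak.
sed -i.bak 's/find/search/g' "$work/notes.txt"

edited=$(cat "$work/notes.txt")
backup=$(cat "$work/notes.txt.bak")
copy=$(cat "$work/notes.new.txt")

printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$work"
```

With the backup in place, a bad substitution can be undone by copying the .bak file back over the edited one.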

Replace every number n in the file with the form "--n--":

The code is as follows:

> sed -E "s/([0-9]+)/--\1--/g" grephelp.txt

The -E option tells sed to use extended regular expressions (EREs), and \1 in the replacement refers to the first capture group of the regular expression. Note that -E is the spelling used by BSD sed (Mac OS X and FreeBSD); GNU sed has traditionally used the equivalent option -r, and recent GNU versions accept -E as well.
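To make the role of capture groups concrete, here is a minimal sketch that feeds sed from printf instead of a file. The input strings are invented, and the second command (swapping two groups) is an extra illustration that does not appear in the text.

```shell
# \1 in the replacement stands for the text matched by the first (...) group.
wrapped=$(printf 'line 12 and line 345\n' | sed -E 's/([0-9]+)/--\1--/g')

# With two groups, \1 and \2 can be emitted in the opposite order
# to swap the two parts.
swapped=$(printf 'key=value\n' | sed -E 's/([a-z]+)=([a-z]+)/\2=\1/')

printf '%s\n' "$wrapped" "$swapped"
```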

sed can do far more than this, but space is limited and a full treatment is not possible here. If you want to learn more, a dedicated sed tutorial is the next step.

Text deduplication

The code is as follows:

> cat -n sonnet116.txt

1 Let me not to the marriage of true minds

2 Admit impediments. Love is not love

3 Which alters when it alteration finds

4 Or bends with the remover to remove:

5 O, no! It is an ever- fix`ed mark

6 O, no! It is an ever- fix`ed mark

7 That looks on tempests and is never shaken

8 It is the star to every wand'ring bark

9 Whose worth's unknown, although his heighth be taken.

10 Love's not Time's fool, though rosy lips and cheeks

11 Love's not Time's fool, though rosy lips and cheeks

12 Love's not Time's fool, though rosy lips and cheeks

13 Within his bending sickle's compass come

14 Love alters not with his brief hours and weeks

15 But bears it out even to the edge of doom:

16 If this be error and upon me proved

17 I never writ, nor no man ever loved.

This is a sonnet by Shakespeare, except that lines 5 and 10 are duplicated (line 10 appears three times). How can we see which lines are duplicated? The uniq command can help. Note that uniq compares only adjacent lines, which is enough here because the repeats follow one another.

The code is as follows:

> uniq -d sonnet116.txt

O, no! It is an ever- fix`ed mark

Love's not Time's fool, though rosy lips and cheeks

The -d option prints only the duplicated lines. To remove the duplicates instead, use uniq without options:

The code is as follows:

> uniq sonnet116.txt

Let me not to the marriage of true minds

Admit impediments. Love is not love

Which alters when it alteration finds

Or bends with the remover to remove:

O, no! It is an ever- fix`ed mark

That looks on tempests and is never shaken

It is the star to every wand'ring bark

Whose worth's unknown, although his heighth be taken.

Love's not Time's fool, though rosy lips and cheeks

Within his bending sickle's compass come

Love alters not with his brief hours and weeks

But bears it out even to the edge of doom:

If this be error and upon me proved

I never writ, nor no man ever loved.

Want to see how many times each line occurs? No problem, use the -c option:

The code is as follows:

> uniq -c sonnet116.txt

1 Let me not to the marriage of true minds

1 Admit impediments. Love is not love

1 Which alters when it alteration finds

1 Or bends with the remover to remove:

2 O, no! It is an ever- fix`ed mark

1 That looks on tempests and is never shaken

1 It is the star to every wand'ring bark

1 Whose worth's unknown, although his heighth be taken.

3 Love's not Time's fool, though rosy lips and cheeks

1 Within his bending sickle's compass come

1 Love alters not with his brief hours and weeks

1 But bears it out even to the edge of doom:

1 If this be error and upon me proved

1 I never writ, nor no man ever loved.
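One caveat worth illustrating: uniq compares only neighboring lines, so duplicates that are scattered through a file go unnoticed unless the input is sorted first. The sketch below uses an invented word list; the trailing awk step is only there to normalize the padding that uniq -c puts in front of each count.

```shell
# Sample input with duplicates that are NOT adjacent (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'apple' 'pear' 'apple' 'fig' 'pear' 'apple' > "$sample"

# uniq alone reports nothing, because no two equal lines are neighbors.
adjacent_only=$(uniq -d "$sample")

# Sorting first groups equal lines together, so uniq can see them.
dups=$(sort "$sample" | uniq -d)

# Frequency table, most frequent first.
freq=$(sort "$sample" | uniq -c | sort -rn | awk '{print $1, $2}')

printf 'duplicates:\n%s\nfrequencies:\n%s\n' "$dups" "$freq"
rm -f "$sample"
```

The sort | uniq -c | sort -rn pipeline is a common way to build a quick frequency table from any line-oriented input.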

Text sorting

Suppose you have such a report file, the first column is the month, and the second column is the number of sales for that month:

The code is as follows:

> cat report.txt

March,19

June,50

February,17

May,18

August,16

April,31

May,18

July,26

January,24

August,16

The contents of this file are both out of order and duplicated. To sort them alphabetically, use the following command:

The code is as follows:

> sort report.txt

April,31

August,16

August,16

February,17

January,24

July,26

June,50

March,19

May,18

May,18

The -u option (for unique) removes duplicate lines from the sorted result:

The code is as follows:

> sort -u report.txt

April,31

August,16

February,17

January,24

July,26

June,50

March,19

May,18

Can it be sorted by month? The -M option (for month-sort) can help:

The code is as follows:

> sort -u -M report.txt

January,24

February,17

March,19

April,31

May,18

June,50

July,26

August,16

Sorting by the numbers in the second column is also simple:

The code is as follows:

> sort -u -t',' -k2 report.txt

August,16

February,17

May,18

March,19

January,24

July,26

April,31

June,50

In the example above, the option -t',' splits each line into columns using a comma as the separator, and -k2 sorts on the second column.
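Sorting on the second column works in the example above because every sales figure has two digits. As a hedged sketch of what happens otherwise, the made-up data below mixes one-, two- and three-digit numbers and compares a textual key with a numeric one (the n modifier). LC_ALL=C is set on the first command only to make the textual ordering predictable.

```shell
# Sample data with numbers of different lengths (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'March,19' 'June,5' 'April,131' 'May,18' > "$sample"

# -k2 compares the field as text, so "131" sorts before "18" and "5" comes last.
as_text=$(LC_ALL=C sort -t',' -k2 "$sample")

# -k2,2n restricts the key to field 2 and compares it as a number.
as_number=$(sort -t',' -k2,2n "$sample")

printf 'text order:\n%s\nnumeric order:\n%s\n' "$as_text" "$as_number"
rm -f "$sample"
```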

The results can also be put in reverse order with -r:

The code is as follows:

> sort -u -r -t',' -k2 report.txt

June,50

April,31

July,26

January,24

March,19

May,18

February,17

August,16

Text statistics

The wc command produces text statistics. Depending on the option, it counts the bytes (-c), characters (-m), words (-w) or lines (-l) in a file.

For example, to see the total number of words in grephelp.txt:

The code is as follows:

> wc -w grephelp.txt

1571 grephelp.txt

Count the lines in sonnet116.txt after removing duplicates (unsurprisingly 14, since a sonnet has 14 lines):

The code is as follows:

> uniq sonnet116.txt | wc -l

14
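When the counts are needed inside a script, it can be convenient to read from standard input so that wc prints only the number. A minimal sketch on a made-up two-line file follows; the tr -d ' ' step is an assumption-driven precaution, since some wc implementations pad the number with spaces.

```shell
# Sample input: two lines, three words (hypothetical contents).
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

# Reading from standard input keeps the file name out of the output.
lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```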

You should also try Awk and Perl.

If the tools described above still don't satisfy you, you may need a weapon with more firepower. Try Awk and Perl.

Awk is another venerable tool, probably about as old as sed. It was created specifically for text processing, and its syntax and features are well suited to manipulating text and generating reports. If you want to learn it, a dedicated Awk tutorial is a good place to start, and you will probably like it.
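To give a first taste of Awk's report-oriented style, here is a small sketch on comma-separated data shaped like the sales report used earlier. The three sample rows and the threshold of 18 are chosen only for illustration.

```shell
# Sample data in the same month,sales layout (hypothetical subset).
sample=$(mktemp)
printf '%s\n' 'March,19' 'June,50' 'February,17' > "$sample"

# -F',' sets the field separator; $2 is the second column.
# The END block runs once, after the last line has been read.
total=$(awk -F',' '{ sum += $2 } END { print sum }' "$sample")

# A pattern in front of the action selects lines: print the month
# only when its sales figure is greater than 18.
busy=$(awk -F',' '$2 > 18 { print $1 }' "$sample")

printf 'total sales: %s\nbusy months:\n%s\n' "$total" "$busy"
rm -f "$sample"
```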

Perl has long been called a "write-only language". In fact, with some care you can write Perl code that is clear and easy to read. In my experience, more than 80% of the situations where Perl is used involve text processing. Perl's built-in regular expression support is probably the best of any language, and together with its compact syntax and convenient operators this makes Perl a dominant player in text processing.
