2025-03-31 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/01 Report--
This article explains some tips for text operations on the Linux command line. The methods introduced here are simple, fast and practical, so interested readers may wish to follow along.
Regular expression
Translated technical vocabulary has no shortage of baffling terms, such as "handle", "socket" and "robustness", and "regular expression" certainly belongs on that list. When I first came across regular expressions, the name confused me. Only after studying them more closely did I realize that a regular expression is simply a "string that describes a rule or pattern".
Few technologies repay a small learning cost with such a large return, and regular expressions are one of them. Unfortunately, many people are so intimidated by their cipher-like syntax that they never get through the door.
Why should you learn regular expressions? First, they are not hard to apply in practice: once you understand a few metacharacters and some uncomplicated syntax, you gain powerful text manipulation abilities. Second, regular expressions often provide the simplest and most efficient solution (sometimes the only solution) for dealing with text. In complex situations, if you don't know regular expressions, you may simply be stuck.
Regular expressions are easy to get started with but difficult to master, and this article does not attempt to take on that task.
Text retrieval
The grep command can accomplish simple text search tasks.
First prepare a text material and save the grep help page as a text file:
The code is as follows:
> man grep | col -b > grephelp.txt
Next, I want to retrieve all the lines of text in the grephelp.txt file that contain the word "find":
The code is as follows:
> grep "find" grephelp.txt
To find all occurrences of the word `patricia' in a file:
To find all occurrences of the pattern `.Pp' at the beginning of a line:
To find all lines in a file which do not contain the words `foo' or
To display the matching text in a different color, add the --color option. The default color is red.
The code is as follows:
> grep --color "find" grephelp.txt
To display the file name and line number in the matching results, use the -H option for the file name and the -n option for the line number:
The code is as follows:
> grep -H -n --color "find" grephelp.txt
grephelp.txt:252: To find all occurrences of the word `patricia' in a file:
grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:
grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or
Often we need to see the context around a matching line, and -A and -B are your good friends here. -A n displays the matching line and the n lines that follow it; -B n displays the matching line and the n lines before it. Now, let's display two additional lines before and after each matching line:
The code is as follows:
> grep -A 2 -B 2 -H -n --color "find" grephelp.txt
grephelp.txt-250-
grephelp.txt-251-EXAMPLES
grephelp.txt:252: To find all occurrences of the word `patricia' in a file:
grephelp.txt-253-
grephelp.txt-254- $ grep 'patricia' myfile
--
--
grephelp.txt-254- $ grep 'patricia' myfile
grephelp.txt-255-
grephelp.txt:256: To find all occurrences of the pattern `.Pp' at the beginning of a line:
grephelp.txt-257-
grephelp.txt-258- $ grep '^\.Pp' myfile
--
--
grephelp.txt-263- match any character.
grephelp.txt-264-
grephelp.txt:265: To find all lines in a file which do not contain the words `foo' or
grephelp.txt-266- `bar':
grephelp.txt-267-
What if I need to find all the lines of text that do not contain "find"? It's simple: just use the -v option.
grep has two variants, egrep and fgrep. egrep supports extended regular expressions (EREs), which makes it more expressive than plain grep, which supports only basic regular expressions (BREs) by default; fgrep is the fastest of the three because it does not support regular expressions at all and treats the pattern as a fixed string.
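The differences between these variants are easiest to see on a tiny input. The following is a minimal sketch, assuming a grep that accepts -E and -F (the option forms equivalent to egrep and fgrep); the sample lines and variable names are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
tmp=$(mktemp)
printf '%s\n' 'find a file' 'search a file' 'a.b' 'axb' > "$tmp"

# -v inverts the match: keep only lines that do NOT contain "find".
not_find=$(grep -v 'find' "$tmp")

# -E (like egrep): alternation with | needs no backslash in an ERE.
either=$(grep -E 'find|search' "$tmp")

# -F (like fgrep): the dot is a literal character, not "any character".
literal=$(grep -F 'a.b' "$tmp")

# Plain grep treats the dot as a metacharacter, so "axb" matches as well.
regex=$(grep 'a.b' "$tmp")

printf '%s\n' "$not_find" "---" "$either" "---" "$literal" "---" "$regex"
rm -f "$tmp"
```

The last two commands use the same pattern; only the interpretation of the dot changes, which is why the fixed-string form returns one line and the regular-expression form returns two.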
Text substitution
The tr command can accomplish simple character conversion tasks. For example, you can convert the whole of grephelp.txt to uppercase with tr:
The code is as follows:
> cat grephelp.txt | tr '[:lower:]' '[:upper:]'
In short, tr's job is to convert the characters in the first set to the corresponding characters in the second set. Common character classes are as follows:
[:alnum:]: alphanumeric characters
[:alpha:]: letters
[:cntrl:]: control characters
[:digit:]: digits
[:graph:]: graphic characters
[:lower:]: lowercase letters
[:print:]: printable characters
[:punct:]: punctuation
[:space:]: whitespace characters
[:upper:]: uppercase letters
[:xdigit:]: hexadecimal digits
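The character classes above can be tried out on a short string. This is a small sketch rather than part of the original walkthrough: the sample text and variable names are assumptions, and the -d (delete) and -s (squeeze) options are standard tr options that the text above does not cover.

```shell
text='Hello, World 2024!'

# Translate: map every lowercase letter to its uppercase counterpart.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

# Delete (-d): drop every digit.
no_digits=$(printf '%s\n' "$text" | tr -d '[:digit:]')

# Squeeze (-s): collapse runs of the same whitespace character into one.
squeezed=$(printf 'a   b\n' | tr -s '[:space:]')

printf '%s\n' "$upper" "$no_digits" "$squeezed"
```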
The tr command applies to a fairly narrow range of tasks. If you want more flexible pattern substitution, there is also sed (short for stream editor).
Replace all "find" text in the file with "search":
The code is as follows:
> sed "s/find/search/g" grephelp.txt
In this command, s means "substitute", /find/search/ means replace "find" with "search", and g means replace every match in a line rather than only the first. By default sed prints the result to standard output. We can save the result to a new file through redirection, or use the -i option to write the result directly back to the original file (risky, be careful; note that BSD/macOS sed expects a backup suffix after -i, which may be empty, as in -i ''):
The code is as follows:
> sed -i "s/find/search/g" grephelp.txt
Replace all numbers n in the file with the form "--n--":
The code is as follows:
> sed -E "s/([0-9]+)/--\1--/g" grephelp.txt
The -E option tells sed to use extended regular expressions (EREs), and \1 in the replacement refers to the first capture group of the regular expression. Note that -E originated on BSD systems such as Mac OS X and FreeBSD; GNU sed traditionally uses the equivalent option -r, although recent GNU sed versions accept -E as well.
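Capture groups and the safer redirect-to-a-new-file workflow can be checked on a throwaway file before touching real data. The sketch below is an illustration under stated assumptions: the sample lines, the "--" wrapper and the variable names are hypothetical, and it assumes a sed that accepts -E.

```shell
tmp=$(mktemp)
printf '%s\n' 'line 12 of 250' 'no numbers here' > "$tmp"

# -E enables extended regular expressions; \1 is the first capture group,
# so every run of digits is wrapped in double hyphens.
wrapped=$(sed -E 's/([0-9]+)/--\1--/g' "$tmp")

# Safer than in-place editing: redirect to a new file and leave the
# original untouched.
sed 's/line/row/' "$tmp" > "$tmp.new"
renamed=$(cat "$tmp.new")

printf '%s\n' "$wrapped" "$renamed"
rm -f "$tmp" "$tmp.new"
```

Writing to a second file costs one extra step, but a mistaken pattern then never destroys the source text.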
sed can do much more than this, but space is limited and it is impossible to explain its use in detail here. If you want to learn more, please move on to this article.
Text deduplication
The code is as follows:
> cat -n sonnet116.txt
1 Let me not to the marriage of true minds
2 Admit impediments. Love is not love
3 Which alters when it alteration finds
4 Or bends with the remover to remove:
5 O, no! It is an ever-fix`ed mark
6 O, no! It is an ever-fix`ed mark
7 That looks on tempests and is never shaken
8 It is the star to every wand'ring bark
9 Whose worth's unknown, although his heighth be taken.
10 Love's not Time's fool, though rosy lips and cheeks
11 Love's not Time's fool, though rosy lips and cheeks
12 Love's not Time's fool, though rosy lips and cheeks
13 Within his bending sickle's compass come
14 Love alters not with his brief hours and weeks
15 But bears it out even to the edge of doom:
16 If this be error and upon me proved
17 I never writ, nor no man ever loved.
This is a sonnet by Shakespeare, except that some lines are repeated: line 5 appears twice and line 10 appears three times. How do I view the duplicate lines in the text? The uniq command can help.
The code is as follows:
> uniq -d sonnet116.txt
O, no! It is an ever-fix`ed mark
Love's not Time's fool, though rosy lips and cheeks
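The -d option means that only duplicated lines are output. If you need to remove the duplicates, use uniq without options (note that uniq only compares adjacent lines, which is sufficient here because the repeats sit next to each other):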
The code is as follows:
> uniq sonnet116.txt
Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds
Or bends with the remover to remove:
O, no! It is an ever-fix`ed mark
That looks on tempests and is never shaken
It is the star to every wand'ring bark
Whose worth's unknown, although his heighth be taken.
Love's not Time's fool, though rosy lips and cheeks
Within his bending sickle's compass come
Love alters not with his brief hours and weeks
But bears it out even to the edge of doom:
If this be error and upon me proved
I never writ, nor no man ever loved.
Want to see how many times each line is repeated? No problem, use the -c option:
The code is as follows:
> uniq -c sonnet116.txt
1 Let me not to the marriage of true minds
1 Admit impediments. Love is not love
1 Which alters when it alteration finds
1 Or bends with the remover to remove:
2 O, no! It is an ever-fix`ed mark
1 That looks on tempests and is never shaken
1 It is the star to every wand'ring bark
1 Whose worth's unknown, although his heighth be taken.
3 Love's not Time's fool, though rosy lips and cheeks
1 Within his bending sickle's compass come
1 Love alters not with his brief hours and weeks
1 But bears it out even to the edge of doom:
1 If this be error and upon me proved
1 I never writ, nor no man ever loved.
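uniq compares each line only with its immediate neighbor, so duplicates that are scattered through a file go unnoticed unless the input is sorted first. The following minimal sketch shows the difference; the sample lines are made up, and the awk step is only there to normalize the padding that uniq -c puts in front of each count.

```shell
# Hypothetical input in which the duplicates are NOT adjacent.
lines=$(printf '%s\n' 'b' 'a' 'b' 'a' 'b')

# uniq alone merges nothing, because no two neighboring lines are equal.
plain=$(printf '%s\n' "$lines" | uniq | wc -l | tr -d ' ')

# Sorting first makes equal lines adjacent, so uniq -c can count them.
counted=$(printf '%s\n' "$lines" | sort | uniq -c | awk '{print $1, $2}')

printf '%s\n' "$plain" "$counted"
```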
Text sorting
Suppose you have a report file like the following, in which the first column is the month and the second column is the number of sales for that month:
The code is as follows:
> cat report.txt
March,19
June,50
February,17
May,18
August,16
April,31
May,18
July,26
January,24
August,16
The contents of this file are not only out of order but also contain duplicates. To sort it alphabetically, use the following command:
The code is as follows:
> sort report.txt
April,31
August,16
August,16
February,17
January,24
July,26
June,50
March,19
May,18
May,18
The -u option (for unique) removes duplicate lines from the sorted result:
The code is as follows:
> sort -u report.txt
April,31
August,16
February,17
January,24
July,26
June,50
March,19
May,18
Can it be sorted by month? The -M option (for month-sort) can help us:
The code is as follows:
> sort -u -M report.txt
January,24
February,17
March,19
April,31
May,18
June,50
July,26
August,16
Sorting by the numbers in the second column is also simple:
The code is as follows:
> sort -u -t',' -k2 report.txt
August,16
February,17
May,18
March,19
January,24
July,26
April,31
June,50
In the above example, the option -t',' splits each line into columns using a comma as the separator, and -k2 sorts by the second column.
Of course, the results can also be put in reverse order:
The code is as follows:
> sort -u -r -t',' -k2 report.txt
June,50
April,31
July,26
January,24
March,19
May,18
February,17
August,16
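One caveat is worth a quick sketch: -k2 on its own compares the key as text, which happens to give the right order for report.txt only because every sales figure there has two digits. The example below uses made-up figures of different lengths to show the difference; the n key modifier (numeric comparison) is a standard sort feature that the walkthrough above does not use.

```shell
tmp=$(mktemp)
printf '%s\n' 'March,9' 'June,50' 'May,100' > "$tmp"

# Without n the key is compared as text, so "100" sorts before "50" and "9".
as_text=$(sort -t',' -k2 "$tmp")

# -k2,2 limits the key to the second field; the trailing n compares it
# as a number.
as_number=$(sort -t',' -k2,2n "$tmp")

printf '%s\n' "$as_text" "---" "$as_number"
rm -f "$tmp"
```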
Text statistics
The wc command produces text statistics. Depending on the option, it counts the bytes (-c), characters (-m), words (-w) or lines (-l) in a file.
For example, to see the total number of words in grephelp.txt:
The code is as follows:
> wc -w grephelp.txt
1571 grephelp.txt
Check the total number of non-repeating lines in the sonnet116.txt file (needless to say, a sonnet has 14 lines):
The code is as follows:
> uniq sonnet116.txt | wc -l
14
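As the pipeline above suggests, wc also reads standard input when no file name is given, which makes it easy to use inside scripts. The following is a small sketch with made-up text; the tr step strips the padding that some wc implementations print in front of the number.

```shell
text=$(printf '%s\n' 'one two three' 'four five')

# Each pipeline feeds the same two lines to wc with a different option.
lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$text" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$text" | wc -c | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
```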
You should also try Awk and Perl.
If the tools described above still don't satisfy you, you may need a weapon with more firepower. Try Awk and Perl.
Awk is another veteran tool, roughly as old as sed. It was created specifically for text processing, and its syntax and features are well suited to manipulating text and generating reports. If you want to learn it, please refer to this article; you will like it.
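To give a first taste of Awk's report-style processing, here is a minimal sketch that works on data in the same "month,sales" layout as report.txt. The three sample rows, the threshold of 18 and the variable names are assumptions chosen for illustration.

```shell
# Hypothetical data in the same "month,sales" layout as report.txt.
report=$(printf '%s\n' 'March,19' 'June,50' 'February,17')

# -F',' sets the field separator; $2 is the second column.
# The END block runs once, after the last line has been read.
total=$(printf '%s\n' "$report" | awk -F',' '{ sum += $2 } END { print sum }')

# A pattern in front of the action selects rows: months with sales above 18.
busy=$(printf '%s\n' "$report" | awk -F',' '$2 > 18 { print $1 }')

printf '%s\n' "$total" "$busy"
```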
Perl has long been mocked as a "write-only language". In fact, with a little care you can write clear, readable code in Perl. In my experience, more than 80% of the situations where Perl is used involve text processing. Perl's built-in regular expression support is probably the best of any language and, combined with its compact syntax and convenient operators, it makes Perl a dominant player in text processing.
By now you should have a deeper understanding of text operations on the Linux command line, so why not try them out in practice!