Shulou (Shulou.com) 06/01 report; updated 2025-01-28. From: SLTechnology News&Howtos
This article explains how the sort command in Linux works and how to use it. Interested readers may want to take a look: the method is simple, fast and practical. Let's learn how the sort command works and how to use it.
1 How sort works
sort treats each line of the file as a unit and compares lines with one another character by character. It starts from the first character, compares ASCII code values in turn, and finally outputs the lines in ascending order. (Strictly speaking, plain ASCII order is what you get in the C locale; in other locales sort follows the locale's collation rules, so the relative order of uppercase and lowercase letters, for example, may differ.)
[rocrocket@rocrocket programming]$ cat seq.txt
Banana
Apple
Pear
Orange
[rocrocket@rocrocket programming]$ sort seq.txt
Apple
Banana
Orange
Pear
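The same comparison can be reproduced without creating a file. This is a minimal sketch assuming a POSIX shell; LC_ALL=C is set so that sort compares plain byte (ASCII) values as described above. The extra lowercase line apple is added only for illustration: in ASCII every uppercase letter (A is 65) comes before every lowercase letter (a is 97), so it lands last.

```shell
#!/bin/sh
# Five lines to sort; "apple" is lowercase on purpose.
fruits='Banana
Apple
Pear
Orange
apple'

# LC_ALL=C makes sort compare raw byte (ASCII) values, character by character.
sorted=$(printf '%s\n' "$fruits" | LC_ALL=C sort)
printf '%s\n' "$sorted"
# Apple, Banana, Orange, Pear, apple
```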
2 The -u option of sort
Its function is simple: it removes duplicate lines from the output.
[rocrocket@rocrocket programming]$ cat seq.txt
Banana
Apple
Pear
Orange
Pear
[rocrocket@rocrocket programming]$ sort seq.txt
Apple
Banana
Orange
Pear
Pear
[rocrocket@rocrocket programming]$ sort -u seq.txt
Apple
Banana
Orange
Pear
The duplicate Pear was mercilessly removed by the -u option.
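For whole-line sorting, sort -u gives the same result as piping the sorted output through uniq, which drops adjacent duplicate lines. A small sketch, assuming a POSIX shell and the C locale so that both tools compare bytes the same way:

```shell
#!/bin/sh
export LC_ALL=C   # byte-wise comparison for both sort and uniq

# The same content as the second seq.txt, with Pear twice.
input='Banana
Apple
Pear
Orange
Pear'

with_u=$(printf '%s\n' "$input" | sort -u)
with_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$with_u"   # Apple, Banana, Orange, Pear
[ "$with_u" = "$with_uniq" ] && echo "sort -u and sort | uniq agree"
```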
3 The -r option of sort
sort sorts in ascending order by default; to sort in descending order instead, just add -r.
[rocrocket@rocrocket programming]$ cat number.txt
1
3
5
2
4
[rocrocket@rocrocket programming]$ sort number.txt
1
2
3
4
5
[rocrocket@rocrocket programming]$ sort -r number.txt
5
4
3
2
1
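The -r example can be checked with inline input as well. This is a minimal sketch in a POSIX shell that feeds the same five digits through printf instead of reading number.txt:

```shell
#!/bin/sh
export LC_ALL=C

asc=$(printf '1\n3\n5\n2\n4\n' | sort)      # default: ascending
desc=$(printf '1\n3\n5\n2\n4\n' | sort -r)  # -r reverses the order

printf 'ascending:\n%s\ndescending:\n%s\n' "$asc" "$desc"
```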
4 The -o option of sort
Because sort writes its results to standard output by default, you need redirection to write them to a file, for example sort filename > newfile.
However, if you want to write the sorted result back to the original file, redirection does not work:
[rocrocket@rocrocket programming]$ sort -r number.txt > number.txt
[rocrocket@rocrocket programming]$ cat number.txt
[rocrocket@rocrocket programming]$
Look, number.txt has been emptied. This happens because the shell opens (and truncates) number.txt for the > redirection before sort even starts reading it.
This is where the -o option comes in. It solves the problem and lets you safely write the result back to the original file. This may be the only advantage -o has over redirection.
[rocrocket@rocrocket programming]$ cat number.txt
1
3
5
2
4
[rocrocket@rocrocket programming]$ sort -r number.txt -o number.txt
[rocrocket@rocrocket programming]$ cat number.txt
5
4
3
2
1
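Both behaviors can be reproduced safely in a scratch directory. This sketch assumes a POSIX shell with mktemp available. The file names bad.txt and good.txt are throwaway examples, not the original number.txt. POSIX explicitly allows the -o output file to be one of the input files, which is why the second command is safe and the redirection is not. The -o option is placed before the file name so that the command also works with sort implementations that stop reading options at the first operand.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)   # scratch directory so nothing real is overwritten

# Redirection: the shell truncates bad.txt before sort ever reads it.
printf '1\n3\n5\n2\n4\n' > "$tmp/bad.txt"
sort -r "$tmp/bad.txt" > "$tmp/bad.txt"
bad=$(cat "$tmp/bad.txt")

# -o: the output file may be the same as an input file.
printf '1\n3\n5\n2\n4\n' > "$tmp/good.txt"
sort -r -o "$tmp/good.txt" "$tmp/good.txt"
good=$(cat "$tmp/good.txt")

printf 'redirect: [%s]\nwith -o:\n%s\n' "$bad" "$good"
rm -r "$tmp"
```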
5 The -n option of sort
Have you ever seen 10 sorted before 2? I certainly have. This happens because sort compares numbers as characters: it compares 1 with 2 first, and since 1 is smaller, it puts 10 before 2. This is sort's default behavior.
To change this, use the -n option to tell sort to "sort by numeric value"!
[rocrocket@rocrocket programming]$ cat number.txt
1
10
19
11
2
5
[rocrocket@rocrocket programming]$ sort number.txt
1
10
11
19
2
5
[rocrocket@rocrocket programming]$ sort -n number.txt
1
2
5
10
11
19
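A side-by-side check of the two behaviors, sketched for a POSIX shell in the C locale with the same six numbers as number.txt:

```shell
#!/bin/sh
export LC_ALL=C
nums='1
10
19
11
2
5'

as_text=$(printf '%s\n' "$nums" | sort)       # character by character
as_number=$(printf '%s\n' "$nums" | sort -n)  # by numeric value

printf 'without -n:\n%s\nwith -n:\n%s\n' "$as_text" "$as_number"
```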
6 The -t and -k options of sort
Suppose there is a file with content like this:
[rocrocket@rocrocket programming]$ cat facebook.txt
Banana:30:5.5
Apple:10:2.5
Pear:90:2.3
Orange:20:3.4
This file has three columns separated by colons: the first column is the type of fruit, the second is the quantity, and the third is the price.
Now I want to sort by quantity, that is, by the second column. How can sort do that?
Fortunately, sort provides the -t option, which is followed by the field separator. (Does this remind you of the -d option of cut and paste? Same idea.)
Once the separator is specified, -k selects which column to sort on.
[rocrocket@rocrocket programming]$ sort -n -k 2 -t : facebook.txt
Apple:10:2.5
Orange:20:3.4
Banana:30:5.5
Pear:90:2.3
We use the colon as the separator and sort the second column numerically in ascending order, and the result is just what we wanted.
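The fruit example can be reproduced with inline data. This sketch writes the key as -k 2,2 rather than -k 2, so that the key stops at the end of the second column (the ,2 end position is explained in the -k syntax section further on). A second command sorts by price in the same way. With this sample data, both forms of the key give the same order. POSIX shell and C locale assumed.

```shell
#!/bin/sh
export LC_ALL=C   # C locale: '.' is the decimal point for -n
fruits='Banana:30:5.5
Apple:10:2.5
Pear:90:2.3
Orange:20:3.4'

# -t : splits on colons; -k 2,2n compares only column 2, numerically.
by_quantity=$(printf '%s\n' "$fruits" | sort -t : -k 2,2n)
# Same idea for the price in column 3 (decimals are fine for -n).
by_price=$(printf '%s\n' "$fruits" | sort -t : -k 3,3n)

printf '%s\n\n%s\n' "$by_quantity" "$by_price"
```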
7 Other common sort options
-f converts lowercase letters to uppercase before comparing, that is, it ignores case.
-c checks whether the file is already sorted; if it is not, it prints information about the first out-of-order line and returns 1.
-C checks whether the file is already sorted; if it is not, it prints nothing and only returns 1.
-M sorts by month name, for example JAN is less than FEB.
-b ignores the blanks at the start of each line and begins comparing at the first visible character.
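This is a quick way to see several of these options at work, sketched for a POSIX shell in the C locale (where -M recognizes the English month abbreviations). With -f, lines that compare equal when case is ignored are still put in a fixed order by sort's final whole-line byte comparison, so A lands before a.

```shell
#!/bin/sh
export LC_ALL=C

folded=$(printf 'b\nA\na\nB\n' | sort -f)     # case ignored: A a B b
months=$(printf 'MAR\nJAN\nFEB\n' | sort -M)  # month order: JAN FEB MAR

# -C prints nothing; the exit status says whether the input was sorted.
printf 'b\na\n' | sort -C
unsorted_status=$?    # 1: out of order
printf 'a\nb\n' | sort -C
sorted_status=$?      # 0: already sorted

echo "status for unsorted input: $unsorted_status, sorted input: $sorted_status"
```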
Sometimes, while reading scripts, you will find the sort command followed by things like -k1,2 or -k1.2 -k3.4, which look a little baffling. Today, let's get to the bottom of the -k option!
1 Prepare the material
$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
The first field is the company name, the second is the number of employees, and the third is the average employee salary. (Apart from the company names, all the numbers are made up ^_^)
2 I want this file sorted alphabetically by company name, that is, by the first field (this facebook.txt file has three fields):
$ sort -t ' ' -k 1 facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500
See, -k 1 is all it takes. (Strictly speaking, this is not quite precise, as you will see later.)
3 I want facebook.txt sorted by the number of employees:
$ sort -n -t ' ' -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
No need to explain this one; I'm sure you understand it.
However, there is a problem here: Baidu and Sohu have the same number of employees, 100 each. What happens then? When the keys are equal, sort by default falls back to comparing the whole lines, starting from the first field, so Baidu comes before Sohu.
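Steps 2 and 3 can be replayed without creating facebook.txt. This is a minimal sketch, assuming a POSIX shell and the C locale, with the same four lines of data as above. In the second result Baidu and Sohu tie on the key, and sort's default fallback (comparing the whole lines) is what puts Baidu first.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Step 2: key = field 1 (to the end of the line), compared as text.
by_name=$(printf '%s\n' "$data" | sort -t ' ' -k 1)
# Step 3: key = field 2 onward, compared numerically; 100 vs 100 is a tie.
by_staff=$(printf '%s\n' "$data" | sort -n -t ' ' -k 2)

printf '%s\n\n%s\n' "$by_name" "$by_staff"
```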
4 I want facebook.txt sorted by the number of employees, and where that is equal, by average salary in ascending order:
$ sort -n -t ' ' -k 2 -k 3 facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
Look, adding -k 2 -k 3 solves the problem. sort supports this kind of setting, which sets the priority of the sort fields: first by the second field, then, where that is equal, by the third field. (You can keep going like this and set as many priority levels as you like.)
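The two-key version, sketched the same way (POSIX shell, C locale). The second command spells each key out as a single field with its own n modifier. With this data both commands should agree, which is a handy way to confirm that -k 2 -k 3 really means "field 2 first, then field 3".

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Field 2 first; where it ties (Baidu/Sohu), field 3 decides.
by_staff_then_pay=$(printf '%s\n' "$data" | sort -n -t ' ' -k 2 -k 3)
# The same priorities with each key limited to one field.
explicit=$(printf '%s\n' "$data" | sort -t ' ' -k 2,2n -k 3,3n)

printf '%s\n' "$by_staff_then_pay"   # Guge, Sohu, Baidu, Google
[ "$by_staff_then_pay" = "$explicit" ] && echo "same order"
```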
5 I want facebook.txt sorted by average salary in descending order, and where salaries are equal, by number of employees in ascending order (this one is a bit harder):
$ sort -n -t ' ' -k 3r -k 2 facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
There is a little trick here. Look closely and you will see a lowercase r quietly added after -k 3. Think about it together with the previous article: can you guess what it does? Answer: the r modifier does the same thing as the -r option, namely reverse the order. Because sort sorts in ascending order by default, r is needed here to sort the third field (average salary) in descending order. (One subtlety: a -k key that carries any modifier of its own, even just r, no longer inherits global options such as -n, so -k 3r actually compares the salaries as text. Here that still gives the right order only because every salary has four digits.) You can also add n to a key, meaning that field is compared by numeric value, for example:
$ sort -t ' ' -k 3nr -k 2n facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
Look, we removed the leading -n option and added n to each -k option instead.
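Here is step 5 in sketch form, plus a two-line check of the inheritance rule: a key that has any modifier of its own (even just r) ignores the global -n. The inputs 9 and 10 are chosen because they order differently as text and as numbers. POSIX shell and C locale assumed.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Salary (field 3) descending by number, then staff (field 2) ascending by number.
ranked=$(printf '%s\n' "$data" | sort -t ' ' -k 3nr -k 2n)
printf '%s\n' "$ranked"   # Baidu, Google, Sohu, Guge

# -k 1r carries its own modifier, so the global -n is not applied to it.
only_r=$(printf '9\n10\n' | sort -n -k 1r)   # text, reversed: 9 then 10
with_nr=$(printf '9\n10\n' | sort -k 1nr)    # numbers, reversed: 10 then 9
printf '%s | %s\n' "$only_r" "$with_nr"
```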
6 The syntax of the -k option
To go further, you need some theory: the syntax of the -k option, which is as follows:
[FStart [.CStart]] [Modifier] [, [FEnd [.CEnd]] [Modifier]]
A comma (",") splits this syntax into two parts: the Start part and the End part.
First, let me drill one idea into you: if you do not set the End part, End is taken to be the end of the line. This concept is very important, yet it is often overlooked.
The Start part itself has three components. The Modifier component holds options like the n and r we discussed earlier. Let's focus on FStart and .CStart in the Start part.
.CStart can be omitted, which means starting at the beginning of the field. The -k 2 and -k 3 in the previous examples are cases where .CStart is omitted.
In FStart.CStart, FStart is the number of the field to use, and CStart is the position, within field FStart, of the first character to sort on.
Similarly, in the End part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, that is, at its last character. Setting CEnd to 0 (zero) also means the key ends at the end of the field.
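A tiny made-up input shows why the default End matters. Both rows have x in field 2 but differ in field 3. With -k 2 the key runs to the end of the line, so field 3 decides. With -k 2,2 the keys tie, and the whole-line fallback decides instead. The two rows are hypothetical data chosen only for this contrast (POSIX shell, C locale).

```shell
#!/bin/sh
export LC_ALL=C
rows='a x 2
b x 1'

# -k 2: key is "x 2" / "x 1" (field 2 through end of line), so b comes first.
to_end_of_line=$(printf '%s\n' "$rows" | sort -t ' ' -k 2)
# -k 2,2: key is "x" / "x", a tie; the whole-line comparison puts a first.
field_2_only=$(printf '%s\n' "$rows" | sort -t ' ' -k 2,2)

printf '%s\n\n%s\n' "$to_end_of_line" "$field_2_only"
```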
7 On a whim, sort starting from the second letter of the company name:
$ sort -t ' ' -k 1.2 facebook.txt
Baidu 100 5000
Sohu 100 4500
Google 110 5000
Guge 50 3000
Look, we used -k 1.2, which means sorting on the string that starts at the second character of the first field and, since End is omitted, runs to the end of the line. Baidu tops the list because its second letter is a. Sohu and Google both have o as their second character, but the h in Sohu comes before the o in Google, so they come second and third respectively. Guge is only fourth.
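The same sort, sketched with inline data (POSIX shell, C locale). Field 1 begins each line, so the key here is simply the line minus its first character, which the cut command prints for comparison.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Key starts at character 2 of field 1 and, with no End, runs to the end of the line.
from_second=$(printf '%s\n' "$data" | sort -t ' ' -k 1.2)
printf '%s\n' "$from_second"   # Baidu, Sohu, Google, Guge

# The keys themselves, sorted: aidu..., ohu..., oogle..., uge...
printf '%s\n' "$data" | cut -c 2- | sort
```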
8 On a whim, sort on the second letter of the company name only, and where that is equal, by salary in descending order:
$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
Since we sort on the second letter only, we write -k 1.2,1.2, which means sorting on just that letter. (If you ask, "Why can't I just use -k 1.2?", the answer is that omitting the End part means sorting on everything from the second letter to the end of the line.) For the salary we likewise use -k 3,3nr, the most precise expression, which means we sort on this field "only". If you omit the trailing ,3, it becomes "sort on everything from the third field to the end of the line".
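The difference the End part makes can be checked directly (POSIX shell, C locale). With -k 1.2,1.2, Google and Sohu tie on o and the salary key decides. With an open-ended -k 1.2, the first key already differs for every line, so the salary key never comes into play.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Key 1: only character 2 of field 1; key 2: only field 3, numeric, descending.
exact=$(printf '%s\n' "$data" | sort -t ' ' -k 1.2,1.2 -k 3,3nr)
# Without the End part, key 1 runs to the end of the line and key 2 never matters.
open_ended=$(printf '%s\n' "$data" | sort -t ' ' -k 1.2 -k 3,3nr)

printf '%s\n\n%s\n' "$exact" "$open_ended"
```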
9 What other options are available in the Modifier part?
You can use b, d, f, i, n, or r.
You must already be familiar with n and r.
b ignores leading blanks in this field.
d sorts the field in dictionary order (only blanks and alphanumeric characters are considered).
f ignores case when sorting this field.
i ignores non-printable characters and sorts on printable characters only. (Some ASCII characters are not printable; for example, \a is the alert (bell), \b is backspace, \n is newline, \r is carriage return, and so on.)
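Here is a small sketch of the f and d modifiers attached to a single key (POSIX shell, C locale, where uppercase letters sort before lowercase ones). The input lines are made up for the contrast.

```shell
#!/bin/sh
export LC_ALL=C   # byte order: uppercase letters sort before lowercase ones

# f: compare field 1 ignoring case.
plain=$(printf 'apple 1\nBanana 2\n' | sort -t ' ' -k 1,1)     # Banana first (B < a)
folded=$(printf 'apple 1\nBanana 2\n' | sort -t ' ' -k 1,1f)   # apple first (A < B)

# d: only blanks and alphanumerics count, so "x-b" is compared as "xb".
plain_d=$(printf 'x-b\nxa\n' | sort -k 1,1)    # x-b first ('-' < 'a')
dict=$(printf 'x-b\nxa\n' | sort -k 1,1d)      # xa first ("xa" < "xb")

printf '%s\n\n%s\n\n%s\n\n%s\n' "$plain" "$folded" "$plain_d" "$dict"
```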
10 Examples combining -k and -u:
$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
This is the original facebook.txt file.
$ sort -n -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
$ sort -n -k 2 -u facebook.txt
Guge 50 3000
Baidu 100 5000
Google 110 5000
When the lines are sorted numerically by the employee field and -u is added, the Sohu line is deleted! It turns out -u only looks at the key set with -k: when a later line's key is the same as an earlier line's, that later line is deleted.
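The same experiment with inline data (POSIX shell, C locale). As in the article, no -t is given, so fields are split at runs of blanks. Baidu and Sohu both read as 100 on the key, so -u keeps only one of them.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

all=$(printf '%s\n' "$data" | sort -n -k 2)
deduped=$(printf '%s\n' "$data" | sort -n -k 2 -u)

printf '%s\n\n%s\n' "$all" "$deduped"   # the second list lacks Sohu 100 4500
```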
$ sort -k 1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500
$ sort -k 1.1,1.1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
The same goes for this example: Guge, which also begins with G, is not spared.
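Both -u runs from this part can be sketched the same way (POSIX shell, C locale, no -t as in the article). The whole-line key keeps all four lines. The one-character key keeps only one line per initial letter, so one of the two G lines disappears.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

whole_line=$(printf '%s\n' "$data" | sort -k 1 -u)         # every line is unique
by_initial=$(printf '%s\n' "$data" | sort -k 1.1,1.1 -u)   # key = first character only

printf '%s\n\n%s\n' "$whole_line" "$by_initial"
```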
$ sort -n -k 2 -k 3 -u facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
Hey! With two levels of sort keys set here, -u deletes no lines. It turns out -u weighs all the -k options together and deletes a line only when it is identical on every key; a difference at any level keeps the line :) (If you don't believe it, add a line for sina with the same two values as Sohu, 100 and 4500, and try it.)
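The suggested experiment, sketched with inline data (POSIX shell, C locale). The Sina line is the hypothetical extra line from the text, with the same employee count and salary as Sohu. Once it is added, the two lines tie on both keys and -u keeps only one of them.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

two_keys=$(printf '%s\n' "$data" | sort -n -k 2 -k 3 -u)   # all four lines survive

# Hypothetical extra line that matches Sohu on both keys.
with_sina=$(printf '%s\nSina 100 4500\n' "$data" | sort -n -k 2 -k 3 -u)

printf '%s\n\n%s\n' "$two_keys" "$with_sina"   # only one "100 4500" line remains
```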
11 The strangest sort:
$ sort -n -k 2.2,3.1 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
This sorts on the part that runs from the second character of the second field to the first character of the third field.
Which characters does that actually pick up? Notice that no -t is given here. Without -t (and without the b modifier), each field after the first keeps the blank in front of it, and characters are counted from that blank. So the second character of field 2 is its first digit, and the first character of field 3 is the blank in front of the salary. The keys extracted are "50 " for Guge, "100 " for Baidu, "100 " for Sohu and "110 " for Google.
With -n these compare as 50, 100, 100 and 110, so Guge has to be first and Google has to be last. But why does Baidu (100 5000) come before Sohu (100 4500), when the key seems to reach into the salary field? Try it yourself and think about it before reading on.
The answer: "the cross-field setting is an illusion". All the key takes from the third field is the blank in front of it, so the salaries never enter the comparison. Baidu and Sohu therefore tie, and sort falls back to comparing the whole lines starting with the first field, so of course Baidu comes before Sohu. An example proves it:
$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
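The "illusion" can be checked by comparing against a key that stops at field 2 (POSIX shell, C locale, no -t as above). If the salaries took part, the two orders could differ. With this data they come out identical. The last command repeats the tie-breaking experiment.

```shell
#!/bin/sh
export LC_ALL=C
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Without -t, field 2 is " 50", " 100", ... (leading blank included), so
# -k 2.2,3.1 covers the digits of field 2 plus the blank that starts field 3.
cross=$(printf '%s\n' "$data" | sort -n -k 2.2,3.1)
field2=$(printf '%s\n' "$data" | sort -n -k 2,2)
[ "$cross" = "$field2" ] && echo "same order: the salaries never take part"

# Tie between Baidu and Sohu broken by field 1, reversed.
tie_rev=$(printf '%s\n' "$data" | sort -n -k 2.2,3.1 -k 1,1r)
printf '%s\n' "$tie_rev"   # Guge, Sohu, Baidu, Google
```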
12 Sometimes you see symbols like +1 -2 after the sort command. What are they?
Here is how the current sort documentation explains this syntax:
On older systems, `sort' supports an obsolete origin-zero syntax `+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use `-k' instead.
So this ancient notation has been phased out, and from now on we have good reason to frown on scripts that use it.
Just in case some ancient scripts are still around, here is how the notation works: the plus sign marks the Start part and the minus sign marks the End part. The most important point is that it counts from 0: what we called the first field earlier is field 0 here, and what we called the second character is character 1 here. The End part needs extra care. According to the GNU coreutils manual, +a.x -b.y is equivalent to -k a+1.x+1,b when y is 0 or absent, and to -k a+1.x+1,b+1.y otherwise, so +1 -2 means -k 2,2. Got it?
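The conversion rule can be written down as a small helper. This is a hypothetical function (obsolete_to_k is not part of sort or any other tool), based on the rule stated in the GNU coreutils manual. It is only a sketch: it handles plain numbers and ignores modifier letters such as n or r.

```shell
#!/bin/sh
# Hypothetical helper: translate obsolete "+POS1 [-POS2]" into "-k" form.
# Assumes digits only (e.g. +1, +1.2, -2, -3.4); modifier letters are not handled.
obsolete_to_k() {
  start=${1#+}                         # "+1.2" -> "1.2"
  a=${start%%.*}                       # origin-zero field
  case $start in *.*) x=${start#*.} ;; *) x=0 ;; esac
  key="$((a + 1)).$((x + 1))"          # origin-one field and character
  if [ -n "$2" ]; then
    end=${2#-}                         # "-3.4" -> "3.4"
    b=${end%%.*}
    case $end in *.*) y=${end#*.} ;; *) y=0 ;; esac
    if [ "$y" -eq 0 ]; then
      # -b stops just before origin-zero field b, i.e. at the end of field b counted from one.
      key="$key,$b"
    else
      key="$key,$((b + 1)).$y"
    fi
  fi
  printf '%s\n' "-k $key"
}

obsolete_to_k +1 -2       # -k 2.1,2   (the same key as -k 2,2)
obsolete_to_k +1.2 -3.4   # -k 2.3,4.4
```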
By now, I hope you have a deeper understanding of how the sort command works in Linux and how to use it. The best next step is to try it out yourself.
© 2024 shulou.com SLNews company. All rights reserved.