The principle and usage of the sort command in Linux

2025-01-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains the principle and usage of the sort command in Linux. The methods introduced here are simple, fast, and practical, so let's walk through them step by step.

1 How sort works

sort compares the lines of a file one whole line at a time. Starting from the first character, it compares the lines character by character by their ASCII code values (strictly, byte values in the C locale; other locales apply their own collation rules) and outputs them in ascending order.

[rocrocket@rocrocket programming] $cat seq.txt

Banana

Apple

Pear

Orange

[rocrocket@rocrocket programming] $sort seq.txt

Apple

Banana

Orange

Pear
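The same result can be reproduced without creating seq.txt by piping the lines in with printf. This is a sketch assuming a POSIX shell; setting LC_ALL=C is an assumption added here so that sort compares raw byte (ASCII) values, as described above. Under other locales the locale's collation rules apply, and mixed-case input may come out in a different order.

```shell
#!/bin/sh
# The four lines of seq.txt, compared byte by byte in the C locale.
printf '%s\n' Banana Apple Pear Orange | LC_ALL=C sort
# -> Apple, Banana, Orange, Pear

# Byte values decide mixed case: 'B' is 66 and 'a' is 97,
# so in the C locale "Banana" sorts before "apple".
printf '%s\n' apple Banana | LC_ALL=C sort
# -> Banana, apple
```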

2 The -u option of sort

Its function is simple: it removes duplicate lines from the output.

[rocrocket@rocrocket programming] $cat seq.txt

Banana

Apple

Pear

Orange

Pear

[rocrocket@rocrocket programming] $sort seq.txt

Apple

Banana

Orange

Pear

Pear

[rocrocket@rocrocket programming] $sort -u seq.txt

Apple

Banana

Orange

Pear

The duplicate Pear line was mercilessly removed by the -u option.
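As a rough cross-check: when no -k keys are given, -u compares whole lines, so its output should match sort | uniq. A sketch with the seq.txt lines inlined through a hypothetical helper function seq_lines (C locale assumed):

```shell
#!/bin/sh
# Hypothetical helper: the five lines of seq.txt, including the duplicate "Pear".
seq_lines() { printf '%s\n' Banana Apple Pear Orange Pear; }

seq_lines | LC_ALL=C sort -u          # Apple, Banana, Orange, Pear
# Without -k keys, -u compares whole lines, so this gives the same output:
seq_lines | LC_ALL=C sort | uniq      # Apple, Banana, Orange, Pear
```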

3 The -r option of sort

sort sorts in ascending order by default; to sort in descending order instead, just add -r.

[rocrocket@rocrocket programming] $cat number.txt

1

3

5

2

4

[rocrocket@rocrocket programming] $sort number.txt

1

2

3

4

5

[rocrocket@rocrocket programming] $sort -r number.txt

5

4

3

2

1
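A file-free version of the -r example, assuming a POSIX shell (single digits sort the same way in any locale):

```shell
#!/bin/sh
# The digits from number.txt, ascending (default) and descending (-r).
printf '%s\n' 1 3 5 2 4 | sort       # 1 2 3 4 5, one per line
printf '%s\n' 1 3 5 2 4 | sort -r    # 5 4 3 2 1, one per line
```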

4 The -o option of sort

Because sort writes its results to standard output by default, you need redirection to save them to a file, e.g. sort filename > newfile.

However, if you want to write the sorted result back to the original file, redirection will not work: the shell truncates the file before sort ever reads it.

[rocrocket@rocrocket programming] $sort -r number.txt > number.txt

[rocrocket@rocrocket programming] $cat number.txt

[rocrocket@rocrocket programming] $

Look, number.txt has been emptied.

This is where the -o option comes in: it solves the problem and lets you safely write the result back to the original file. This may also be the only advantage of -o over redirection.

[rocrocket@rocrocket programming] $cat number.txt

1

3

5

2

4

[rocrocket@rocrocket programming] $sort -r number.txt -o number.txt

[rocrocket@rocrocket programming] $cat number.txt

5

4

3

2

1
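The truncation problem and the -o fix can both be reproduced on a scratch file. This sketch assumes mktemp is available (it ships with GNU coreutils and the BSDs but is not required by POSIX). It places -o before the file operand, which is the POSIX-conforming order; GNU sort also accepts the number.txt -o number.txt ordering used above. GNU's manual notes that sort normally reads all input before opening the output file, which is what makes sort -o F F safe.

```shell
#!/bin/sh
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT

# Redirection: the shell truncates $f before sort ever reads it.
printf '%s\n' 1 3 5 2 4 > "$f"
sort -r "$f" > "$f"
wc -c < "$f"    # 0: the data is gone

# -o: sort reads all of its input first, then writes the output file.
printf '%s\n' 1 3 5 2 4 > "$f"
sort -r -o "$f" "$f"
cat "$f"        # 5 4 3 2 1, one per line
```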

5 The -n option of sort

Have you ever seen 10 sorted before 2? I certainly have. It happens because sort compares numbers as characters: it compares 1 with 2 first, 1 is obviously smaller, so 10 goes before 2. This is sort's default behavior.

To change this, use the -n option to tell sort to "sort numerically"!

[rocrocket@rocrocket programming] $cat number.txt

1

10

19

11

2

5

[rocrocket@rocrocket programming] $sort number.txt

1

10

11

19

2

5

[rocrocket@rocrocket programming] $sort -n number.txt

1

2

5

10

11

19
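The 10-before-2 effect and the -n fix, reproduced with inline input (LC_ALL=C assumed so the character comparison is plain byte order):

```shell
#!/bin/sh
# Character comparison: "10" < "2" because '1' < '2'.
printf '%s\n' 1 10 19 11 2 5 | LC_ALL=C sort    # 1 10 11 19 2 5
# Numeric comparison with -n.
printf '%s\n' 1 10 19 11 2 5 | sort -n          # 1 2 5 10 11 19
```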

6 The -t and -k options of sort

If there is a file that says something like this:

[rocrocket@rocrocket programming] $cat facebook.txt

Banana:30:5.5

Apple:10:2.5

Pear:90:2.3

Orange:20:3.4

This file has three columns separated by colons: the first is the type of fruit, the second the quantity, and the third the price.

Suppose I want to sort by quantity, that is, by the second column. How can sort do that?

Fortunately, sort provides the -t option, which sets the field separator. (Reminds you of the -d option of cut and paste, doesn't it? ~)

Once the separator is set, use -k to choose which column to sort by.

[rocrocket@rocrocket programming] $sort -n -k 2 -t : facebook.txt

Apple:10:2.5

Orange:20:3.4

Banana:30:5.5

Pear:90:2.3

We use the colon as the separator and sort the second column numerically in ascending order, and the result is just what we wanted.
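The same fruit data can be fed in directly. This sketch uses a hypothetical helper fruits that prints the lines of facebook.txt, and writes each key as N,N so it covers only that one field (the End part of -k is explained in the second half of this article). LC_ALL=C is assumed so that -n reads the period in the price column as the decimal point.

```shell
#!/bin/sh
# Hypothetical helper: the colon-separated lines of facebook.txt.
fruits() { printf '%s\n' Banana:30:5.5 Apple:10:2.5 Pear:90:2.3 Orange:20:3.4; }

# By quantity (field 2), numerically: Apple, Orange, Banana, Pear.
fruits | LC_ALL=C sort -t : -k 2,2n
# By price (field 3), numerically: Pear, Apple, Orange, Banana.
fruits | LC_ALL=C sort -t : -k 3,3n
```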

7 Other common sort options

-f folds lowercase letters to uppercase before comparing, i.e. it ignores case.

-c checks whether the file is already sorted; if it is not, it prints information about the first out-of-order line and exits with status 1.

-C also checks whether the file is sorted, but prints nothing when it is not; it only exits with status 1.

-M sorts by month name, e.g. JAN is less than FEB.

-b ignores leading blanks, so comparison starts at the first non-blank character.
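A quick sketch of -f, -c/-C, and -M, assuming GNU or POSIX sort and the C locale for predictable case and month handling. The || and && branches capture the exit statuses described above without aborting the script.

```shell
#!/bin/sh
# -f folds lowercase to uppercase before comparing.
printf '%s\n' apple Banana | LC_ALL=C sort      # Banana, apple (byte order)
printf '%s\n' apple Banana | LC_ALL=C sort -f   # apple, Banana (case ignored)

# -c prints the first out-of-order line to stderr; -C prints nothing.
# Both exit with status 1 on unsorted input and 0 on sorted input.
printf '%s\n' a c b | sort -c || echo "sort -c exit status: $?"
printf '%s\n' a c b | sort -C || echo "sort -C exit status: $?"
printf '%s\n' a b c | sort -C && echo "sorted input: exit status 0"

# -M orders month abbreviations (English names in the C locale).
printf '%s\n' MAR JAN FEB | LC_ALL=C sort -M    # JAN, FEB, MAR
```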

Sometimes, when reading scripts, you will see sort followed by things like -k1,2 or -k1.2 -k3.4, which look a little weird. Today, let's master the -k option!

1 Prepare the material

$cat facebook.txt

Google 110 5000

Baidu 100 5000

Guge 50 3000

Sohu 100 4500

The first field is the company name, the second is the number of employees, and the third is the average employee salary. (Apart from the company names, all the figures are made up ^_^)

2 I want this file sorted alphabetically by company name, that is, by the first field (this facebook.txt file has three fields):

$sort -t ' ' -k 1 facebook.txt

Baidu 100 5000

Google 110 5000

Guge 50 3000

Sohu 100 4500

See, -k 1 is all it takes. (Strictly speaking this is not quite precise, as you will see later.)

3 I want facebook.txt sorted by the number of employees:

$sort -n -t ' ' -k 2 facebook.txt

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

There's no need to explain. I'm sure you can understand.

There is a catch, though: Baidu and Sohu have the same number of employees, 100 each. What happens then? When the keys compare equal, sort falls back to comparing the entire lines, which in effect starts from the first field, so Baidu comes before Sohu.

4 I want facebook.txt sorted by the number of employees, and where those are equal, by average salary in ascending order:

$sort -n -t ' ' -k 2 -k 3 facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000

Look, adding -k 2 -k 3 solves the problem. sort supports this way of setting key priorities: first by the second field, then by the third field where the second is equal. (You can keep adding keys like this to set as many priority levels as you like.)
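A file-free sketch of the tie-breaking behavior, using a hypothetical helper companies that prints the four lines of facebook.txt. It assumes -t ' ' as in the commands above and the C locale, and writes each key as N,N so the key is exactly one field:

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# One key: Baidu and Sohu tie on field 2, so sort falls back to comparing
# whole lines and "Baidu ..." wins. Order: Guge, Baidu, Sohu, Google.
companies | LC_ALL=C sort -t ' ' -k 2,2n

# Two keys: field 3 breaks the tie, so Sohu (4500) moves ahead of Baidu.
# Order: Guge, Sohu, Baidu, Google.
companies | LC_ALL=C sort -t ' ' -k 2,2n -k 3,3n
```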

5 I want facebook.txt sorted by average salary in descending order, and where salaries are equal, by number of employees in ascending order (this one is a bit harder):

$sort -n -t ' ' -k 3r -k 2 facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

A small trick is used here. Look closely and you will see a lowercase r quietly appended to -k 3. Think about it together with the previous article: can you work out what it does? The answer: the r modifier does the same thing as the -r option, namely reverses the order. Because sort sorts in ascending order by default, the r here makes the third field (average salary) sort in descending order. You can also add n here, meaning this field is compared numerically, for example:

$sort -t ' ' -k 3nr -k 2n facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

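Look, we removed the leading -n option and attached n to each -k option instead. (Per POSIX, options attached to a key override the global ordering options for that key, so writing the modifiers on each key is the more explicit form.)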
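Using the same kind of hypothetical companies helper, the per-key form can be checked directly (C locale assumed):

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# Key 1: field 3 only, numeric, reversed (highest salary first).
# Key 2: field 2 only, numeric, ascending (breaks salary ties).
companies | LC_ALL=C sort -t ' ' -k 3,3nr -k 2,2n
# Order: Baidu, Google, Sohu, Guge
```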

6 Syntax of the -k option

To go further you need a little theory, namely the syntax of the -k option, which is:

[FStart [.CStart]] [Modifier] [, [FEnd [.CEnd]] [Modifier]]

This syntax format can be divided into two parts, the Start part and the End part, by the comma (",").

First, an idea to take to heart: "if you do not set the End part, End is taken to be the end of the line." This concept is very important, yet it is often overlooked.

The Start part itself consists of three parts, of which the Modifier part holds options like the n and r we used earlier. Let's focus on FStart and .CStart in the Start part.

.CStart can be omitted, which means starting at the beginning of the field. The -k 2 and -k 3 in the previous examples omit .CStart.

In FStart.CStart, FStart is the number of the field to use, and CStart is the position, within field FStart, of the first character to sort on.

Likewise, the End part can be written FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, that is, at its last character. Setting CEnd to 0 (zero) also means the key ends at the end of the field.
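The rule that a missing End part means "to the end of the line" is easiest to see with two hypothetical lines that tie on field 2 but differ after it. A sketch assuming -t ' ' and the C locale:

```shell
#!/bin/sh
# Two hypothetical lines: equal in field 2, different in field 3.
lines() { printf '%s\n' 'x 1 b' 'y 1 a'; }

# -k 2: the key runs from field 2 to the end of the line ("1 b" vs "1 a"),
# so "y 1 a" sorts first.
lines | LC_ALL=C sort -t ' ' -k 2

# -k 2,2: the key is field 2 only ("1" vs "1"), a tie; sort then
# compares the whole lines, so "x 1 b" sorts first.
lines | LC_ALL=C sort -t ' ' -k 2,2
```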

7 On a whim, sort starting from the second letter of the company name:

$sort -t ' ' -k 1.2 facebook.txt

Baidu 100 5000

Sohu 100 4500

Google 110 5000

Guge 50 3000

Look, we used -k 1.2, which means sorting on the string that starts at the second character of the first field and runs to the end of the line. Baidu tops the list because its second letter is a. Sohu and Google both have o as their second character, but the h in Sohu comes before the o in Google, so they take second and third place. Guge ends up fourth.
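A self-contained version of this example, again assuming a hypothetical companies helper, -t ' ', and the C locale. The comment lists the key each line contributes:

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# Keys: "oogle 110 5000", "aidu 100 5000", "uge 50 3000", "ohu 100 4500"
companies | LC_ALL=C sort -t ' ' -k 1.2
# Order: Baidu, Sohu, Google, Guge
```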

8 On another whim, sort only on the second letter of the company name, and where that is equal, by salary in descending order:

$sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

Since we sort only on the second letter, we write -k 1.2,1.2, meaning "sort on the second letter only". (If you ask, "Why can't I use -k 1.2?", the answer is that it won't work: omitting the End part means sorting on the string from the second letter to the end of the line.) For the salary we likewise use -k 3,3, the most precise form, meaning we sort on this field "only"; if you omit the trailing ,3, the key becomes "from the start of the third field to the end of the line".
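The difference between -k 1.2,1.2 and -k 1.2 can be seen by running both on the same data (hypothetical companies helper, C locale assumed). Without the End part, the first key already runs to the end of the line, so the salary key never gets a say:

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# Second letter only, then salary descending: Baidu, Google, Sohu, Guge.
companies | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr

# Second letter to end of line: "ohu ..." < "oogle ...", so Sohu stays
# ahead of Google. Order: Baidu, Sohu, Google, Guge.
companies | LC_ALL=C sort -t ' ' -k 1.2 -k 3,3nr
```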

9 What other options can go in the Modifier part?

You can use b, d, f, i, n, or r.

You must already be familiar with n and r.

b ignores leading blanks in this field.

d sorts this field in dictionary order (only blanks and alphanumeric characters are considered).

f ignores case when sorting this field.

i ignores non-printing characters and sorts only on printable ones. (Some ASCII characters are non-printing: for example, \a is the alert (bell), \b is backspace, \n is newline, and \r is carriage return.)
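A small sketch of the d and f modifiers attached to a key, using made-up input and the C locale (where '-' is byte 45 and '1' is byte 49):

```shell
#!/bin/sh
# Plain byte order: '-' sorts before '1', so "b-2" comes first.
printf '%s\n' b-2 b1 | LC_ALL=C sort                 # b-2, b1
# d on the key: only blanks and alphanumerics count, so "b-2" acts like "b2".
printf '%s\n' b-2 b1 | LC_ALL=C sort -k 1,1d         # b1, b-2
# f on the key: case is ignored for this key only.
printf '%s\n' apple Banana | LC_ALL=C sort -k 1,1f   # apple, Banana
```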

10 Examples of combining -k and -u:

$cat facebook.txt

Google 110 5000

Baidu 100 5000

Guge 50 3000

Sohu 100 4500

This is the original facebook.txt file.

$sort -n -k 2 facebook.txt

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

$sort -n -k 2 -u facebook.txt

Guge 50 3000

Baidu 100 5000

Google 110 5000

Sorting numerically by the employee field and adding -u deletes the Sohu line! It turns out -u looks only at the keys set with -k: once it finds lines whose keys are equal, it keeps the first and deletes the rest.

$sort -k 1 -u facebook.txt

Baidu 100 5000

Google 110 5000

Guge 50 3000

Sohu 100 4500

$sort -k 1.1,1.1 -u facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

The same goes for this example: Guge, which also begins with G, is not spared.
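A file-free sketch of both -u examples, assuming GNU sort, the C locale, and a hypothetical companies helper. GNU sort outputs only the first line of each run of equal keys, and in practice that is the line that appeared first in the input, which matches the outputs above; POSIX itself only promises that one line of each set is kept.

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# Key = field 2 as a number: Baidu and Sohu both give 100; one survives.
companies | LC_ALL=C sort -n -k 2,2 -u      # Guge, Baidu, Google
# Key = first character of field 1: Google and Guge both give "G".
companies | LC_ALL=C sort -k 1.1,1.1 -u     # Baidu, Google, Sohu
```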

$sort -n -k 2 -k 3 -u facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000

Hey! With two levels of sort priority set here, -u deletes no lines. It turns out -u weighs all the -k keys together and deletes a line only when every key is equal; as long as one level differs, the line stays :) (If you don't believe it, add a line such as Sina 100 4500 and try.)
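To try the suggestion above without editing facebook.txt, this sketch appends the hypothetical line Sina 100 4500, which ties with Sohu on both keys (GNU sort, C locale, and a hypothetical companies helper assumed):

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# "Sina 100 4500" matches Sohu on field 2 and field 3, so -u drops it.
{ companies; printf '%s\n' 'Sina 100 4500'; } | LC_ALL=C sort -n -k 2,2 -k 3,3 -u
# Order: Guge, Sohu, Baidu, Google
```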

11 The strangest sort:

$sort -n -k 2.2,3.1 facebook.txt

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

This sorts on the text from the second character of the second field through the first character of the third field.

Because no -t is given, each field keeps the blank that precedes it: field 2 of the first line is " 50", so its second character is the digit 5. The extracted keys are therefore "50 ", "100 ", "100 ", and "110 " (the trailing blank being the first character of field 3), which -n reads as 50, 100, 100, and 110.

So Guge must come first and Google last. But why does Baidu (100 5000) come before Sohu (100 4500)? Try it yourself and think it over.

The answer: under -n, the part of the key that crosses into the third field has no effect, because number parsing stops at the blank. Baidu and Sohu both read as 100, so the key ties, and sort falls back to comparing the entire lines; "Baidu ..." naturally comes before "Sohu ...". An extra key proves it:

$sort -n -k 2.2,3.1 -k 1,1r facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000
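The claim that the cross-field part of the key has no effect under -n can be checked by comparing the odd key with a plain field-2 key. A sketch assuming GNU sort, the C locale, and a hypothetical companies helper; no -t is given, so fields keep their leading blanks as in the commands above:

```shell
#!/bin/sh
# Hypothetical helper: the four lines of facebook.txt.
companies() {
  printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500'
}

# The key crosses into field 3, but -n stops reading at the blank,
# so both commands print: Guge, Baidu, Sohu, Google.
companies | LC_ALL=C sort -n -k 2.2,3.1
companies | LC_ALL=C sort -n -k 2,2
```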

12 Sometimes you see +1 -2 after the sort command. What is this?

The current sort documentation explains this syntax as follows:

On older systems, `sort' supports an obsolete origin-zero syntax `+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use `-k' instead.

So this old notation has been retired, and there is good reason to frown on scripts that still use it.

In case you meet old scripts, here is how it works: the plus sign marks the Start part and the minus sign marks the End part. The most important point is that this notation counts from 0: what we called the first field is field 0 here, and what we called the second character is character 1. So +1 -2 corresponds to -k 2,2. Got it?

At this point, you should have a deeper understanding of the principle and usage of the sort command in Linux. The best way to consolidate it is to try it in practice.
