What are the programming skills of Linux text processing commands 07/01 Update SLTechnology News&Howtos


2025-07-01 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

Many people without much experience are unsure how to make good use of Linux text processing commands. This article summarizes some practical techniques and shows how to apply them, and I hope it helps you solve similar problems.

Good programmers are good at using tools. As an old saying puts it, the gentleman is not different from others by nature; he is simply good at making use of things. Using Linux command-line tools well can noticeably improve our efficiency.

Building on the previous article, this one introduces several more useful Linux command-line tools and the scenarios where they help.

One more point: tools improve efficiency, but they come with a learning curve and a learning cost. When you only need a tool occasionally, you may find you do not remember how to use it, go off to search and study online, and end up spending longer than if you had done the job the clumsy way. This is one reason many people never adopt these tools.

Even harder is changing old habits of thinking. If a file has only twenty-odd lines of data to turn into SQL, doing it by hand seems to take just two minutes: you may not remember the shortcut keys, but you can move the mouse quickly. When the number of lines grows or the task gets more complex, however, the weaknesses of these methods show up and force us to use the right tools.

So why not use the faster, more powerful approach from the start?

Using two scenarios as examples, this article introduces the join, sort and uniq commands and the sed editor.

Merge associated lines in two files

The scenario is as follows. There are two files, each made up of lines in a fixed format, where each line represents one row of a database table. One file holds user data with three columns: user_id, username and gender. The other holds order data with four columns: order_id, price, user_id and time. We need to merge the two files line by line on user_id, so that lines with the same user_id are combined into one new line. This is shown in the following figure.

The contents of the above two files are as follows:

order.txt (user_id is the third column):
o1 1 u1 2011-9
o2 2 u2 2011-10
o3 3 u3 2011-10
o4 4 u1 2011-12

user.txt (user_id is the first column):
u1 tom male
u2 jack male
u3 nacy female

Suppose we want to use join but have forgotten its exact syntax. We can either search online or ask man.

Running man join shows what the command does and what its options mean; the DESCRIPTION section is usually the place to start.

The man page makes it clear that join performs an equality join: for each pair of lines in the two input files whose join fields are identical, it writes one merged line to standard output. The join field is the column compared when merging, and by default it is the first column of each file. The -1 and -2 options set the join column for the first and the second file, respectively.
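Here is a minimal sketch of that output layout. It uses two small made-up files (left.txt and right.txt are hypothetical names, written to a temporary directory). By default join prints the join field first, then the remaining fields of the first file, then the remaining fields of the second file.

```shell
#!/bin/sh
# Hypothetical sample files: the key is column 3 of left.txt and column 1 of right.txt.
tmp=$(mktemp -d)
printf 'a1 100 k1\na2 200 k2\n' > "$tmp/left.txt"
printf 'k1 red\nk2 blue\n' > "$tmp/right.txt"

# -1 3 compares column 3 of the first file, -2 1 compares column 1 of the second.
# Both files are already sorted on their key columns (k1 before k2).
result=$(join -1 3 -2 1 "$tmp/left.txt" "$tmp/right.txt")
printf '%s\n' "$result"
# k1 a1 100 red
# k2 a2 200 blue

rm -r "$tmp"
```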

$ join -1 3 -2 1 order.txt user.txt   # compare column 3 of order.txt with column 1 of user.txt
u1 o1 1 2011-9 tom male
u2 o2 2 2011-10 jack male
u3 o3 3 2011-10 nacy female

The output has only three lines, even though order.txt clearly has four. Why? A closer look at the man page gives the answer.

join requires both input files to be sorted on the join field; otherwise some lines may fail to be paired and are left out of the output. user.txt is already sorted on its first column, so we only need to sort order.txt on its third column with the sort command.
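With GNU coreutils, join's --check-order option turns unsorted input into an error instead of a partial result. The sketch below assumes GNU join (other implementations may not have this option) and recreates the article's two files in a temporary directory.

```shell
#!/bin/sh
# Recreate the article's sample files in a temporary directory.
tmp=$(mktemp -d)
printf 'o1 1 u1 2011-9\no2 2 u2 2011-10\no3 3 u3 2011-10\no4 4 u1 2011-12\n' > "$tmp/order.txt"
printf 'u1 tom male\nu2 jack male\nu3 nacy female\n' > "$tmp/user.txt"

# GNU join: --check-order makes unsorted input an error with a non-zero exit status.
if join --check-order -1 3 -2 1 "$tmp/order.txt" "$tmp/user.txt" >/dev/null 2>&1; then
    unsorted_rejected=no
else
    unsorted_rejected=yes   # u1 reappears on line 4, after u3
fi

# Once order.txt is sorted on its third column, the same check succeeds.
sort -k 3,3 "$tmp/order.txt" > "$tmp/sorted_order.txt"
if join --check-order -1 3 -2 1 "$tmp/sorted_order.txt" "$tmp/user.txt" >/dev/null 2>&1; then
    sorted_accepted=yes
else
    sorted_accepted=no
fi

echo "unsorted input rejected: $unsorted_rejected"
echo "sorted input accepted: $sorted_accepted"
rm -r "$tmp"
```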

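By default, sort compares whole lines (in byte order, i.e. ASCII order, under the C locale; other locales may collate differently) and writes the result to standard output. The -k option selects the field to sort by. Note that -k 3 uses everything from the third field to the end of the line as the key, while -k 3,3 limits the key to the third field alone.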

$ sort -k 3 order.txt   # add -n for numeric order, -r for reverse order
o4 4 u1 2011-12
o1 1 u1 2011-9
o2 2 u2 2011-10
o3 3 u3 2011-10
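This small sketch shows the difference between -k 3 and -k 3,3, and how the n and r modifiers can be attached to a single key. It runs on the same order.txt data, recreated in a temporary directory. The tie-breaking it shows relies on sort's default last-resort comparison of whole lines.

```shell
#!/bin/sh
# Recreate order.txt from the article in a temporary directory.
tmp=$(mktemp -d)
printf 'o1 1 u1 2011-9\no2 2 u2 2011-10\no3 3 u3 2011-10\no4 4 u1 2011-12\n' > "$tmp/order.txt"

# -k 3,3 restricts the key to column 3 alone. The two u1 lines tie on that key,
# so sort falls back to comparing whole lines and o1 comes before o4.
by_user=$(sort -k 3,3 "$tmp/order.txt")
printf '%s\n\n' "$by_user"
# o1 1 u1 2011-9
# o4 4 u1 2011-12
# o2 2 u2 2011-10
# o3 3 u3 2011-10

# -k 2,2nr sorts on column 2 only, numerically (n) and in reverse order (r).
by_price=$(sort -k 2,2nr "$tmp/order.txt")
printf '%s\n' "$by_price"
# o4 4 u1 2011-12
# o3 3 u3 2011-10
# o2 2 u2 2011-10
# o1 1 u1 2011-9

rm -r "$tmp"
```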

Combining the two commands, we first save the sorted output to sorted_order.txt and then join it with user.txt to get the final result:

$ sort -k 3 order.txt > sorted_order.txt
$ join -1 3 -2 1 sorted_order.txt user.txt
u1 o4 4 2011-12 tom male
u1 o1 1 2011-9 tom male
u2 o2 2 2011-10 jack male
u3 o3 3 2011-10 nacy female
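The intermediate sorted_order.txt file is not strictly necessary. As a sketch, assuming the same sample data recreated in a temporary directory, sort can be piped straight into join. join reads standard input when one of its file names is given as -.

```shell
#!/bin/sh
# Recreate the article's sample files in a temporary directory.
tmp=$(mktemp -d)
printf 'o1 1 u1 2011-9\no2 2 u2 2011-10\no3 3 u3 2011-10\no4 4 u1 2011-12\n' > "$tmp/order.txt"
printf 'u1 tom male\nu2 jack male\nu3 nacy female\n' > "$tmp/user.txt"

# "-" tells join to read its first input from standard input,
# so the sorted orders are piped in instead of being saved to a file.
merged=$(sort -k 3,3 "$tmp/order.txt" | join -1 3 -2 1 - "$tmp/user.txt")
printf '%s\n' "$merged"
# u1 o1 1 2011-9 tom male
# u1 o4 4 2011-12 tom male
# u2 o2 2 2011-10 jack male
# u3 o3 3 2011-10 nacy female

rm -r "$tmp"
```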

By default, these commands treat blanks (spaces and tabs) as column separators; the -t option sets a different single-character delimiter.
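For files that use another separator, -t has to be given to both commands. The following sketch assumes hypothetical comma-separated versions of the sample files (order.csv and user.csv are made-up names).

```shell
#!/bin/sh
# Hypothetical comma-separated versions of the two files.
tmp=$(mktemp -d)
printf 'o1,1,u1,2011-9\no2,2,u2,2011-10\no3,3,u3,2011-10\no4,4,u1,2011-12\n' > "$tmp/order.csv"
printf 'u1,tom,male\nu2,jack,male\nu3,nacy,female\n' > "$tmp/user.csv"

# -t , sets the field separator for both sort and join;
# join also uses it to separate the fields it writes out.
merged=$(sort -t , -k 3,3 "$tmp/order.csv" | join -t , -1 3 -2 1 - "$tmp/user.csv")
printf '%s\n' "$merged"
# u1,o1,1,2011-9,tom,male
# u1,o4,4,2011-12,tom,male
# u2,o2,2,2011-10,jack,male
# u3,o3,3,2011-10,nacy,female

rm -r "$tmp"
```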

By combining these commands, we merged two files on a shared column. This also reflects the Unix KISS philosophy: each tool does one small thing.

Staying with the same scenario, what if you suddenly need to count how many orders each user placed in order.txt and sort the users from most orders to fewest?

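We can combine sort and uniq. The uniq command is typically used to find or remove repeated lines, and with -c it prefixes each line with the number of times it occurs, so we can use it to count how often each user appears in order.txt. Because uniq only compares adjacent lines, its input must be sorted first.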

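Note that uniq -f N skips the first N fields before comparing instead of selecting field N, so uniq -f 3 on these lines would compare only the time column. To count orders per user, extract the third column with cut first, then sort it, count repeats with uniq -c, and sort the counts from largest to smallest with sort -rn:

$ cut -d ' ' -f 3 order.txt | sort | uniq -c | sort -rn
      2 u1
      1 u3
      1 u2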
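A tiny sketch with made-up input shows why the sort step is needed. uniq only merges identical lines that sit next to each other, so a value that repeats with other lines in between is counted separately.

```shell
#!/bin/sh
# Made-up input in which u1 appears twice, but not on adjacent lines.
input='u1
u2
u1'

# Without sorting, uniq -c sees three separate runs: u1, u2, u1.
unsorted_counts=$(printf '%s\n' "$input" | uniq -c)

# Sorting first groups identical lines together, so u1 is reported once with a count of 2.
sorted_counts=$(printf '%s\n' "$input" | sort | uniq -c)

printf '%s\n---\n%s\n' "$unsorted_counts" "$sorted_counts"
```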

Delete hyperlinks in Markdown files

Another scenario came up while I was editing an article. It contained many Markdown hyperlinks, that is, text in the [description](link) format, and I wanted to remove all of them: keep only the description and drop the square brackets, the parentheses and the link inside them. Because the document also contains a lot of code with many brackets and parentheses, the pattern has to match the exact hyperlink format before anything is replaced.

Here we can use sed. Its name is short for stream editor, and it lets you edit text programmatically. If you want to learn it thoroughly, you can read the SED Concise tutorial or the sed manual; here we only introduce the most basic features to show what is possible. Using sed generally requires some knowledge of regular expressions, for which a 30-minute introductory regular expression tutorial is recommended.

The simplest use of sed is text substitution. For example, to replace every u in the order.txt above with user, we can use the following command:

$ sed 's/u/user/g' order.txt   # u is the text to find, user is the replacement
o1 1 user1 2011-9
o2 2 user2 2011-10
o3 3 user3 2011-10
o4 4 user1 2011-12
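The g flag at the end of the substitution decides whether every match on a line is replaced or only the first one. Here is a minimal sketch on a made-up line.

```shell
#!/bin/sh
line='u1 u2 u3'

# Without g, s replaces only the first match on each line.
first_only=$(printf '%s\n' "$line" | sed 's/u/user/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/u/user/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
# user1 u2 u3
# user1 user2 user3
```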

sed can also easily mimic the multi-cursor editing often used in Sublime Text or VS Code, for example adding the same text to the beginning of every line of order.txt:

$ sed 's/^/#/g' order.txt   # ^ matches the start of a line, so # is inserted at the beginning of each line
#o1 1 u1 2011-9
#o2 2 u2 2011-10
#o3 3 u3 2011-10
#o4 4 u1 2011-12
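Two related sketches use the same order.txt data, recreated in a temporary directory. First, an address such as 2,3 in front of the s command limits the edit to those lines. Second, $ (end of line) works like ^ but appends text, which can help when turning each line into a SQL statement.

```shell
#!/bin/sh
# Recreate order.txt from the article in a temporary directory.
tmp=$(mktemp -d)
printf 'o1 1 u1 2011-9\no2 2 u2 2011-10\no3 3 u3 2011-10\no4 4 u1 2011-12\n' > "$tmp/order.txt"

# The address range 2,3 restricts the substitution to lines 2 and 3.
partial=$(sed '2,3s/^/#/' "$tmp/order.txt")
printf '%s\n\n' "$partial"
# o1 1 u1 2011-9
# #o2 2 u2 2011-10
# #o3 3 u3 2011-10
# o4 4 u1 2011-12

# $ matches the end of each line, so ";" is appended to every line.
appended=$(sed 's/$/;/' "$tmp/order.txt")
printf '%s\n' "$appended"
# o1 1 u1 2011-9;
# o2 2 u2 2011-10;
# o3 3 u3 2011-10;
# o4 4 u1 2011-12;

rm -r "$tmp"
```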

Next, let's look directly at how to convert the hyperlink format to plain text.

$ echo "[link](http://http://remcarpediem.net/)" | sed -E 's/\[(.*)\]\(.*\)/\1/g'
link

First, the regular expression that matches the [description](link) format is \[.*\]\(.*\). Here \[ and \( match literal [ and ( characters in the text, and likewise \] and \) match ] and ). The . matches any single character and * means the preceding item occurs zero or more times, so together .* matches any sequence of zero or more characters.

Taken together, the expression matches a [, then zero or more arbitrary characters, then a ], then a (, then zero or more arbitrary characters, and finally a ).

Second, we want to replace the whole hyperlink with the description inside the square brackets. To do that, we capture that part by wrapping it in parentheses to form a subexpression: \[(.*)\]\(.*\). With sed -E (extended regular expressions), unescaped ( ) create a subexpression while \( and \) match literal parentheses.

Finally, in sed's s/pattern/replacement/g form, s means substitute and g means replace every match on the line; without g, only the first match on each line is replaced. \1 refers to the first subexpression, which is the description in the square brackets. Keep in mind that .* is greedy. If one line contains two links, \[(.*)\] stretches from the first [ to the last ], so the links are not handled separately. Using [^]]* and [^)]* instead of .* keeps each match inside a single link.
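The g flag only handles several links on one line if the pattern cannot run across them. The sketch below uses a made-up line with two links. It compares the article's .* pattern with a variant built on bracket expressions ([^]]* and [^)]*), which keeps each match inside one link.

```shell
#!/bin/sh
# Made-up line containing two Markdown links.
text='See [docs](http://a.example/) and [blog](http://b.example/).'

# Greedy .*: the match starts at the first [ and runs to the last ),
# so both links are swallowed by a single substitution.
greedy=$(printf '%s\n' "$text" | sed -E 's/\[(.*)\]\(.*\)/\1/g')

# [^]]* cannot cross a ] and [^)]* cannot cross a ), so each link is matched on its own.
safe=$(printf '%s\n' "$text" | sed -E 's/\[([^]]*)\]\([^)]*\)/\1/g')

printf 'greedy: %s\nsafe:   %s\n' "$greedy" "$safe"
# the safe line prints: See docs and blog.
```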


