Shulou (Shulou.com), 06/01 report. Updated 2025-04-02. From: SLTechnology News&Howtos (Servers)
This article covers the pitfalls you can run into when using redirection and pipes in Linux. It goes into a fair amount of detail, and I hope it is helpful to anyone interested.
I like Linux very much, especially some of its elegant design choices. For example, a complex problem can be broken down into several small ones, each solved with an off-the-shelf tool, and the tools combined flexibly through pipes and redirection. This makes writing shell scripts very efficient.
Pitfalls of the > and >> redirection operators
Let's start with the first question, what happens when you execute the following command?
$ cat file.txt > file.txt
Reading and writing to the same file feels like nothing's going to happen, right?
In fact, the result of running the above command is to empty the contents of the file.txt file.
PS: some Linux distributions may report an error directly instead (for example, GNU cat can refuse with "input file is output file"); in that case you can try cat < file.txt > file.txt to get around the check.
As explained in the earlier article on Linux processes and file descriptors, a program does not need to care where its standard input and output point. It is the shell that changes where they point, using pipes and redirection operators.
So when you run cat file.txt > file.txt, the shell opens file.txt first. Because the redirection operator is >, the file's contents are cleared at that moment. The shell then sets the standard output of the cat command to file.txt, and only then does cat start running.
That is, the process is as follows:
1. The shell opens file.txt and clears its contents.
2. The shell points the standard output of the cat command at file.txt.
3. The shell runs the cat command, which reads an empty file.
4. The cat command writes nothing to standard output (file.txt).
So the end result is that file.txt becomes an empty file.
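The order of these steps can be checked with a command that never touches its input or output at all. The following is a minimal sketch, assuming a POSIX shell and a scratch directory created with mktemp; the use of true is my own choice for illustration, not something from the original article.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > file.txt

# `true` neither reads nor writes anything, yet the file ends up empty:
# the shell truncated it while setting up the redirection,
# before the command itself was started.
true > file.txt
wc -c < file.txt    # prints the size of file.txt in bytes
```

Because the truncation is done by the shell, it does not matter which command appears to the left of the > operator.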
We know that > empties the target file, while >> appends content to the end of the target file. So what happens if we change the redirection operator from > to >>?
$ echo hello world > file.txt  # the file contains only one line
$ cat file.txt >> file.txt     # this command loops forever
One line is first written to file.txt, so after running cat file.txt >> file.txt you would expect the file to contain two lines.
Unfortunately, the result is not what we expected: hello world keeps being written to file.txt in an endless loop, the file quickly grows very large, and the only way to stop the command is Control+C.
This is interesting. Why is there an endless loop? With a little analysis, the reason becomes clear.
First, recall how the cat command behaves. If you run cat on its own, it reads keyboard input from the terminal, and each time you press Enter it echoes the line back. In other words, cat reads data line by line and writes each line out as it goes.
The execution of cat file.txt >> file.txt then goes as follows:
1. The shell opens file.txt and prepares to append content at the end of the file.
2. The shell points the standard output of the cat command at file.txt.
3. The cat command reads a line from file.txt and writes it to standard output (that is, appends it to file.txt).
4. Because a line has just been written, cat finds that there is more to read in file.txt and repeats step 3.
This is like iterating over a list while appending elements to it the whole time, which is why our command loops forever.
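The steps above can be imitated safely with a bounded loop. This is only a sketch: the while/read loop stands in for cat, and file descriptor 3 and the cap of five iterations are assumptions added so that the demonstration terminates.

```shell
cd "$(mktemp -d)" || exit 1
echo 'hello world' > file.txt

count=0
# Standard input reads file.txt; descriptor 3 appends to the same file.
while IFS= read -r line; do
    echo "$line" >&3            # append the line that was just read
    count=$((count + 1))
    if [ "$count" -ge 5 ]; then
        break                   # without this cap the loop never ends
    fi
done < file.txt 3>> file.txt

wc -l < file.txt                # the file has grown from 1 line to 6
```

Every line appended through descriptor 3 becomes new input for the next read, so the reader never reaches the end of the file on its own.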
Combining the > redirection operator with the | pipe
We often need to keep the first few lines of a file and delete the rest.
In Linux, the head command extracts the first few lines of a file:
$ cat file.txt  # file.txt contains five lines
1
2
3
4
5
$ head -n 2 file.txt  # head reads the first two lines
1
2
$ cat file.txt | head -n 2  # head can also read standard input
1
2
If we want to keep the first two lines of the file and delete the rest, we might use the following command:
$ head -n 2 file.txt > file.txt
But this makes the mistake described above: file.txt ends up empty, which is not what we need.
So if we write the command like this, can we avoid the pitfall?
$ cat file.txt | head -n 2 > file.txt
The answer is no; the contents of the file will still be emptied.
What? Is the pipe leaking and losing all the data?
The earlier article on Linux processes and file descriptors also covered how pipes are implemented: in essence, a pipe connects the standard output of one command to the standard input of the next.
However, if you expect the command above to give the desired result, it is probably because you think commands connected by pipes are executed serially. This is a common mistake. In fact, commands connected by pipes are executed in parallel.
You might think the shell first runs cat file.txt, reads everything in file.txt normally, and then passes it through the pipe to head -n 2 > file.txt.
Although file.txt has been emptied by then, head reads its data from the pipe rather than from the file, so it should be able to write the two lines to file.txt correctly.
But this understanding is wrong. The shell runs the commands in a pipeline in parallel. Take the following command:
$ sleep 5 | sleep 5
The shell starts both sleep processes at the same time, so the whole command takes 5 seconds rather than 10.
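The timing is easy to measure. The sketch below uses 2-second sleeps instead of 5 to keep the run short, and date +%s for whole-second timestamps; both are my choices for illustration.

```shell
start=$(date +%s)
sleep 2 | sleep 2               # both processes are started together
end=$(date +%s)

elapsed=$((end - start))
# A serial execution would need about 4 seconds.
echo "elapsed: ${elapsed}s"
```

If the two commands ran one after the other, the elapsed time would be the sum of the two sleeps rather than roughly the longer of the two.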
This is a bit counterintuitive. Take this common command:
$ cat filename | grep 'pattern'
Intuitively, it seems that cat runs first, reads the whole of filename in one go, and then hands it to grep to search.
In fact, cat and grep run at the same time. The expected result is still produced because grep 'pattern' blocks waiting for standard input, while cat writes data to grep's standard input through the pipe.
Running the following command makes it easy to see that cat and grep run at the same time, with grep processing what we type on the keyboard in real time:
$ cat | grep 'pattern'
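The same effect can be observed without typing at the keyboard. In this sketch a slow producer stands in for the keyboard, and a while/read loop stands in for grep so that output buffering does not hide the timing; the file name seen.txt is a hypothetical choice.

```shell
cd "$(mktemp -d)" || exit 1

# Producer: writes one line, pauses, then writes another.
{
    echo "first"
    sleep 2
    echo "second"
} | while IFS= read -r line; do
    # Consumer: stamps each line with the time at which it arrived.
    echo "$(date +%s) $line"
done > seen.txt

cat seen.txt
```

The two timestamps differ, which shows that the consumer handled the first line while the producer was still running.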
With all that said, let's go back to the original question:
$ cat file.txt | head -n 2 > file.txt
cat and head run in parallel, and which of them gets to the file first is not deterministic.
If the redirection for head takes effect before cat reads the file, file.txt is emptied first and cat reads nothing; conversely, if cat reads the contents of the file first, you get the expected result.
However, in my experiments (running this concurrent scenario repeatedly), file.txt was emptied far more often than the expected result appeared. I am not yet sure why; it is probably related to how the Linux kernel implements processes and pipes.
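An experiment of this kind might look like the sketch below. The iteration count of 100 and the variable names are assumptions of mine, and the split between the two outcomes will vary from machine to machine and from run to run.

```shell
cd "$(mktemp -d)" || exit 1

emptied=0
kept=0
i=0
while [ "$i" -lt 100 ]; do
    printf '1\n2\n3\n4\n5\n' > file.txt     # restore the five lines
    cat file.txt | head -n 2 > file.txt     # the racy command
    if [ -s file.txt ]; then
        kept=$((kept + 1))                  # cat read before truncation
    else
        emptied=$((emptied + 1))            # truncation happened first
    fi
    i=$((i + 1))
done

echo "emptied: $emptied, kept: $kept"
```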
Solution
Having covered so much about how pipes and redirection operators behave, how do we avoid the pitfall of emptying the file?
The most reliable way is not to read and write the same file at the same time, but to go through a temporary file.
For example, to keep only the first two lines of the file.txt file, you can write the code like this:
# write the data to a temporary file first, then overwrite the original file
$ cat file.txt | head -n 2 > temp.txt && mv temp.txt file.txt
This is the simplest, most reliable and foolproof method.
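A slightly more defensive variant of the same idea is sketched here. Using mktemp to pick a unique temporary name, instead of the fixed temp.txt, is my own addition; note that mv replaces the original file, so the result takes on the temporary file's permissions.

```shell
cd "$(mktemp -d)" || exit 1
printf '1\n2\n3\n4\n5\n' > file.txt

# Create a uniquely named temporary file next to the original.
tmp=$(mktemp ./file.txt.XXXXXX) || exit 1

# Only replace the original if head succeeded.
head -n 2 file.txt > "$tmp" && mv "$tmp" file.txt

cat file.txt
```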
If you find this command too long, you can install the moreutils package with a package manager such as apt, brew, or yum, which provides an additional sponge command. Use it like this:
# pass the data to sponge first; sponge then writes it to the original file
$ cat file.txt | head -n 2 | sponge file.txt
The name sponge is quite vivid: it first "absorbs" all the input data and only then writes it to file.txt. The core idea is the same as using a temporary file. The "sponge" acts like a temporary file, which avoids opening the same file for reading and writing at the same time.
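To make the idea concrete, here is a rough stand-in written as a shell function. The name soak is hypothetical, and this is not how sponge itself is implemented; it only illustrates "absorb everything first, write afterwards".

```shell
# soak FILE: hypothetical helper that imitates the idea behind sponge.
soak() {
    soak_tmp=$(mktemp) || return 1
    cat > "$soak_tmp"           # absorb all of standard input first
    cat "$soak_tmp" > "$1"      # only now open (and truncate) the target
    rm -f "$soak_tmp"
}

cd "$(mktemp -d)" || exit 1
printf '1\n2\n3\n4\n5\n' > file.txt
cat file.txt | head -n 2 | soak file.txt
cat file.txt
```

The target file is not opened until standard input reaches end of file, so the earlier stages of the pipeline have already read what they need by the time it is truncated.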
These are some of the pitfalls of redirection and pipes. I hope this helps.