This article looks at importing a large file on Linux through a pipe versus through a redirection, and compares the efficiency of the two. The write-up is kept clear and simple, so it is well suited to beginners and worth a read. I hope you get something out of it!
# Command 1: import through a pipe
shell> cat huge_dump.sql | mysql -uroot

# Command 2: import through a redirection
shell> mysql -uroot < huge_dump.sql
First look at the two commands above: if huge_dump.sql is very large, which import method would you guess is more efficient?
This is an interesting question. My first reaction was:

I had never compared them; they should be about the same. In one case cat is responsible for opening the file, in the other bash is.

This kind of scenario is quite common in MySQL operation and maintenance work, so I took some time to compare the two and analyze the principle behind them:
Let's first construct the scene:
First, prepare a program b.out that simulates how mysql consumes the data:
/* b.c: read blocks from stdin and discard them, simulating mysql's consumption */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char buf[4096];   /* read buffer; the declaration was garbled in the source, size assumed */
    while (fread(buf, sizeof(buf), 1, stdin) > 0)
        ;
    return 0;
}

$ gcc -o b.out b.c
$ ls | ./b.out
Then write a systemtap script to make it easy to observe the behavior of the program.
$ cat test.stp
function should_log()
{
    return (execname() == "cat" ||
            execname() == "b.out" ||
            execname() == "bash");
}

probe syscall.open, syscall.close, syscall.read, syscall.write,
      syscall.pipe, syscall.fork, syscall.execve, syscall.dup,
      syscall.wait4
{
    if (!should_log()) next;
    printf("%s -> %s\n", thread_indent(0), probefunc());
}

probe kernel.function("pipe_read"), kernel.function("pipe_readv"),
      kernel.function("pipe_write"), kernel.function("pipe_writev")
{
    if (!should_log()) next;
    printf("%s -> %s: file ino %d\n",
           thread_indent(0), probefunc(), __file_ino($filp));
}

probe begin { println(":~") }
This script watches the order of several system calls and the reads and writes on the pipe. Next, prepare a 419 MB file huge_dump.sql; on our machines with tens of gigabytes of memory, it fits easily in RAM:
$ sudo dd if=/dev/urandom of=huge_dump.sql bs=4096 count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 63.9886 seconds, 6.6 MB/s
Because this file was written with buffered I/O, its contents are cached in the page cache, so reading it back will not involve the disk.
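If you want to convince yourself of that, the kernel can tell you how much of a file is resident in the page cache. Below is a minimal sketch of my own (not from the original article) that checks residency with mmap(2) plus mincore(2):

/* cached.c: report how many pages of a file are in the page cache.
 * Build: gcc -o cached cached.c    Run: ./cached huge_dump.sql */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + pagesz - 1) / pagesz;

    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    /* mincore fills one byte per page; bit 0 set means "resident" */
    unsigned char *vec = malloc(pages);
    if (mincore(addr, st.st_size, vec) < 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1;
    printf("%zu of %zu pages in page cache\n", resident, pages);

    munmap(addr, st.st_size);
    free(vec);
    close(fd);
    return 0;
}

Run right after the dd above, it should report that essentially all pages are resident.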
All right, the scene is now set. Let's compare the speed of the two cases, first the pipeline, then the redirection:
# 1. pipeline
$ time (cat huge_dump.sql | ./b.out)

real    0m0.596s
user    0m0.001s
sys     0m0.919s

# 2. redirection
$ time (./b.out < huge_dump.sql)
From the execution times we can see that the two differ in speed by roughly a factor of three; the second, the redirection, is clearly much faster.
Isn't that a little strange? All right, let's analyze it with the tools prepared above and let the data speak:
This time prepare a very small data file, so the trace is easy to read, and run stap in one window:
$ echo hello > huge_dump.sql
$ sudo stap test.stp
:~
     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_write
     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_write
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_pipe
     0 bash(26570): -> sys_pipe
     0 bash(26570): -> do_fork
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_close
     0 bash(26570): -> do_fork
     0 bash(13775): -> sys_close
     0 bash(13775): -> sys_read
     0 bash(13775): -> pipe_read: file ino 20906911
     0 bash(13775): -> pipe_readv: file ino 20906911
     0 bash(13776): -> sys_close
     0 bash(13776): -> do_execve
     0 bash(26570): -> sys_close
     0 bash(13775): -> sys_close
     0 bash(26570): -> sys_wait4
     0 bash(13775): -> sys_close
     0 bash(13775): -> sys_close
     0 b.out(13776): -> sys_close
     0 b.out(13776): -> sys_close
     0 bash(13775): -> do_execve
     0 b.out(13776): -> sys_open
     0 b.out(13776): -> sys_close
     0 b.out(13776): -> sys_open
     0 b.out(13776): -> sys_read
     0 b.out(13776): -> sys_close
     0 cat(13775): -> sys_close
     0 cat(13775): -> sys_close
     0 b.out(13776): -> sys_read
     0 b.out(13776): -> pipe_read: file ino 20906910
     0 b.out(13776): -> pipe_readv: file ino 20906910
     0 cat(13775): -> sys_open
     0 cat(13775): -> sys_close
     0 cat(13775): -> sys_open
     0 cat(13775): -> sys_read
     0 cat(13775): -> sys_close
     0 cat(13775): -> sys_open
     0 cat(13775): -> sys_close
     0 cat(13775): -> sys_open
     0 cat(13775): -> sys_read
     0 cat(13775): -> sys_write
     0 cat(13775): -> pipe_write: file ino 20906910
     0 cat(13775): -> pipe_writev: file ino 20906910
     0 cat(13775): -> sys_read
     0 b.out(13776): -> sys_read
     0 b.out(13776): -> pipe_read: file ino 20906910
     0 b.out(13776): -> pipe_readv: file ino 20906910
     0 cat(13775): -> sys_close
     0 cat(13775): -> sys_close
     0 bash(26570): -> sys_wait4
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_wait4
     0 bash(26570): -> sys_write
While stap is collecting data, run the pipeline in another window:
$ cat huge_dump.sql | ./b.out
We can see from systemtap's log that:
bash forks twice; execve then runs cat and b.out respectively, and the two processes communicate through a pipe. cat reads the data out of huge_dump.sql and writes it into the pipe, and b.out reads it back out of the pipe.
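To make the sequence in that trace concrete, here is a minimal C sketch of my own, an illustration rather than a transcript of bash's actual source, of what the shell does for cat huge_dump.sql | ./b.out: create the pipe, fork twice, wire the ends up with dup2, and exec the two programs:

/* pipeline.c: roughly what bash does for `cat huge_dump.sql | ./b.out` */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pfd[2];
    if (pipe(pfd) < 0) { perror("pipe"); exit(1); }

    if (fork() == 0) {                 /* child 1: becomes cat */
        dup2(pfd[1], STDOUT_FILENO);   /* stdout -> write end of the pipe */
        close(pfd[0]);
        close(pfd[1]);
        execlp("cat", "cat", "huge_dump.sql", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {                 /* child 2: becomes b.out */
        dup2(pfd[0], STDIN_FILENO);    /* stdin <- read end of the pipe */
        close(pfd[0]);
        close(pfd[1]);
        execl("./b.out", "./b.out", (char *)NULL);
        _exit(127);
    }
    close(pfd[0]);                     /* parent keeps no pipe ends open */
    close(pfd[1]);
    while (wait(NULL) > 0)             /* reap both children */
        ;
    return 0;
}

Every byte of the file has to pass through that pipe, which is exactly the extra work the trace shows.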
Now take a look at Command 2, the redirection:
$ ./b.out < huge_dump.sql

     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_write
     0 bash(26570): -> sys_read
     0 bash(26570): -> sys_write
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_pipe
     0 bash(26570): -> do_fork
     0 bash(28926): -> sys_close
     0 bash(28926): -> sys_read
     0 bash(28926): -> pipe_read: file ino 20920902
     0 bash(28926): -> pipe_readv: file ino 20920902
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_close
     0 bash(26570): -> sys_wait4
     0 bash(28926): -> sys_close
     0 bash(28926): -> sys_open
     0 bash(28926): -> sys_close
     0 bash(28926): -> do_execve
     0 b.out(28926): -> sys_close
     0 b.out(28926): -> sys_close
     0 b.out(28926): -> sys_open
     0 b.out(28926): -> sys_close
     0 b.out(28926): -> sys_open
     0 b.out(28926): -> sys_read
     0 b.out(28926): -> sys_close
     0 b.out(28926): -> sys_read
     0 b.out(28926): -> sys_read
     0 bash(26570): -> sys_wait4
     0 bash(26570): -> sys_write
     0 bash(26570): -> sys_read
bash forks a single process, which opens the data file, dups the file descriptor onto fd 0 (stdin), and then execve runs b.out. b.out then reads the data directly.
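Again as a rough sketch of my own, here is the equivalent of what the shell does for ./b.out < huge_dump.sql: one fork, open the file, dup2 it onto stdin, exec. There is no pipe and no second process:

/* redirect.c: roughly what bash does for `./b.out < huge_dump.sql` */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    if (fork() == 0) {                    /* single child */
        int fd = open("huge_dump.sql", O_RDONLY);
        if (fd < 0) { perror("open"); _exit(1); }
        dup2(fd, STDIN_FILENO);           /* the file becomes stdin */
        close(fd);
        execl("./b.out", "./b.out", (char *)NULL);
        _exit(127);
    }
    wait(NULL);                           /* reap the child */
    return 0;
}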
It is now clear why the two scenarios differ in speed by roughly a factor of three:

Command 1, the pipeline: the data is read twice and written once (page cache -> cat's buffer -> pipe -> b.out's buffer), plus the context switches between the two processes.

Command 2, the redirection: the data is read only once (page cache -> b.out's buffer).

With a 419 MB file, the pipeline therefore moves roughly three times as many bytes across the user/kernel boundary as the redirection does.
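If you want to reproduce the comparison end to end, here is a small harness of my own (not from the original article) that times both styles back to back. It assumes the b.out and huge_dump.sql built in the steps above sit in the current directory:

/* bench.c: time the pipe and redirect imports back to back.
 * Build: gcc -O2 -o bench bench.c    Run: ./bench */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Run a shell command via system(3) and return its wall-clock seconds. */
static double time_cmd(const char *cmd)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (system(cmd) != 0) {
        fprintf(stderr, "command failed: %s\n", cmd);
        exit(1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* Warm the page cache once so both runs read from memory. */
    time_cmd("cat huge_dump.sql > /dev/null");
    printf("pipe:     %.3fs\n", time_cmd("cat huge_dump.sql | ./b.out"));
    printf("redirect: %.3fs\n", time_cmd("./b.out < huge_dump.sql"));
    return 0;
}

On a machine like the one described above you should see a gap in the same ballpark as the factor of three measured earlier, though the exact numbers will vary.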
Finally, what is Linux? Linux is a free, open-source, UNIX-like operating system: a POSIX-compliant multi-user, multi-tasking, multi-threaded, multi-CPU operating system on which the major Unix tools, applications, and network protocols can run.
Thank you for reading. I hope you now have a feel for how redirecting a large Linux file compares with piping it; go ahead and try it yourself.