2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article looks at a simple question: when feeding a large file to a program on Linux, which is more efficient, redirection or a pipe? The explanation is short, so let's work through it step by step.
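# Command 1, import through a pipe
shell> cat huge_dump.sql | mysql -uroot;
# Command 2, import through redirection
shell> mysql -uroot < huge_dump.sql;

Look at the two commands above. Suppose huge_dump.sql is very large: which import method would you guess is more efficient?

It is an interesting question. My first reaction was that I had never compared them and that they should perform the same: in one case cat opens the file, in the other bash does. This situation comes up often in MySQL operations work, so I spent some time on a comparison and on an analysis of the underlying mechanism.

First we build the scenario. We prepare a program, b.out, that simulates mysql consuming the data:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        /* The buffer declaration is missing in the source; 4096 is an assumed size. */
        char buf[4096];

        /* Read stdin in buffer-sized chunks and discard the data. */
        while (fread(buf, sizeof(buf), 1, stdin) > 0)
            ;
        return 0;
    }

    $ gcc -o b.out b.c
    $ ls | ./b.out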
Then write a systemtap script to make it easy to observe the behavior of the program.
$ cat test.stp
function should_log() {
    return (execname() == "cat" || execname() == "b.out" || execname() == "bash");
}
probe syscall.open, syscall.close, syscall.read, syscall.write, syscall.pipe, syscall.fork, syscall.execve, syscall.dup, syscall.wait4
{
    if (!should_log()) next;
    printf("%s -> %s\n", thread_indent(0), probefunc());
}
probe kernel.function("pipe_read"), kernel.function("pipe_readv"), kernel.function("pipe_write"), kernel.function("pipe_writev")
{
    if (!should_log()) next;
    printf("%s -> %s: file ino %d\n", thread_indent(0), probefunc(), __file_ino($filp));
}
probe begin { println(":~") }
The script traces the order of a handful of system calls and the reads and writes on the pipe. Next we prepare a 419 MB file, huge_dump.sql, which fits comfortably in memory on our machines with tens of gigabytes of RAM:
$ sudo dd if=/dev/urandom of=huge_dump.sql bs=4096 count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 63.9886 seconds, 6.6 MB/s
Because the file was written with buffered I/O, its contents are held in the page cache, so reading it back does not touch the disk.
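The relationship between the dd parameters and the resulting file size can be tried on a smaller scale. The sketch below is only an illustration: the 4 MiB size, the temporary path from mktemp, and the variable names are assumptions and not part of the original test.

```shell
# Scaled-down stand-in for huge_dump.sql: 1024 blocks of 4096 bytes.
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=4096 count=1024 2>/dev/null

# The file size is bs * count bytes.
expected=$((4096 * 1024))
actual=$(wc -c < "$f")
echo "expected=$expected actual=$actual"

# Reading the file once more makes it likely that later reads are served
# from the page cache (assuming there is enough free memory).
cat "$f" > /dev/null

rm -f "$f"
```

The same arithmetic gives the size in the article: 4096 * 102400 = 419430400 bytes.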
Now that the scenario is set up, let's compare the speed of the two cases:
# First, the pipe
$ time (cat huge_dump.sql | ./b.out)
real 0m0.596s
user 0m0.001s
sys 0m0.919s
# Second, the redirect
$ time (./b.out < huge_dump.sql)

To see where the time goes, we trace the system calls with stap:

$ sudo stap test.stp
:~
0 bash(26570): -> sys_read
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_close
0 bash(26570): -> sys_pipe
0 bash(26570): -> sys_pipe
0 bash(26570): -> do_fork
0 bash(26570): -> sys_close
0 bash(26570): -> sys_close
0 bash(26570): -> do_fork
0 bash(13775): -> sys_close
0 bash(13775): -> sys_read
0 bash(13775): -> pipe_read: file ino 20906911
0 bash(13775): -> pipe_readv: file ino 20906911
0 bash(13776): -> sys_close
0 bash(13776): -> do_execve
0 bash(26570): -> sys_close
0 bash(13775): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(13775): -> sys_close
0 bash(13775): -> sys_close
0 b.out(13776): -> sys_close
0 b.out(13776): -> sys_close
0 bash(13775): -> do_execve
0 b.out(13776): -> sys_open
0 b.out(13776): -> sys_close
0 b.out(13776): -> sys_open
0 b.out(13776): -> sys_read
0 b.out(13776): -> sys_close
0 cat(13775): -> sys_close
0 cat(13775): -> sys_close
0 b.out(13776): -> sys_read
0 b.out(13776): -> pipe_read: file ino 20906910
0 b.out(13776): -> pipe_readv: file ino 20906910
0 cat(13775): -> sys_open
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_read
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_read
0 cat(13775): -> sys_write
0 cat(13775): -> pipe_write: file ino 20906910
0 cat(13775): -> pipe_writev: file ino 20906910
0 cat(13775): -> sys_read
0 b.out(13776): -> sys_read
0 b.out(13776): -> pipe_read: file ino 20906910
0 b.out(13776): -> pipe_readv: file ino 20906910
0 cat(13775): -> sys_close
0 cat(13775): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_write
stap collected the trace above while we ran the pipeline in another window:
$ cat huge_dump.sql | ./b.out
We can see from systemtap's log that:
bash forks two processes.
Each child then calls execve, one to run cat and the other to run b.out, and the two communicate through a pipe.
cat reads the data from huge_dump.sql and writes it to the pipe, and b.out then reads it from the pipe.
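One way to confirm the consumer's side of this arrangement is to check what its standard input is attached to. The sketch below assumes bash on Linux; stdin_is_pipe is a hypothetical helper standing in for b.out, and the temporary file stands in for huge_dump.sql.

```shell
# Hypothetical stand-in for b.out: reports whether its stdin is a pipe.
stdin_is_pipe() {
    if [ -p /dev/stdin ]; then
        echo yes
    else
        echo no
    fi
}

f=$(mktemp)
echo hello > "$f"

# As in Command 1: cat reads the file and writes into a pipe,
# and the consumer sits at the reading end of that pipe.
via_pipe=$(cat "$f" | stdin_is_pipe)
echo "stdin is a pipe: $via_pipe"

rm -f "$f"
```

The consumer never sees the file itself in this mode, only the pipe that cat fills.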
Now look at Command 2, the redirect:
$ ./b.out < huge_dump.sql

stap output:
0 bash(26570): -> sys_read
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_close
0 bash(26570): -> sys_pipe
0 bash(26570): -> do_fork
0 bash(28926): -> sys_close
0 bash(28926): -> sys_read
0 bash(28926): -> pipe_read: file ino 20920902
0 bash(28926): -> pipe_readv: file ino 20920902
0 bash(26570): -> sys_close
0 bash(26570): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(28926): -> sys_close
0 bash(28926): -> sys_open
0 bash(28926): -> sys_close
0 bash(28926): -> do_execve
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_open
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_open
0 b.out(28926): -> sys_read
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_read
0 b.out(28926): -> sys_read
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read
bash forks a single process, which opens the data file.
That process makes the opened file its descriptor 0 (standard input) and then calls execve to run b.out.
b.out then reads the data directly from the file.
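That the redirected program reads the file itself, with nothing in between, can be checked by comparing inode numbers. This is a sketch assuming bash and GNU coreutils stat on Linux; stdin_is_regular_file is a hypothetical helper and the temporary file stands in for huge_dump.sql.

```shell
# Hypothetical helper: succeeds when stdin is a regular file.
stdin_is_regular_file() {
    [ -f /dev/stdin ]
}

f=$(mktemp)
echo hello > "$f"

# Inode of the file on disk.
file_ino=$(stat -c %i "$f")

# Inode seen through descriptor 0 after the shell redirects the file onto it.
# -L follows the /dev/stdin link to whatever descriptor 0 refers to.
stdin_ino=$(stat -L -c %i /dev/stdin < "$f")

if stdin_is_regular_file < "$f"; then
    kind=file
else
    kind=other
fi
echo "file inode=$file_ino stdin inode=$stdin_ino kind=$kind"

rm -f "$f"
```

Matching inode numbers mean that descriptor 0 refers to the data file directly, which is why no pipe_read or pipe_write appears for b.out in the trace.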
It is now clear why there is a roughly three-fold difference in speed between the two cases:
Command 1, the pipe: the data is read twice and written once, with context switches between the two processes on top.
Command 2, the redirect: the data is read only once.
Conclusion: for large files on Linux, redirection is more efficient than a pipe.
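The read and write counts above translate into a rough estimate of how many bytes cross the user/kernel boundary in each mode. The following is a back-of-the-envelope model only: it ignores context switches, per-call overhead and buffer sizes, and the variable names are made up for illustration.

```shell
# Size of huge_dump.sql in bytes, taken from the dd output.
size=419430400

# Pipe: cat reads the file, cat writes the pipe, b.out reads the pipe.
pipe_bytes=$((size * 3))

# Redirect: b.out reads the file.
redirect_bytes=$size

ratio=$((pipe_bytes / redirect_bytes))
echo "pipe=$pipe_bytes redirect=$redirect_bytes ratio=$ratio"
```

The factor of three in copied bytes is in line with the roughly three-fold difference reported in the article.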