

How efficient are Linux large-file redirection and pipes?


This article mainly explains "How efficient are Linux large-file redirection and pipes?". The explanation is simple and clear and easy to follow; please work through it step by step with the editor.

Guide: first take a look at the two commands below. If the huge_dump.sql file is very large, guess which import method will be more efficient?

# Command 1, pipe import
shell> cat huge_dump.sql | mysql -uroot;

# Command 2, redirect import
shell> mysql -uroot < huge_dump.sql;

This question is quite interesting. My first reaction was: I have never compared them, they should be about the same; in one case cat is responsible for opening the file, in the other it is bash. This kind of scenario comes up fairly often in MySQL operations work, so I spent some time on a comparison and an analysis of the underlying mechanism.

First, let's construct the scenario. Prepare a program, b.out, to simulate mysql consuming the data:

#include <stdio.h>

int main(int argc, char *argv[])
{
    char buf[4096];
    while (fread(buf, sizeof(buf), 1, stdin) > 0)
        ;
    return 0;
}

$ gcc -o b.out b.c
$ ls | ./b.out

Then write a systemtap script to make it easy to observe the behavior of the program.

$ cat test.stp
function should_log() {
  return (execname() == "cat" || execname() == "b.out" || execname() == "bash");
}
probe syscall.open, syscall.close, syscall.read, syscall.write,
      syscall.pipe, syscall.fork, syscall.execve, syscall.dup, syscall.wait4
{
  if (!should_log()) next;
  printf("%s -> %s\n", thread_indent(0), probefunc());
}
probe kernel.function("pipe_read"), kernel.function("pipe_readv"),
      kernel.function("pipe_write"), kernel.function("pipe_writev")
{
  if (!should_log()) next;
  printf("%s -> %s: file ino %d\n", thread_indent(0), probefunc(), __file_ino($filp));
}
probe begin { println(":~") }

This script watches the ordering of the relevant system calls and the reads and writes on the pipe. Next, prepare a 419 MB file, huge_dump.sql; on our machines with tens of gigabytes of memory it fits comfortably in RAM:

$ sudo dd if=/dev/urandom of=huge_dump.sql bs=4096 count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 63.9886 seconds, 6.6 MB/s

Because the file was just written through buffered I/O, its contents are cached in the page cache, so the reads that follow will not touch the disk.
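As a quick side check (an addition of mine, not an experiment from the article), page-cache residency can be confirmed programmatically. The sketch below is a minimal example that mmaps huge_dump.sql and counts resident pages with mincore(); the file name is the one used above, everything else is illustrative.

/* Illustrative check: report how much of huge_dump.sql is resident
 * in the page cache, using mmap() + mincore(). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "huge_dump.sql";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    long page = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + page - 1) / page;

    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned char *vec = malloc(pages);
    if (!vec) { perror("malloc"); return 1; }

    if (mincore(addr, st.st_size, vec) < 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1;            /* low bit set means "page resident" */

    printf("%zu of %zu pages resident in page cache\n", resident, pages);

    munmap(addr, st.st_size);
    free(vec);
    close(fd);
    return 0;
}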

All right, the scene is now set. Let's compare the speed of the two cases, starting with the pipe:

# the first method: the pipe
$ time (cat huge_dump.sql | ./b.out)

real    0m0.596s
user    0m0.001s
sys     0m0.919s

Timing the redirect method in the same way, time (./b.out < huge_dump.sql), and comparing the execution times shows a difference of roughly a factor of three; the second method, redirection, is clearly much faster.

Isn't that a little strange? All right, let's analyze what is actually happening, and continue to let the data do the talking.

This time prepare a very small data file, so that the trace is easy to follow, and run stap in one window:

$ echo hello > huge_dump.sql
$ sudo stap test.stp
:~

While stap is collecting data, run the pipe version in another window:

$ cat huge_dump.sql | ./b.out

The stap window then prints:

0 bash(26570): -> sys_read
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_close
0 bash(26570): -> sys_pipe
0 bash(26570): -> sys_pipe
0 bash(26570): -> do_fork
0 bash(26570): -> sys_close
0 bash(26570): -> sys_close
0 bash(26570): -> do_fork
0 bash(13775): -> sys_close
0 bash(13775): -> sys_read
0 bash(26570): -> pipe_read: file ino 20906911
0 bash(13775): -> pipe_readv: file ino 20906911
0 bash(13776): -> sys_close
0 bash(13776): -> do_execve
0 bash(26570): -> sys_close
0 bash(13775): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(13775): -> sys_close
0 bash(13775): -> sys_close
0 b.out(13776): -> sys_close
0 b.out(13776): -> sys_close
0 bash(13775): -> do_execve
0 b.out(13776): -> sys_open
0 b.out(13776): -> sys_close
0 b.out(13776): -> sys_open
0 b.out(13776): -> sys_read
0 b.out(13776): -> sys_close
0 cat(13775): -> sys_close
0 cat(13775): -> sys_close
0 b.out(13776): -> sys_read
0 b.out(13776): -> pipe_read: file ino 20906910
0 b.out(13776): -> pipe_readv: file ino 20906910
0 cat(13775): -> sys_open
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_read
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_close
0 cat(13775): -> sys_open
0 cat(13775): -> sys_read
0 cat(13775): -> sys_write
0 cat(13775): -> pipe_write: file ino 20906910
0 cat(13775): -> pipe_writev: file ino 20906910
0 cat(13775): -> sys_read
0 b.out(13776): -> sys_read
0 b.out(13776): -> pipe_read: file ino 20906910
0 b.out(13776): -> pipe_readv: file ino 20906910
0 cat(13775): -> sys_close
0 cat(13775): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_write

We can see from systemtap's log that:

bash forks two child processes.

The two children execve cat and b.out respectively; the two programs communicate through a pipe.

cat reads the data out of huge_dump.sql and writes it into the pipe, and b.out reads it back out of the pipe.
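To make that sequence concrete, here is a minimal sketch of what the shell is doing for cat huge_dump.sql | ./b.out. It is an illustrative reconstruction, not code from the article: create a pipe, fork two children, wire cat's stdout and b.out's stdin to the pipe ends, and exec both programs.

/* Illustrative reconstruction (not from the article) of the shell's work for:
 *     cat huge_dump.sql | ./b.out
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int pfd[2];
    if (pipe(pfd) < 0) { perror("pipe"); return 1; }   /* the sys_pipe in the trace */

    pid_t writer = fork();                             /* first do_fork */
    if (writer == 0) {
        dup2(pfd[1], STDOUT_FILENO);                   /* cat's stdout -> pipe write end */
        close(pfd[0]); close(pfd[1]);
        execlp("cat", "cat", "huge_dump.sql", (char *)NULL);
        _exit(127);
    }

    pid_t reader = fork();                             /* second do_fork */
    if (reader == 0) {
        dup2(pfd[0], STDIN_FILENO);                    /* b.out's stdin <- pipe read end */
        close(pfd[0]); close(pfd[1]);
        execl("./b.out", "./b.out", (char *)NULL);
        _exit(127);
    }

    close(pfd[0]); close(pfd[1]);                      /* parent drops its pipe ends */
    waitpid(writer, NULL, 0);                          /* the sys_wait4 calls in the trace */
    waitpid(reader, NULL, 0);
    return 0;
}

Every chunk that cat reads from the page cache has to be written into the pipe and read out again by b.out, which is where the extra system calls and copies in the trace come from.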

Then take a look at Command 2, the redirect:

$ ./b.out < huge_dump.sql

stap output:

0 bash(26570): -> sys_read
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read
0 bash(26570): -> sys_write
0 bash(26570): -> sys_close
0 bash(26570): -> sys_pipe
0 bash(26570): -> do_fork
0 bash(28926): -> sys_close
0 bash(28926): -> sys_read
0 bash(28926): -> pipe_read: file ino 20920902
0 bash(28926): -> pipe_readv: file ino 20920902
0 bash(26570): -> sys_close
0 bash(26570): -> sys_close
0 bash(26570): -> sys_wait4
0 bash(28926): -> sys_close
0 bash(28926): -> sys_open
0 bash(28926): -> sys_close
0 bash(28926): -> do_execve
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_open
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_open
0 b.out(28926): -> sys_read
0 b.out(28926): -> sys_close
0 b.out(28926): -> sys_read
0 b.out(28926): -> sys_read
0 bash(26570): -> sys_wait4
0 bash(26570): -> sys_write
0 bash(26570): -> sys_read

bash forks one process, which opens the data file.

The open file descriptor is then dup'ed onto descriptor 0 (stdin), and the process execves b.out.

b.out then reads the data directly from the file.
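Again as an illustration (not code from the article), this is roughly the work the shell does for ./b.out < huge_dump.sql: one fork, an open of the data file, a dup2 onto stdin, and an exec.

/* Illustrative reconstruction (not from the article) of the shell's work for:
 *     ./b.out < huge_dump.sql
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();                           /* the single do_fork in the trace */
    if (child == 0) {
        int fd = open("huge_dump.sql", O_RDONLY);   /* the sys_open in the trace */
        if (fd < 0) { perror("open"); _exit(1); }
        dup2(fd, STDIN_FILENO);                     /* the file becomes fd 0 (stdin) */
        close(fd);
        execl("./b.out", "./b.out", (char *)NULL);
        _exit(127);
    }
    waitpid(child, NULL, 0);                        /* the sys_wait4 in the trace */
    return 0;
}

There is no second program and no pipe in the data path: every read that b.out issues is satisfied straight from the page cache.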

It is now clear why there is a three-fold difference in speed between the two scenarios:

Command 1, the pipe: the data is read twice and written once, plus extra process context switches between cat and b.out.

Command 2, the redirect: the data is read only once.
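A rough back-of-the-envelope count (my own addition, not from the article) is consistent with the roughly 3x gap. The file is 419,430,400 bytes, about 400 MiB. In the pipe case every byte crosses the user/kernel boundary three times (cat's read from the page cache, cat's write into the pipe, b.out's read from the pipe), so roughly 3 x 400 MiB = 1200 MiB of copying, plus the context switches between cat and b.out. In the redirect case every byte crosses only once, roughly 400 MiB. That difference in copying is what the factor of about three in the timings reflects.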

Thank you for reading. That is all of "How efficient are Linux large-file redirection and pipes?". After studying this article, I believe you will have a deeper understanding of the question, although the specific behavior still needs to be verified in practice. The editor will push more articles on related topics for you; welcome to follow!
