How to use GNU Parallel to improve the efficiency of Linux Command Line execution

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article shows how to use GNU Parallel to improve the efficiency of Linux command-line execution. The content is simple and clear; I hope it helps resolve your doubts as we study it together.

Install GNU Parallel

GNU Parallel is probably not pre-installed on your Linux or BSD host, but you can install it from your distribution's software repository. Take Fedora as an example:

$ sudo dnf install parallel

For NetBSD:

# pkg_add parallel

If all methods are unsuccessful, please refer to the project home page.

From serial to parallel

As the name suggests, the power of Parallel is that tasks are executed in parallel; many of us still run tasks serially.

When you run one command over many objects, you effectively create a task queue: some objects are being processed by the command while the rest wait their turn. This is inefficient. Given enough data, a queue will always form; but rather than one long task queue, why not use several smaller ones?

Suppose you have a directory of pictures that you want to convert from JPEG format to PNG format. There are many ways to accomplish this. You could open each picture by hand in GIMP and export it to the new format, but that is about the worst choice: time-consuming and laborious.

There is a beautiful and concise variation of the above approach, which is based on shell:

$ convert 001.jpeg 001.png
$ convert 002.jpeg 002.png
$ convert 003.jpeg 003.png
...and so on.

For beginners this is a big change, and it seems like a big improvement: no graphical interface or constant mouse clicks needed. But it is still laborious.

Further improvements:

$ for i in *jpeg; do convert $i $i.png; done

At the very least, this hands the whole task off for execution, freeing your time for more valuable things. But the problem is that this is still a serial operation: after one image is converted, the next one in the queue is converted, and so on.
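Note that the loop above names its output 001.jpeg.png rather than 001.png; shell parameter expansion can strip the old extension first. A minimal sketch of that variation, using cp as a stand-in for convert so it runs without ImageMagick installed:

```shell
# Conversion loop with cleaner output names; cp stands in for ImageMagick's
# convert so this sketch runs anywhere. ${i%.jpeg} strips the .jpeg suffix.
mkdir -p /tmp/pics && cd /tmp/pics
touch 001.jpeg 002.jpeg 003.jpeg
for i in *.jpeg; do cp "$i" "${i%.jpeg}.png"; done
ls -1 *.png   # prints 001.png, 002.png, 003.png (one per line)
```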

Use Parallel:

$ find . -name "*jpeg" | parallel -I% --max-args 1 convert % %.png

This is a combination of two commands: the find command, which collects the objects that need to be manipulated, and the parallel command, which is used to sort objects and ensure that each object is processed as needed.

find . -name "*jpeg" finds all files in the current directory whose names end in jpeg.

parallel invokes GNU Parallel.

-I% creates a placeholder % that represents whatever find passes to Parallel. Without a placeholder, you would have to write a command by hand for every result of the find command, which is exactly what you want to avoid.

--max-args 1 limits the rate at which Parallel takes new objects from the queue. Since the command being run needs only one file as input, the limit is set to 1. If you need to run a more complex command that requires two file inputs (for example, cat 001.txt 002.txt > new.txt), set the limit to 2.

convert % %.png is the command you want Parallel to execute.

The combined command works like this: find collects all the relevant file paths and passes them to parallel, which (with the current arguments) starts one task and, without waiting for it to complete, immediately takes the next argument from the queue (each line of the pipeline's output becomes one argument to parallel). As long as your host holds up, Parallel keeps doing this: whenever an old task completes, Parallel assigns it a new one, until all the data has been processed. A job that takes about 10 minutes without Parallel can finish in 3 to 5 minutes with it.
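If parallel is not installed, xargs (also part of findutils) offers a similar, though less flexible, kind of parallelism through its -P flag. A hedged sketch, again with cp standing in for convert so it runs anywhere:

```shell
# -P 4 runs up to four commands concurrently; -I% replaces % with each
# input line, one line per command, mirroring parallel's -I%.
# cp stands in for ImageMagick's convert so this runs without it.
mkdir -p /tmp/xp && cd /tmp/xp
touch a.jpeg b.jpeg
find . -name "*jpeg" | xargs -P 4 -I% cp % %.png
ls -1   # now also lists a.jpeg.png and b.jpeg.png
```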

Multiple inputs

If you are familiar with find and xargs (together known as the GNU findutils), you will find that find is an excellent data provider for Parallel. It offers a flexible interface that most Linux users are already accustomed to, and it is easy to learn even for beginners.

The find command is straightforward: you give find a search path and part of the file name to look for. You can use wildcards for fuzzy matching; in the following example, the asterisk matches any sequence of characters, so find locates all files whose names end with the string searchterm:

$ find /path/to/directory -name "*searchterm"

By default, find returns search results line by line, with one line for each result:

$ find ~/graphics -name "*jpg"
/home/seth/graphics/001.jpg
/home/seth/graphics/cat.jpg
/home/seth/graphics/penguin.jpg
/home/seth/graphics/IMG_0135.jpg

When you pipe the results of find to parallel, the file path on each line is treated as an argument to the parallel command. On the other hand, if you need to use commands to handle multiple parameters, you can change the way queue data is passed to parallel.

The following is a less practical example, which will be modified later to make it more meaningful. If you have installed GNU Parallel, you can follow this example.

Suppose you have four files, listed as one file per line, as follows:

$ echo ada > ada; echo lovelace > lovelace
$ echo richard > richard; echo stallman > stallman
$ ls -1
ada
lovelace
richard
stallman

You want to merge two of these files into a third file that contains the contents of both. Parallel therefore needs access to two files per task, so the -I% variable alone does not fit this example.

Parallel reads 1 queue object by default:

$ ls -1 | parallel echo
ada
lovelace
richard
stallman

Now let Parallel use 2 queue objects per task:

$ ls -1 | parallel --max-args=2 echo
ada lovelace
richard stallman

Now the lines have been merged: the results of ls -1 are passed to Parallel two at a time. Each task receives the two files it needs, but they arrive as a single argument: "ada lovelace" for one task and "richard stallman" for the other. What you actually need is two separate arguments per task.

Fortunately, Parallel itself provides the parsing capability required here. If you set --max-args to 2, then {1} and {2} represent the first and second parts of the passed arguments, respectively:

$ ls -1 | parallel --max-args=2 cat {1} {2} ">" {1}_{2}.person

In the above command, the variable {1} takes the value ada or richard (depending on the task), and the variable {2} takes the value lovelace or stallman. The redirect symbol is placed in quotation marks so that Bash does not interpret it and Parallel can use it; it sends the combined contents to the new files ada_lovelace.person and richard_stallman.person, respectively.

$ ls -1
ada
ada_lovelace.person
lovelace
richard
richard_stallman.person
stallman
$ cat ada_*person
ada
lovelace
$ cat ri*person
richard
stallman
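For readers without parallel installed, the pairing logic above can be mimicked (serially) in plain shell by reading the queue two items at a time; the file names below match the example:

```shell
# Serial shell equivalent of: parallel --max-args=2 cat {1} {2} ">" {1}_{2}.person
# Each loop iteration reads two queue items and merges them into one file.
mkdir -p /tmp/pairs && cd /tmp/pairs
echo ada > ada; echo lovelace > lovelace
echo richard > richard; echo stallman > stallman
ls -1 | while read a && read b; do
    cat "$a" "$b" > "${a}_${b}.person"
done
cat ada_lovelace.person   # prints "ada" then "lovelace"
```

This shows what Parallel's {1}/{2} placeholders are doing for you, minus the concurrency.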

If you spend all day working with log files hundreds of megabytes in size, parallel text processing will help you a great deal; otherwise, the above is just a starter example.

However, this processing method is also very helpful for many operations other than text processing. The following is a real case from the film industry, where video files and (corresponding) audio files in a directory need to be merged.

$ ls -1
(a directory listing of matching video/audio pairs: 12_….avi with 12_….flac, 14_….avi with 14_….flac, and so on)

Using the same method, you can merge files in parallel using the following simple command:

$ ls -1 | parallel --max-args=2 ffmpeg -i {1} -i {2} -vcodec copy -acodec copy {1}.mkv

The quick and dirty way

The above fancy input and output processing is not necessarily to everyone's taste. If you want to be more direct, you can throw a bunch of commands to Parallel and do something else.

First, you need to create a text file with one command on each line:

$ cat jobs2run
bzip2 oldstuff.tar
oggenc music.flac
opusenc ambiance.wav
convert bigfile.tiff small.jpeg
ffmpeg -i foo.avi -b:v 12000k foo.mp4
xsltproc --output build/tmp.fo style/dm.xsl src/tmp.xml
bzip2 archive.tar

Next, pass the file to Parallel:

$ parallel --jobs 6 < jobs2run

Now Parallel works through every task in the file. If there are more tasks than the allowed number of jobs (the value given by --jobs, or the default), Parallel creates and maintains a queue until all the tasks are complete.

That is all the content of "How to use GNU Parallel to improve the efficiency of Linux command line execution". Thank you for reading! I hope the content shared here helps you; if you want to learn more, welcome to follow the industry information channel!
