How to process and transfer sparse files in Linux

2025-04-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article explains how to process and transfer sparse files in Linux. The methods introduced here are simple, fast and practical.

0. What is a sparse file

When a user requests a large amount of storage space, no data has been written yet (the file is all empty), so the file system does not allocate the actual storage right away, in order to save storage resources and improve utilization. Only when data is actually written does the operating system allocate space, a little at a time, for example 64KB at a time. The file therefore looks very large while its footprint is very small, and the actual footprint depends only on the amount of data the user has written. The file is like a big box that may not contain much and has very large holes, which is why it is called a sparse file. Sparse files are an advanced feature of Linux file systems that makes it possible to overcommit disk space. Their most classic applications are virtual hard disks for virtual machines and database snapshots. For example, we use qemu-img to create a raw file with a size of 20GB (note that the qcow2 format is not a sparse file):

fgp@node1:~$ qemu-img create -f raw test.raw 20G
Formatting 'test.raw', fmt=raw size=21474836480
fgp@node1:~$ qemu-img info test.raw
image: test.raw
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

Above, we used qemu-img to create a 20G image file. In the qemu-img info output, virtual size is the amount of space allocated to us, while disk size is the disk space actually occupied, which is 0 at first.

Note: qemu-img create -f raw test.raw 20G is equivalent to 'truncate -s 20G test.raw'.

Of course, this can cause problems. If the system has generated a lot of sparse files and the file system fills up, writes to these files will fail. To avoid this situation, you need to control the number of sparse files.
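One way to keep an eye on this risk is to compare the apparent size of the files in a directory with the space they really occupy and with the free space left on the file system. The following is only a minimal sketch, assuming GNU coreutils (du, df, truncate); the temporary directory and the file name vm.raw are hypothetical and exist only for the demonstration.

```shell
#!/bin/sh
# Sketch: compare apparent size, allocated size and free space.
# Assumes GNU coreutils; the directory and file name are made up for the demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
truncate -s 64M "$workdir/vm.raw"      # 64 MiB apparent size, no data written

# Sum of the file sizes as programs see them (bytes).
apparent=$(du -s --apparent-size --block-size=1 "$workdir" | cut -f1)
# Space really allocated on disk (bytes).
allocated=$(du -s --block-size=1 "$workdir" | cut -f1)
# Free space on the file system that holds the directory (bytes).
available=$(df --output=avail --block-size=1 "$workdir" | tail -n 1 | tr -d ' ')

echo "apparent=$apparent allocated=$allocated available=$available"
if [ $((apparent - allocated)) -gt "$available" ]; then
    echo "warning: the holes could not all be filled on this file system"
fi
```

If the difference between apparent and allocated size exceeds the free space, the sparse files are overcommitted and some later write is bound to fail.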

1. How to determine whether it is a sparse file

Image files like the one above are not the only files that can be sparse; other file types can be sparse too. How do we determine whether a file is sparse? The easiest way is to check the size with both the ls command and the du command: if du reports less space than ls, the file is sparse. We can use the dd command to quickly generate a sparse file:

dd if=/dev/zero of=sparse_file bs=1M seek=1024 count=0

The above command seeks to offset 1024 * 1M in the output file (which leaves an empty 1GB region), reads from /dev/zero, and writes 0 blocks (count=0), so no data is actually written. We use ls -lh to look at its size:

~$ ls -lh sparse_file
-rw-rw-r-- 1 fgp fgp 1.0G May 26 15:47 sparse_file

You can see that the file is displayed as 1G.

Let's use the du -h command to see how much disk space it takes up:

~$ du -h sparse_file
0 sparse_file

We found that the actual disk space is 0.

We can also directly use the -s option of ls to see the actual size of the file:

~$ ls -slh sparse_file
0 -rw-rw-r-- 1 fgp fgp 1.0G May 26 15:47 sparse_file

The first column is the actual disk space occupied, and the sixth column is the file size (virtual size).
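The same comparison can be scripted. The sketch below is a hedged illustration, assuming GNU stat; is_sparse is a hypothetical helper name, and the two test files are created only for the demonstration.

```shell
#!/bin/sh
# Sketch: report whether a file occupies less disk space than its apparent size.
# Assumes GNU stat; is_sparse is a hypothetical helper name.
set -eu

is_sparse() {
    # %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
    set -- $(stat -c '%s %b %B' "$1")
    [ $(($2 * $3)) -lt "$1" ]
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
truncate -s 1M "$workdir/holes"                    # only a hole, no data
head -c 1048576 /dev/urandom > "$workdir/dense"    # 1 MiB of real data

for f in holes dense; do
    if is_sparse "$workdir/$f"; then
        echo "$f: sparse"
    else
        echo "$f: not sparse"
    fi
done
```

This is a heuristic: on file systems that compress data or store tiny files inline, a file without any holes can also occupy less space than its apparent size.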

In addition, you can adjust the file size at will using the truncate command (if the file does not exist, it will be created automatically), for example:

~$ truncate --size 1T sparse_file
~$ du -h sparse_file
0 sparse_file
~$ ls -lh sparse_file
-rw-rw-r-- 1 fgp fgp 1.0T May 26 16:09 sparse_file

Above, we resized sparse_file to 1TB. This actually appends an extended part (a hole) that reads as zero bytes, so it does not take up actual disk space. Of course, you can also reduce the file size, but if the new size is smaller than the data in the file, the data is cut off and part of it is lost.

~$ truncate -s 500M sparse_file
~$ ls -lh sparse_file
-rw-rw-r-- 1 fgp fgp 500M May 26 16:12 sparse_file

Above we reduced the file to 500MB.
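Both directions of truncate can be seen on a small file. This is a minimal sketch, assuming GNU coreutils; the file name demo.txt is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: growing a file with truncate adds a hole; shrinking it cuts data off.
# Assumes GNU coreutils; the file name is made up for the demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
f="$workdir/demo.txt"

printf 'Hello World' > "$f"           # 11 bytes of data
truncate -s 1M "$f"                   # grow: everything after byte 11 reads as zeros
grown_size=$(stat -c %s "$f")
head_after_grow=$(head -c 11 "$f")    # the original data is still intact

truncate -s 5 "$f"                    # shrink below the data: " World" is lost
shrunk_content=$(cat "$f")

echo "size after growing: $grown_size"
echo "data after growing: $head_after_grow"
echo "data after shrinking: $shrunk_content"
```

Growing never touches existing data, while shrinking below the written data discards it for good, so it pays to check the size argument before shrinking an image file.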

2. Sparse file processing

There are also some problems when processing sparse files. For example, we use sed to process a sparse file:

fgp@node1:~/tmp$ echo "Hello World" > test.raw
fgp@node1:~/tmp$ truncate -s 1G test.raw
fgp@node1:~/tmp$ ls -slh
total 68K
4.0K -rw-rw-r-- 1 fgp fgp 1.0G May 28 14:52 test.raw
fgp@node1:~/tmp$ sed -i 's/Hello/HELLO/g' test.raw
fgp@node1:~/tmp$ ls -slh
total 1.1G
1.1G -rw-rw-r-- 1 fgp fgp 1.0G May 28 14:53 test.raw

Above we used truncate to create a sparse file, and then changed Hello to HELLO with the sed command. We expected the file to stay sparse, but although only one line of data was modified, the hole in the file was filled up and the file instantly took up 1G of disk space. This happens because sed -i writes a complete new copy of the file, zero bytes included, and then replaces the original with it. Isn't it baffling that a file occupying only 4K on disk grows to 1G after running sed?
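When the replacement has exactly the same length as the original text, the bytes can be patched in place so that the file is never rewritten. The sketch below swaps in a different technique than sed (dd with conv=notrunc) and is only an illustration, assuming GNU coreutils; the file is a smaller stand-in for the test.raw used above.

```shell
#!/bin/sh
# Sketch: patch bytes in place so the holes of a sparse file are left alone.
# Assumes GNU coreutils; only works when the replacement has the same length.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
f="$workdir/test.raw"

echo "Hello World" > "$f"
truncate -s 16M "$f"                       # data at the start, a hole behind it
blocks_before=$(stat -c %b "$f")

# Overwrite the first 5 bytes; conv=notrunc keeps the rest of the file as is.
printf 'HELLO' | dd of="$f" bs=5 count=1 conv=notrunc status=none

blocks_after=$(stat -c %b "$f")
size_after=$(stat -c %s "$f")
first_line=$(head -c 11 "$f")
echo "first line: $first_line"
echo "apparent size: $size_after"
echo "allocated blocks before/after: $blocks_before/$blocks_after"
```

Because only the first block is written, the hole behind it stays unallocated. For edits that change the length of the data this approach does not apply.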

As another example, we use the tar command to archive the file:

fgp@node1:~/tmp$ qemu-img create -f raw test.raw 1G
Formatting 'test.raw', fmt=raw size=1073741824
fgp@node1:~/tmp$ time tar -cf test.tar test.raw
real 0m2.145s
user 0m0.012s
sys 0m1.640s
fgp@node1:~/tmp$ time tar -cJf test.tar.xz test.raw
real 1m0.692s
user 0m59.060s
sys 0m1.048s
fgp@node1:~/tmp$ ls -lsh
total 1.1G
0 -rw-r--r-- 1 fgp fgp 1.0G May 28 15:37 test.raw
1.1G -rw-rw-r-- 1 fgp fgp 1.1G May 28 15:37 test.tar
156K -rw-rw-r-- 1 fgp fgp 153K May 28 15:39 test.tar.xz

Above we created a 1G sparse file. When it is archived directly with tar, the archive is not sparse and takes up 1G of disk space. Compressing with xz solves the storage problem, but brings a time overhead (compression takes 1 minute).

Next comes a classic command that everyone knows, cp, which is used to copy files locally. Fortunately (fortunately, because not all commands support this), cp automatically detects whether a file is sparse, does not copy the empty regions, and keeps the copy of a sparse file sparse:

fgp@node1:~$ cp sparse_file sparse_file.copy
fgp@node1:~$ ls -slh sparse_file*
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file.copy

Now let's look at scp, which is similar to cp but copies files to and from remote hosts:

fgp@node1:~$ scp sparse_file localhost:~/sparse_file.copy
sparse_file 100% 2048MB 97.5MB/s 00:21
fgp@node1:~$ ls -slh sparse_file*
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file
2.1G -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:42 sparse_file.copy

We found that scp does not recognize sparse files: when transferring a sparse file, it fills the holes and sends the entire file content.

In fact, cp has an option dedicated to sparse file copying, --sparse=WHEN, where the legal values of WHEN are auto, always and never. The default is auto, which automatically detects whether the source file is sparse. If set to never, cp fills the holes with data and copies the entire file:

fgp@node1:~$ cp --sparse=never sparse_file sparse_file.copy.2
fgp@node1:~$ ls -lhs sparse_file*
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file
2.1G -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:42 sparse_file.copy
2.1G -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:50 sparse_file.copy.2

It can be seen that sparse_file.copy.2 fills the hole, which is equivalent to converting sparse files into non-sparse files.

If specified as always, cp attempts to convert files to sparse files, reducing disk footprint:

fgp@node1:~$ cp --sparse=always sparse_file.copy sparse_file.copy.3
fgp@node1:~$ ls -lsh sparse_file*
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file
2.1G -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:42 sparse_file.copy
2.1G -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:50 sparse_file.copy.2
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:52 sparse_file.copy.3

From the results, we found that we converted the non-sparse file sparse_file.copy into the sparse file sparse_file.copy.3.

Note: a handy cp trick is that it can convert files between sparse and non-sparse form!
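The conversion can be checked on a small scale. The following is a minimal sketch, assuming GNU cp; the file names dense and sparse are made up for the demonstration, and the block counts printed depend on the file system.

```shell
#!/bin/sh
# Sketch: turn a zero-filled (non-sparse) file into a sparse copy with cp.
# Assumes GNU cp; the file names are made up for the demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Write 8 MiB of real zero bytes, so the blocks are actually allocated.
head -c 8388608 /dev/zero > "$workdir/dense"
cp --sparse=always "$workdir/dense" "$workdir/sparse"

# Both files must have identical content and apparent size.
if cmp -s "$workdir/dense" "$workdir/sparse"; then
    same_content=yes
else
    same_content=no
fi
dense_size=$(stat -c %s "$workdir/dense")
sparse_size=$(stat -c %s "$workdir/sparse")
dense_blocks=$(stat -c %b "$workdir/dense")
sparse_blocks=$(stat -c %b "$workdir/sparse")
echo "same content: $same_content"
echo "apparent size: dense=$dense_size sparse=$sparse_size"
echo "allocated blocks: dense=$dense_blocks sparse=$sparse_blocks"
```

The content and the apparent size stay the same; only the number of allocated blocks is expected to drop for the copy.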

In fact, in addition to cp, the tar command used above also supports a --sparse option:

fgp@node1:~/tmp$ time tar -cSf test.tar test.raw
real 0m0.002s
user 0m0.000s
sys 0m0.000s
fgp@node1:~/tmp$ time tar -cSJf test.tar.xz test.raw
real 0m0.011s
user 0m0.000s
sys 0m0.008s
fgp@node1:~/tmp$ ls -slh
total 16K
0 -rw-r--r-- 1 fgp fgp 1.0G May 28 15:37 test.raw
12K -rw-rw-r-- 1 fgp fgp 10K May 28 15:42 test.tar
4.0K -rw-rw-r-- 1 fgp fgp 184 May 28 15:43 test.tar.xz

Comparing with the previous results, we find that the sparse file is handled well with tar's -S (--sparse) option.
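The comparison can be reproduced with a smaller file. This is a hedged sketch, assuming GNU tar and a file system that supports holes; the archive names plain.tar and sparse.tar are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: compare archive sizes with and without tar's -S (--sparse) option.
# Assumes GNU tar and a file system that supports holes.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

truncate -s 32M test.raw                 # 32 MiB hole, no data

tar -cf plain.tar test.raw               # holes are archived as zero bytes
tar -cSf sparse.tar test.raw             # holes are recorded, not stored

plain_size=$(stat -c %s plain.tar)
sparse_size=$(stat -c %s sparse.tar)
echo "plain.tar:  $plain_size bytes"
echo "sparse.tar: $sparse_size bytes"

# Extracting the sparse archive restores the file with its apparent size.
mkdir out
tar -xf sparse.tar -C out
restored_size=$(stat -c %s out/test.raw)
echo "restored test.raw: $restored_size bytes"
```

The archive written with -S only needs to describe where the data is, so its size follows the data actually written rather than the apparent size of the file.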

In addition, cpio also supports the same option, but unfortunately scp does not, so using scp to transfer a large number of sparse files remotely is extremely inefficient and wastes a lot of network bandwidth. For example, we often use qemu-img to create a 40GB raw file and then need to copy the image to other machines. Although the file may take up only about 1GB of disk space, scp has to transfer 40GB, and the remote side needs 40GB of free disk space. Is there an efficient way to transfer sparse files? Unfortunately there does not seem to be a perfect one, but there is a better way; see the next section.

3. A relatively efficient method for transferring sparse files

We said earlier that scp does not support sparse file processing, but the rsync command supports sparse file processing:

fgp@node1:~$ rsync -av --sparse --progress sparse_file localhost:~/sparse_file.copy
fgp@localhost's password:
sending incremental file list
sparse_file
2147483648 100% 74.67MB/s 0:00:27 (xfr#1, to-chk=0/1)
sent 2148008037 bytes received 35 bytes 66092556.06 bytes/sec
total size is 2147483648 speedup is 1.00
fgp@node1:~$ ls -lhs sparse_file*
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file
0 -rw-rw-r-- 1 fgp fgp 2.0G May 26 16:39 sparse_file.copy

Unfortunately, although the target file stays sparse and storage space on the target host is saved, network bandwidth is not: rsync still transmits 2GB of data, because it cannot skip the empty data during transfer.
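Since tar -S records holes instead of storing them, one possible workaround is to stream a sparse tar archive through a pipe and unpack it on the other side. The sketch below runs both ends locally to stay self-contained. Running the receiving tar through ssh (something like ssh user@host 'tar -xf - -C /dest') is an assumption that was not tested in this article, and the host and path in that example are hypothetical.

```shell
#!/bin/sh
# Sketch: stream a sparse file through a pipe with tar -S.
# Assumes GNU tar; both ends run locally here, names are made up for the demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/src" "$workdir/dst"

echo "Hello World" > "$workdir/src/test.raw"
truncate -s 32M "$workdir/src/test.raw"      # 12 bytes of data plus a large hole

# Count how many bytes would travel through the pipe.
stream_bytes=$(tar -C "$workdir/src" -cSf - test.raw | wc -c)

# Sender | receiver. For a remote copy the right-hand side would run over ssh.
tar -C "$workdir/src" -cSf - test.raw | tar -C "$workdir/dst" -xf -

if cmp -s "$workdir/src/test.raw" "$workdir/dst/test.raw"; then
    identical=yes
else
    identical=no
fi
dst_size=$(stat -c %s "$workdir/dst/test.raw")
echo "bytes in tar stream: $stream_bytes (apparent file size: $dst_size)"
echo "copy identical: $identical"
```

On a file system that supports holes, the stream is expected to be far smaller than the apparent file size, which is the saving rsync --sparse does not provide.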

It is worth mentioning that rsync has an --inplace option, which updates the destination file in place instead of rebuilding it as a new copy; together with rsync's delta transfer, only the modified blocks are passed during the transfer. Of course, this is of no use when a file is transferred for the first time. Unfortunately, in the rsync version used here the --sparse and --inplace options cannot be used at the same time. The common practice is to use --sparse when transferring a file for the first time, and then use --inplace whenever the file has been modified and needs to be synchronized again, which transfers only the updated blocks on top of the existing file. Alternatively, you can first create a sparse file with the same name on the remote destination machine using the truncate command, and then transfer with --inplace.
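The two-step practice can be sketched as follows. This is only an illustration under stated assumptions: it runs rsync between two local directories to stay self-contained (for a remote copy the destination would be host:path), the file name disk.raw is hypothetical, and the whole demonstration is skipped when rsync is not installed.

```shell
#!/bin/sh
# Sketch: first copy with --sparse, later updates with --inplace.
# Runs rsync between two local directories; names are made up for the demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/src" "$workdir/dst"
result=skipped

if command -v rsync >/dev/null 2>&1; then
    truncate -s 16M "$workdir/src/disk.raw"

    # First transfer: create the destination as a sparse file.
    rsync -a --sparse "$workdir/src/disk.raw" "$workdir/dst/"

    # rsync's quick check compares size and modification time, so make sure
    # the change below lands in a later second than the first copy.
    sleep 1
    # The source changes: 5 bytes at offset 1 MiB (bs=1, so seek counts bytes).
    printf 'HELLO' | dd of="$workdir/src/disk.raw" bs=1 seek=1048576 \
        conv=notrunc status=none

    # Later transfers: update the existing destination file in place.
    # --no-whole-file asks for delta transfer even between local paths.
    rsync -a --inplace --no-whole-file "$workdir/src/disk.raw" "$workdir/dst/"

    if cmp -s "$workdir/src/disk.raw" "$workdir/dst/disk.raw"; then
        result=identical
    else
        result=different
    fi
fi
echo "rsync result: $result"
```

The first run decides whether the destination is sparse; the later runs only rewrite the blocks that changed, so the holes created by the first run are left alone.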

Of course, if we are transferring an image file, we can convert the raw format to qcow2 format locally through qemu-img and then transfer it:

fgp@node1:~/tmp$ ls -lsh
total 0
0 -rw-rw-r-- 1 fgp fgp 10G May 28 15:00 test.raw
fgp@node1:~/tmp$ qemu-img convert -f raw -O qcow2 test.raw test.qcow2
fgp@node1:~/tmp$ ls -lsh
total 196K
196K -rw-r--r-- 1 fgp fgp 193K May 28 15:12 test.qcow2
0 -rw-rw-r-- 1 fgp fgp 10G May 28 15:00 test.raw

After conversion to qcow2 format the image is no longer a sparse file, so the above problem does not exist. From the above output, we can see that the file is only 196K, so the amount of data to transfer is greatly reduced.
