What is the principle of rsync algorithm?

2025-01-16 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article explains how the rsync algorithm works. The method described here is simple, fast, and practical.

The rsync command is a remote data synchronization tool that quickly synchronizes files between hosts over a LAN or WAN. It uses the so-called "rsync algorithm" to synchronize files between the local and remote hosts, transferring only the parts of the two files that differ. For usage details, see https://man.linuxde.net/rsync.

That made me curious: how can it transfer only the parts of two files that differ?

The simplest idea is to put the two files side by side and compare them. The problem is that one file is remote and the other is local, so comparing them this way would mean sending the whole file across, at which point you might as well overwrite the destination directly. The idea of synchronization is that when part of a file changes, only that part is transferred. To tell whether a part differs from the original and needs to be synchronized, the natural tool is a digest: if the md5 values of the two pieces differ, the piece has changed and must be synchronized. Let's look at how rsync handles this.

First, the destination file, fileDst, is divided into fixed-size blocks, for example 1024 bytes each (the last block may be shorter), and two identifiers are calculated for each block. Rsync calculates them in two ways:

One is the rolling checksum, a weak 32-bit checksum based on the adler-32 algorithm invented by Mark Adler.

The other is a strong 128-bit checksum, which used to be md4 and is now md5.

The synchronization destination then sends the list of calculated checksums to the synchronization source. Each entry in the list contains three things: the rolling checksum, the md5 checksum, and the block number.
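The two identifiers can be sketched in Python as follows. This is a minimal illustration, not rsync's actual source: the function names `weak_checksum`, `roll`, and `strong_checksum` are made up for this example, and the modulus 2^16 follows the description in the rsync technical report. The point of the weak checksum is the `roll` step, which updates the value for a window shifted by one byte without rereading the whole window.

```python
import hashlib

MOD = 1 << 16  # assumed modulus, as described in the rsync technical report


def weak_checksum(block: bytes) -> tuple[int, int, int]:
    """Return (a, b, combined 32-bit value) for one block."""
    n = len(block)
    a = sum(block) % MOD
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b, (b << 16) | a


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int, int]:
    """Slide a window of length n one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b, (b << 16) | a


def strong_checksum(block: bytes) -> bytes:
    """128-bit strong checksum (md5, as the article states)."""
    return hashlib.md5(block).digest()


data = b"the quick brown fox"
n = 4
a, b, _ = weak_checksum(data[0:n])
rolled = roll(a, b, data[0], data[n], n)
# Rolling by one byte gives the same result as recomputing from scratch.
print(rolled == weak_checksum(data[1:n + 1]))
```

Because `roll` costs a constant number of operations, the weak checksum can be evaluated at every byte offset of a file cheaply, which a hash such as md5 cannot do.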
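The list the destination sends might look like the sketch below. The `BlockSignature` type and the `signatures` function are hypothetical names chosen for illustration, the weak checksum is a simplified stand-in with modulus 2^16, and the real wire format used by rsync is not shown here.

```python
import hashlib
from typing import NamedTuple

MOD = 1 << 16  # assumed modulus for the weak checksum


class BlockSignature(NamedTuple):
    index: int      # block number within fileDst
    weak: int       # 32-bit rolling checksum
    strong: bytes   # 128-bit md5 digest


def weak_checksum(block: bytes) -> int:
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return (b << 16) | a


def signatures(data: bytes, block_size: int = 1024) -> list[BlockSignature]:
    """Split data into blocks and compute both checksums for each one."""
    result = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]  # the last block may be shorter
        result.append(
            BlockSignature(index, weak_checksum(block), hashlib.md5(block).digest())
        )
    return result


sigs = signatures(b"a" * 2500, block_size=1024)
print([s.index for s in sigs])  # three blocks: two full, one short
```

Each entry is only a few dozen bytes, so the list is much smaller than the file it describes.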

After the source machine gets this list, it performs the same calculations on its own file and compares the results with the destination's values, so it knows which blocks have changed. A question comes to mind immediately: if a single character is inserted into the source file, say in the middle or at the beginning, won't the checksum of every block after it change, so that everything has to be synchronized anyway? It does not, because the comparison that follows uses a sliding window.

After the synchronization source gets the checksum array of fileDst, it stores the data in a hash table keyed by the rolling checksum, which gives O(1) lookups. Put simply, it can be thought of as a map. I do not know exactly how rsync's hash table handles collisions.
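One straightforward way to build such a map is sketched below. It is an assumption made for illustration, not a description of rsync's internal table: blocks that share a weak checksum are kept in a list under the same key, and the strong checksum decides between them. The names `build_table` and `lookup` are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_table(signatures):
    """Map weak checksum -> list of (strong checksum, block index)."""
    table = defaultdict(list)
    for index, weak, strong in signatures:
        table[weak].append((strong, index))
    return table


def lookup(table, weak: int, block: bytes):
    """Return the matching block index, or None if there is no match."""
    candidates = table.get(weak)
    if not candidates:
        return None  # weak checksum unknown: skip the expensive md5 entirely
    strong = hashlib.md5(block).digest()
    for candidate_strong, index in candidates:
        if candidate_strong == strong:
            return index
    return None


# Two blocks that (artificially) share the weak value 42.
sigs = [
    (0, 42, hashlib.md5(b"aaaa").digest()),
    (1, 42, hashlib.md5(b"bbbb").digest()),
]
table = build_table(sigs)
print(lookup(table, 42, b"bbbb"))
```

With this layout a collision in the weak checksum costs one extra md5 comparison rather than a wrong match.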

The source then starts scanning its own file. It takes the block-sized window at the current position and calculates the weak checksum; if that matches an entry in the hash table, it calculates the strong checksum as well. If both match, the window is identical to a block the destination already has, and the scan jumps ahead by one block. If either checksum does not match, the window moves forward by 1 byte and the checksums are calculated again; the byte that was skipped has no match and must be sent to the target machine. The target machine builds the result in a temporary file, which replaces the original target file when matching is finished. If the file has changed a lot, is it more efficient to do all this calculation or simply to transfer the whole file?
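The scan and the rebuild on the target can be sketched end to end as below. This is a simplified model under stated assumptions, not rsync's implementation: only full-size blocks of the destination are indexed, the delta is a plain Python list, and the names `make_table`, `compute_delta`, and `apply_delta` are invented for the example.

```python
import hashlib

MOD = 1 << 16  # assumed modulus for the weak checksum


def weak_parts(block: bytes) -> tuple[int, int]:
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def make_table(dst: bytes, size: int) -> dict:
    """weak checksum -> {md5 digest: block index}, full-size blocks only."""
    table = {}
    for index in range(len(dst) // size):
        block = dst[index * size:(index + 1) * size]
        a, b = weak_parts(block)
        table.setdefault((b << 16) | a, {})[hashlib.md5(block).digest()] = index
    return table


def compute_delta(src: bytes, table: dict, size: int) -> list:
    """Return a list of ("copy", block_index) and ("data", bytes) items."""
    delta, literal, pos, a, b = [], bytearray(), 0, None, None
    while pos + size <= len(src):
        window = src[pos:pos + size]
        if a is None:  # start of file or just after a match: compute fresh
            a, b = weak_parts(window)
        match = None
        candidates = table.get((b << 16) | a)
        if candidates:  # weak hit: confirm with the strong checksum
            match = candidates.get(hashlib.md5(window).digest())
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            pos += size
            a = b = None
        else:
            literal.append(src[pos])  # unmatched byte must be transferred
            if pos + size < len(src):  # roll the window forward by one byte
                a = (a - src[pos] + src[pos + size]) % MOD
                b = (b - size * src[pos] + a) % MOD
            pos += 1
    literal.extend(src[pos:])  # tail shorter than one block
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(dst: bytes, delta: list, size: int) -> bytes:
    """Rebuild the source content from the old destination plus the delta."""
    out = bytearray()
    for kind, value in delta:
        out += dst[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)


dst = b"aaaabbbbccccdddd"
src = b"aaaaXbbbbccccdddd"  # one byte inserted after the first block
delta = compute_delta(src, make_table(dst, 4), 4)
print(delta)
print(apply_delta(dst, delta, 4) == src)
```

In this example only the single inserted byte travels as literal data; the four blocks are referenced by number, even though every block after the insertion has shifted by one byte.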

Both checksums are finite, 32 bits for the weak one and 128 bits for the strong one, so a false match is still possible. The probability is very low, though, on the order of 1 in 2 to the 160th power, so the chance of error is lower than when the file is transferred directly and overwritten!
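The figure of 2 to the 160th power comes from multiplying the two checksum spaces. As a rough estimate, assuming the weak and strong checksums behave like independent, uniformly distributed values (an idealization that real data and real hash functions only approximate), the chance that a different block matches both is:

```latex
P(\text{false match}) \approx \frac{1}{2^{32}} \cdot \frac{1}{2^{128}} = \frac{1}{2^{160}}
```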
