2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
In this issue, we look at how Baidu Netdisk's instant upload ("second transfer") feature works. The article analyzes it from a technical point of view, and I hope you get something out of it.
A reader asked me on WeChat: how does the instant upload feature of Baidu Netdisk work?
I had actually thought about this before. My guess was that the front end computes a hash of the file (such as MD5) and sends it to the back end, and the Netdisk server checks whether a file with that hash already exists. If it does, the server completes the transfer of the file into the user's account entirely on the back end and simply tells the front end that the upload succeeded.
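To make that guess concrete, here is a minimal sketch of hash-based deduplication. It is only an illustration of the idea, not Baidu's implementation: the ToyDiskServer class, its method names, and the user and file names are all made up for this example.

```python
import hashlib


class ToyDiskServer:
    """In-memory stand-in for the storage back end (hypothetical)."""

    def __init__(self):
        self.blobs = {}       # MD5 hex digest -> file bytes
        self.user_files = {}  # (user, filename) -> MD5 hex digest

    def upload(self, user, filename, data):
        """Normal upload: the bytes themselves are sent and stored."""
        digest = hashlib.md5(data).hexdigest()
        self.blobs[digest] = data
        self.user_files[(user, filename)] = digest

    def try_instant_upload(self, user, filename, digest):
        """Link an already stored blob to the user if the hash is known."""
        if digest not in self.blobs:
            return False
        self.user_files[(user, filename)] = digest
        return True


server = ToyDiskServer()
server.upload("alice", "movie.bin", b"some large file contents")

# Bob has the same bytes locally, so only the hash crosses the network.
bob_digest = hashlib.md5(b"some large file contents").hexdigest()
instant_ok = server.try_instant_upload("bob", "copy.bin", bob_digest)

# A file the server has never seen cannot be uploaded instantly.
new_digest = hashlib.md5(b"new data").hexdigest()
unknown_ok = server.try_instant_upload("bob", "other.bin", new_digest)
```

Note that with a single hash as the key, any two files with the same MD5 would be treated as the same file, which is exactly the reader's concern.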
However, this was only my own guess, and I had never verified whether it was right.
I told him my guess, and he asked: what if there is a hash collision?
I thought for a moment and said: then add a few more hashes!
But what does Baidu Netdisk actually do? Since the reader had asked, I took the opportunity to spend a few minutes studying it, both to answer the question and to learn something myself.
MD5 collisions
First of all, with only one hash, collisions have been shown to occur in practice, not just in theory.
For example, I found an example on Zhihu. The following two different pieces of data differ by only two bytes:
Computing the MD5 of each gives the same result:
Therefore, if you rely on a single hash to decide that two files are the same, you may well match the wrong file.
Some people have even proposed a hash collision attack based on this: if I know the MD5 of a file but cannot get the file itself, and I construct a file with the same MD5 through mathematical calculation, won't the service simply hand that file over to me? What if it is a private document? That would be a real problem!
What Baidu Netdisk actually does
So how does Baidu Netdisk do it?
First, upload a reasonably large file (a small file finishes uploading before there is time to compute its hash), open the browser's developer tools with F12, and look at the network requests:
As you can see, Baidu Netdisk uploads files in chunks, which is a common practice in the industry. If the connection drops partway through, only the remaining chunks need to be sent next time.
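The resume behaviour of chunked upload can be sketched as follows. This is a toy model under stated assumptions: the function names, the fail_at parameter used to simulate a dropped connection, and the 4096-byte chunk size are all invented for the example and say nothing about the chunk size Baidu Netdisk really uses.

```python
def split_into_chunks(data, chunk_size):
    """Cut data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_chunks(chunks, received, fail_at=None):
    """Send every chunk not yet in `received`.

    `received` maps chunk index -> bytes and plays the role of the server.
    If `fail_at` is given, the connection "drops" just before that index.
    Returns the list of chunk indexes sent during this attempt.
    """
    sent = []
    for index, chunk in enumerate(chunks):
        if index in received:
            continue  # already on the server, skip it
        if fail_at is not None and index == fail_at:
            return sent  # simulated network failure
        received[index] = chunk
        sent.append(index)
    return sent


data = bytes(range(256)) * 40           # 10240 bytes of sample data
chunks = split_into_chunks(data, 4096)  # three chunks: 4096, 4096, 2048 bytes
received = {}

first_attempt = upload_chunks(chunks, received, fail_at=2)  # drops before chunk 2
second_attempt = upload_chunks(chunks, received)            # resumes with chunk 2 only
reassembled = b"".join(received[i] for i in range(len(chunks)))
```

The second attempt sends only the chunk that was missing, and the server side can still reassemble the original bytes.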
However, notice that among the chunk requests above there is a request to an interface called rapidupload. As the name suggests, this interface must have something to do with the instant upload feature.
Let's take a look at the parameters of the request, which is sent as form data with several fields:
Content-length: file length
Content-md5: MD5 of the file
Slice-md5: MD5 of file slices
Seeing this, you can guess that the server makes a joint judgment on these three parameters: only when all three match is the file treated as the same file!
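A joint check of this kind might look like the sketch below. It is an assumption about how a back end could combine the three fields, not something observed from Baidu's servers; the RapidIndex class is hypothetical and the hex strings are made-up placeholder values, not real digests of any file.

```python
class RapidIndex:
    """Toy index of known files keyed by all three request fields (hypothetical)."""

    def __init__(self):
        self._known = set()

    def register(self, content_length, content_md5, slice_md5):
        self._known.add((content_length, content_md5, slice_md5))

    def matches(self, content_length, content_md5, slice_md5):
        # All three values must agree at once; matching one or two is not enough.
        return (content_length, content_md5, slice_md5) in self._known


# Placeholder values for illustration only.
FILE_MD5 = "0123456789abcdef0123456789abcdef"
SLICE_MD5 = "fedcba9876543210fedcba9876543210"
OTHER_MD5 = "00000000000000000000000000000000"

index = RapidIndex()
index.register(1048576, FILE_MD5, SLICE_MD5)

exact_match = index.matches(1048576, FILE_MD5, SLICE_MD5)
wrong_slice = index.matches(1048576, FILE_MD5, OTHER_MD5)
wrong_length = index.matches(1048577, FILE_MD5, SLICE_MD5)
```

Only the first lookup succeeds; a request that agrees on the file MD5 but differs in the slice MD5 or the length is rejected.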
Let's take a look at how the server responded:
The instant upload succeeded!
What is returned if you upload a file that definitely does not exist on the back end? I constructed one to test:
See, 404! That means there is no such file on the back end, so the file has to be uploaded the ordinary way.
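On the client side, the fallback logic can be sketched like this. The 404 for an unknown file is what the test above showed; the 200 for a successful instant upload, the function names, and the use of a plain set as the back end are assumptions made for the example.

```python
import hashlib


def rapid_upload_status(known_digests, digest):
    """Toy back end: 200 if the digest is known (assumed), 404 otherwise."""
    return 200 if digest in known_digests else 404


def upload(known_digests, data):
    """Try instant upload first and fall back to a full upload on 404."""
    digest = hashlib.md5(data).hexdigest()
    if rapid_upload_status(known_digests, digest) == 200:
        return "instant"
    # 404: the back end has no such file, so send the bytes the ordinary way.
    known_digests.add(digest)
    return "full"


known = set()
first = upload(known, b"brand new file")   # nothing stored yet, full upload
second = upload(known, b"brand new file")  # same bytes again, instant upload
```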
Next, I wanted to see how Baidu Netdisk cuts the slice used for the slice MD5.
Using the Initiator column in the browser's Network panel, you can locate the JS code that makes the request:
In the call stack you can see a function called rapidUpload. Following it one step further leads to the place where the slice MD5 is calculated:
In fact, it hashes the first 262144 bytes of the file, that is, 256 KB. If the file is smaller than this, instant upload is not attempted.
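The slice calculation described above can be reproduced in a few lines. This is a sketch rather than Baidu's code: the function name is made up, and treating a file of exactly 262144 bytes as eligible is an assumption, since only the "smaller than this" case is described.

```python
import hashlib

SLICE_SIZE = 262144  # 256 KB, the value seen in the page's JS


def slice_md5(data):
    """MD5 of the first SLICE_SIZE bytes, or None when the file is smaller than that."""
    if len(data) < SLICE_SIZE:
        return None  # too small: no instant upload attempt
    return hashlib.md5(data[:SLICE_SIZE]).hexdigest()


small = b"x" * 1000
big_a = b"a" * SLICE_SIZE + b"tail-1"
big_b = b"a" * SLICE_SIZE + b"tail-2"

small_slice = slice_md5(small)
same_slice = slice_md5(big_a) == slice_md5(big_b)
same_file = hashlib.md5(big_a).hexdigest() == hashlib.md5(big_b).hexdigest()
```

The two large files share their first 256 KB, so their slice MD5 values are equal while their whole-file MD5 values differ, which is why the slice MD5 is only useful in combination with the other fields.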
But strangely, when I extracted the first 256 KB of the file myself and calculated its MD5, the result did not match the parameter uploaded through the interface!
That puzzled me for several minutes. Was it not that simple after all?
I set a breakpoint at the point of calculation and found that its result was the same as mine, but the value had changed by the time it was sent over the network. A real Schrödinger's MD5. Strange!
However, programs are not quantum mechanics and do not play tricks, and I soon found the cause: perhaps to keep its approach from being discovered, Baidu Netdisk encrypts the file MD5 and the slice MD5 before sending them!
This is the encryption function:
It is just some simple string manipulation.
All right, now you can answer the previous reader's question:
When Baidu Netdisk uploads a file larger than 256 KB, it calculates the MD5 of the entire file and the MD5 of the first 256 KB of the file, encrypts the two MD5 values, and sends them to the back end in an instant upload request. The back end uses the two MD5 values and the file length to decide whether the file already exists, and completes the instant upload if it does.
This does not guarantee in theory that hash collisions cannot occur, but it at least greatly reduces their probability.