Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to implement MD5 for File de-duplication by python

2025-04-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "python how to achieve MD5 for file de-duplication", in the daily operation, I believe that many people have doubts on how to achieve python MD5 for file de-duplication. The editor consulted all kinds of information and sorted out a simple and easy-to-use method of operation. I hope it will be helpful for you to answer the doubt of "how to achieve MD5 for file de-duplication". Next, please follow the editor to study!

working principle

The script will check all the files under the file path you give, then calculate the MD5 value for each file and add it to a list.

If the MD5 value of a file is not in the list, it is determined that it is the file we need, and the script will create a new folder called "de-duplicate results" on the desktop and copy it to it.

If the MD5 value of a file is in the list, it is determined that it is not the file we need and nothing is done about it.

The code can be run directly without any modification (except for installing library files that may be missing)

Code import osimport shutilimport hashlib# deduplicates the file # calculates the MD5 value of each file Def only_one (test_path): md5_list = [] count = 0 for current_folder, list_folders, files in os.walk (test_path): for file in files: file_path = current_folder +'\'+ file # to get the path f = open (file_path) of each file 'rb') # start calculating the MD5 value of each file md5obj = hashlib.md5 () md5obj.update (f.read ()) get_hash = md5obj.hexdigest () f.close () md5_value = str (get_hash). Upper () # start deduplicating if md5_value in md5_list : # if the MD5 value of this file has ever appeared Do nothing on it count + = 1 print ('\ 033 [31m [-] find duplicate file:\ 033 [0m'+ str (file)) else: md5_list.append (md5_value) # if the md5 value of this file does not exist in the list Shutil.copy (file_path, path2) print ('\ 033 [31m [-]) was added to the list and duplicate files were found: {}\ 033 [0m'.format (count)) if _ _ name__ = ='_ main__': print ('\ 033 [4] 33m [+] this script checks all files under the specified path and deduplicates\ 033 [0m') print ('033 [4) by calculating the MD5 value of the file. 33m [+] the deduplicated files will be copied to the new desktop folder The source file will not be lost\ 033 [0m') path = input ('\ 033 [34m [+]) Please enter the folder address:\ 033 [0m') os.chdir (path) # path2 is used to store all the deduplicated results desktop_path = os.path.join ("~"), 'Desktop') # get the desktop path path2 = os.path.join (desktop_path) Os.makedirs (path2) only_one (path) print ('033 [32m [-] total number of existing non-duplicate files: {}\ 033 [0m'.format (len (os.listdir (path2)

At this point, on the "python how to achieve MD5 file de-duplication" on the end of the study, I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report