Shulou (Shulou.com) 06/01 report, updated 2025-02-24. From: SLTechnology News&Howtos > Development
This article explains how to use Python to make a small file deduplication tool. The content is simple, clear, and easy to follow, so work through it step by step below.
Preface
When downloading material from the internet, I often end up with a messy pile of duplicate files, so I wanted to write a deduplication tool.
The idea is to traverse a folder, including all files in its subfolders, compute the MD5 digest of each file, and remove every file whose digest matches one that has already been seen.
Implementation steps
All the modules used come from the Python standard library rather than third-party packages. hashlib, which does the file comparison, is probably the least familiar; the others are common modules that handle auxiliary tasks.
```python
import os        # file operations
import hashlib   # file comparison (MD5 digests)
import logging   # logging
import sys       # system operations (not actually used below)
```
The logging setup follows the usual pattern: configure the output format and the log level so that progress information is printed.
```python
logger = logging.getLogger('system file deduplication')
logging.basicConfig(format='%(asctime)s %(levelname)-8s: %(message)s')
logger.setLevel(logging.DEBUG)
```
The core deduplication logic is as follows:
```python
directory = input('Please enter the directory to deduplicate:\n')  # the folder path
if os.path.isdir(directory):
    logger.info('current directory [' + directory + '] checked successfully!')
    md5s = set()  # digests of the files kept so far
    for file_path, dir_names, file_names in os.walk(directory):
        for file_name in file_names:
            file_name_path = os.path.join(file_path, file_name)
            try:
                logger.info('current comparison path: ' + file_name_path)
                md5 = hashlib.md5()
                with open(file_name_path, 'rb') as file:
                    md5.update(file.read())
                md5_value = md5.hexdigest()
                if md5_value in md5s:
                    # Same digest as a file already seen: treat it as a duplicate and delete it
                    os.remove(file_name_path)
                    logger.info('[' + file_name_path + '] duplicate has been removed!')
                else:
                    md5s.add(md5_value)
            except OSError:
                logger.error('[' + file_name_path + '] could not be compared, moving on to the next file!')
else:
    logger.error('the folder or directory entered does not exist!')
```
That is the complete file deduplication process. Packaged as a small tool, it is quite handy for cleaning up files on your computer.
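Reading each file with file.read() loads the whole file into memory, which can be a problem for large videos or archives. A minimal sketch, assuming you would rather hash in fixed-size chunks and review the duplicates before deleting anything; the names file_md5 and find_duplicates and the 64 KiB chunk size are illustrative choices, not part of the original tool:

```python
import hashlib
import os


def file_md5(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of a file, reading it chunk_size bytes at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(directory):
    """Walk directory and return (duplicate, original) path pairs without deleting anything."""
    seen = {}  # MD5 digest -> first path seen with that digest
    duplicates = []
    for dir_path, _dir_names, file_names in os.walk(directory):
        for file_name in sorted(file_names):
            path = os.path.join(dir_path, file_name)
            try:
                digest = file_md5(path)
            except OSError:
                # Unreadable file (permissions, broken link, ...): skip it
                continue
            if digest in seen:
                duplicates.append((path, seen[digest]))
            else:
                seen[digest] = path
    return duplicates
```

For example, `for dup, original in find_duplicates("downloads"): print(dup, "duplicates", original)` lists what would be removed; calling `os.remove(dup)` only after checking the list keeps the deletion step deliberate.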
Supplement
The tool relies mainly on hashlib.md5(), so this supplement takes a closer look at it.
Python's hashlib module provides common digest algorithms such as MD5 and SHA1.
What is a digest algorithm?
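As a quick illustration of the two algorithms named above (the input bytes are an arbitrary example), both follow the same constructor-plus-hexdigest pattern; MD5 produces a 128-bit digest (32 hex characters) and SHA1 a 160-bit digest (40 hex characters):

```python
import hashlib

data = "how to use python hashlib".encode("utf-8")  # hashlib works on bytes, not str

md5_hex = hashlib.md5(data).hexdigest()    # 128-bit digest -> 32 hex characters
sha1_hex = hashlib.sha1(data).hexdigest()  # 160-bit digest -> 40 hex characters

# Data can also be fed in pieces with update(); the result equals hashing it all at once
md5_incremental = hashlib.md5()
md5_incremental.update(b"how to use ")
md5_incremental.update(b"python hashlib")

print(md5_hex, len(md5_hex))
print(sha1_hex, len(sha1_hex))
print(md5_incremental.hexdigest() == md5_hex)  # True
```

The incremental update() form is what makes chunked hashing of large files possible.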
A digest algorithm, also called a hash algorithm, uses a function to convert data of any length into a fixed-length string (usually written in hexadecimal). For example, suppose you write an article whose content is the string 'how to use python hashlib-by Michael' and publish it together with its digest '2d73d4f15c0db7f5ecb321b6a65e5d6d'. If someone tampers with it and publishes it as 'how to use python hashlib-by Bob', you can show at once that Bob altered your article, because the digest computed from 'how to use python hashlib-by Bob' differs from the digest of the original.
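The article example can be reproduced along these lines. This sketch assumes MD5 as the digest function (the passage does not say which algorithm produced the quoted digest, so no particular value is checked here); the point is only that the two texts produce different digests:

```python
import hashlib


def digest(text):
    """Return the MD5 hex digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


original = "how to use python hashlib-by Michael"
tampered = "how to use python hashlib-by Bob"

published_digest = digest(original)  # published alongside the article

# A reader recomputes the digest of the text they received and compares
print(digest(original) == published_digest)  # True: text unchanged
print(digest(tampered) == published_digest)  # False: text was altered
```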
In other words, a digest algorithm uses a digest function f() to compute fixed-length digest data for input of any length, so that you can detect whether the original data has been tampered with.
A digest can reveal tampering because the digest function is a one-way function: computing f(data) is easy, but recovering data from the digest is very hard. Moreover, changing even one bit of the original data produces a completely different digest.
Besides deduplicating files, hashlib.md5() can also be used to hash passwords so that they are not stored in plaintext (strictly speaking this is hashing, not encryption). Here is the sample code:
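The claim that a one-bit change yields a completely different digest can be checked directly. A minimal sketch, assuming MD5 and an arbitrary input: flip the lowest bit of the first byte and count how many of the 128 digest bits change (for a good hash function, roughly half of them on average):

```python
import hashlib


def md5_bits(data):
    """Return the MD5 digest of data as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


original = bytearray(b"how to use python hashlib")
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip one bit: 'h' (0x68) becomes 'i' (0x69)

# XOR the two digests; every 1 bit in the result marks a digest bit that changed
changed_bits = bin(md5_bits(bytes(original)) ^ md5_bits(bytes(flipped))).count("1")
print(f"{changed_bits} of 128 digest bits changed")
```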
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Function: login authentication module
Details:
1. Passwords are stored in the file passwd.
2. If passwd has not been created or is missing, the user is told:
   password file does not exist, re-registration is recommended!
3. An unregistered user who tries to log in is told:
   user name does not exist, please register first!
4. A registered user who has forgotten the password gets 3 attempts;
   after 3 wrong passwords authentication exits, and they can try again later.
Login authentication is implemented as a decorator.
"""
import functools
import hashlib
import json
import os

pwd = os.getcwd()
fileName = os.path.join(pwd, "passwd")


# Hash the plaintext password with MD5 (prefixed with the fixed string "haliluya")
# and return the hex digest
def calc_md5(passwd):
    md5 = hashlib.md5(b"haliluya")
    md5.update(passwd.encode("utf-8"))
    return md5.hexdigest()


# New user registration module
def register():
    # If the password file exists, load the user dictionary {user name: MD5};
    # otherwise start with an empty dictionary
    if os.path.exists(fileName):
        with open(fileName, "r") as loadsFn:
            userDB = json.loads(loadsFn.read())
    else:
        userDB = {}
    # Ask the user for a user name
    userName = input("name:")
    # Ask for the password twice; repeat until both entries match
    while True:
        passwd1 = input("password:")
        passwd2 = input("confirm password:")
        if passwd1 == passwd2:
            break
    # Store the MD5 value of the password rather than the plaintext
    userDB[userName] = calc_md5(passwd1)
    # Save the user names and password hashes to the file
    with open(fileName, "w") as dumpFn:
        dumpFn.write(json.dumps(userDB))


# User login authentication, implemented as a decorator
def login(func):
    @functools.wraps(func)
    def decorator(*args, **kwargs):
        # Load userDB (user: password hash) if the passwd file exists; otherwise
        # register a new user (which creates the passwd file) and return
        if os.path.exists(fileName):
            with open(fileName, "r") as loadsFn:
                userDB = json.loads(loadsFn.read())
        else:
            print("password file does not exist, re-registration is recommended!")
            register()
            return
        name = input("username:")
        # If the user name exists, ask for the password; otherwise register first
        if name in userDB:
            # Allow at most 3 attempts; run the wrapped function on success
            for _ in range(3):
                passwd = input("password:")
                if calc_md5(passwd) == userDB[name]:
                    return func(*args, **kwargs)
            print("you have tried three times, please try again later!")
        else:
            print("user name does not exist, please register first!")
            register()
    return decorator


if __name__ == "__main__":
    @login
    def hello():
        print("Hello world!")

    hello()
```

Thank you for reading. That covers how to use Python to make a small file deduplication tool. After working through this article you should have a better understanding of the problem, although how you apply it still needs to be verified in practice.
© 2024 shulou.com SLNews company. All rights reserved.