2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article walks through a Python script for quickly removing duplicate entries from a set of files. The need comes up often in practice, so the steps below show how to set the script up and run it.
I. Download the Python script to your local machine
II. Usage
1. A Python 2 environment is required.
2. Put the files to be deduplicated together with the Python script.
3. Create several new files with duplicate content and put them in a directory separate from the Python script, for example /root/123.
4. Modify the Python script so that input_path and the two file paths in main() point to your own directory:
# coding=utf-8
import sys, re, os


def file_merge():
    input_path = "/root/123/"  # fill in your path here; note the trailing "/"
    # os.listdir returns every file name under the path as a list;
    # os.path.join joins each file name with the path into a full path.
    # Note: this includes output files left over from an earlier run.
    whole_file = [os.path.join(input_path, file) for file in os.listdir(input_path)]
    content = []
    # open each path and collect all of its lines with readlines
    for w in whole_file:
        with open(w, 'rb') as f:
            content = content + f.readlines()
    # build the output path in the same folder as the input path;
    # the file is created if it does not exist yet
    output_path = os.path.join(input_path, 'merge all files.txt')
    # write the merged contents to the file
    with open(output_path, 'wb') as f:
        f.writelines(content)


def getDictList(dict):
    # one entry = a run of word characters and the listed punctuation;
    # the character class is cut off in the source after "\:", so add any
    # further punctuation your entries may contain
    regx = '''[\w\~`\!\@\#\$\%\^\&\*\(\)\_\-\+=\[\]\{\}\:]+'''
    with open(dict) as f:
        data = f.read()
        return re.findall(regx, data)


def rmdp(dictList):
    # a set drops duplicates; the original order is not preserved
    return list(set(dictList))


def fileSave(dictRmdp, out):
    # 'a' appends, so delete an old output file before running again
    with open(out, 'a') as f:
        for line in dictRmdp:
            f.write(line + '\n')


def main():
    try:
        dict = '/root/123/merge all files.txt'
        out = '/root/123/deduplicate all files.txt'
    except Exception, e:
        print 'error:', e
        me = os.path.basename(__file__)
        exit()
    dictList = getDictList(dict)
    dictRmdp = rmdp(dictList)
    fileSave(dictRmdp, out)


if __name__ == '__main__':
    file_merge()
    main()
5. Run the script with the command python2 quchong.py; it generates the deduplicated file in the /root/123 directory.
6. Working principle
The script first merges all files in the configured directory (/root/123 here) into one file, merge all files.txt. It then removes the duplicate entries from that file and writes the result to deduplicate all files.txt, which is the final file you want.
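
The merge-then-deduplicate idea can also be sketched in a few lines of Python 3. This is only an illustration, not the article's script: it is written for Python 3.7 or later rather than Python 2, it treats each line as one entry instead of using the regular expression, and it keeps the first occurrence of each entry in order, which set() does not do. The names merge_lines, dedupe, demo and the skip parameter are hypothetical helpers introduced here.

```python
import os
import tempfile


def merge_lines(input_dir, skip=()):
    """Collect the lines of every regular file in input_dir, in sorted name order.

    skip is a hypothetical parameter: file names to leave out, for example
    output files from an earlier run that live in the same directory.
    """
    lines = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if name in skip or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def dedupe(items):
    """Drop empty and repeated items, keeping the first occurrence of each.

    dict.fromkeys keeps insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(item for item in items if item))


def demo():
    """Build two small files with overlapping content and deduplicate them."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "w", encoding="utf-8") as f:
            f.write("alpha\nbeta\nalpha\n")
        with open(os.path.join(tmp, "b.txt"), "w", encoding="utf-8") as f:
            f.write("beta\ngamma\n")
        return dedupe(merge_lines(tmp))


if __name__ == "__main__":
    print(demo())  # ['alpha', 'beta', 'gamma']
```

Passing the names of earlier output files through skip is one way to keep them from being merged back in when the outputs are written to the same directory as the inputs.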