Processing doc and docx files in Python with the docx module


Updated 2025-07-02. From: SLTechnology News&Howtos


    import os, shutil, docx, re, time
    from win32com import client as wc

    # Walk all nested directories and copy every file with the doc suffix
    # into a single backup directory; return how many were found.
    def count_files(file_dir):
        count = 0
        dst_dir = file_dir + "back"
        for p, d, f in os.walk(file_dir):
            for c in f:
                if c.split('.')[-1] == "doc":
                    count += 1
                    src_dir = os.path.join(p, c)
                    print(src_dir)
                    if not os.path.exists(dst_dir):
                        os.makedirs(dst_dir)
                    shutil.copy(src_dir, dst_dir)
        return count

    # Extract the email address from each docx resume document.
    # We use the python-docx module here: pip install python-docx
    def count_mail(file_dir, dst_file):
        mail_list = []
        # Email pattern; whitespace is tolerated in the domain part and
        # spaces are stripped from each match afterwards.
        pattern = re.compile(r'''([a-zA-Z0-9._%+-]+@[a-zA-Z0-9\t\s.-]+(\.[a-zA-Z0-9\t\s]{2,4}))''', re.VERBOSE)
        for parent, directory, files in os.walk(file_dir):
            for f in files:
                doc = docx.Document(os.path.join(parent, f))
                for para in doc.paragraphs:
                    for groups in pattern.findall(para.text):
                        mail_list.append(groups[0].replace(" ", "") + "\n")
        with open(dst_file, 'w') as f:
            f.writelines(mail_list)
        print("=== email addresses written successfully ===")

    # The python-docx module can only handle the docx suffix, so files with
    # the doc suffix must first be converted to docx through the win32com
    # module (this requires Windows with Microsoft Word installed).
    def docxTodoc(old_doc, new_doc):
        word = wc.Dispatch('Word.Application')
        for parent, directory, files in os.walk(old_doc):
            for f in files:
                doc = word.Documents.Open(os.path.join(parent, f))  # source file
                # converted file under the target path
                new_filepath = os.path.join(new_doc, os.path.splitext(f)[0] + ".docx")
                print(new_filepath)
                # 12 is the docx file format (wdFormatXMLDocument)
                doc.SaveAs(new_filepath, 12, False, "", True, "", False, False, False, False)
                doc.Close()
                print(time.time())
        word.Quit()

    if __name__ == '__main__':
        print(count_files(r"C:\Users\icestick\Desktop\51job_exported_resume_20180917"))
        count_mail(r"C:\Users\icestick\Desktop\new_doc", r"C:\Users\icestick\Desktop\test.txt")
        old_doc = r"C:\Users\icestick\Desktop\51job_exported_resume_20180917"  # source directory holding the doc files to convert
        new_doc = r"C:\Users\icestick\Desktop\new_doc"  # target directory for the converted docx files
        mail_extract = r"C:\Users\icestick\Desktop\test.txt"  # file that receives the extracted email addresses
        if not os.path.exists(new_doc):
            os.mkdir(new_doc)
            print("=== directory created successfully ===")
        docxTodoc(old_doc, new_doc)
        print("=== docx format conversion ===")
        count_mail(new_doc, mail_extract)
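The email-matching step in count_mail can be tried on its own, without Word or python-docx installed. The following is a minimal sketch under stated assumptions: the names `EMAIL_PATTERN` and `extract_emails` and the sample strings are hypothetical and not part of the script above, and the pattern is a stricter variant that does not allow whitespace inside the domain.

```python
import re

# Hypothetical helper for illustration only. The pattern is a stricter
# variant of the one in count_mail: no whitespace inside the domain.
EMAIL_PATTERN = re.compile(
    r"""(
        [a-zA-Z0-9._%+-]+      # local part
        @
        [a-zA-Z0-9.-]+         # domain name
        \.[a-zA-Z]{2,4}        # top-level domain, 2 to 4 letters
    )""",
    re.VERBOSE,
)


def extract_emails(paragraphs):
    """Return every email-like substring found in an iterable of strings."""
    found = []
    for text in paragraphs:
        found.extend(EMAIL_PATTERN.findall(text))
    return found


# Made-up resume lines standing in for paragraph text.
sample = [
    "Name: Zhang San  Phone: 13800000000",
    "Email: zhangsan@example.com, backup: zs_2018@mail.example.org",
    "No address on this line.",
]
emails = extract_emails(sample)
print(emails)  # ['zhangsan@example.com', 'zs_2018@mail.example.org']
```

With python-docx, the same helper could presumably be fed the paragraph text of a converted file, for example `extract_emails(p.text for p in docx.Document(path).paragraphs)`, where `path` points at a docx file.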

