2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python to merge Word documents. The material is straightforward, and the walkthrough below covers the design, the environment the script needs, and the complete code.
Design ideas:
The script implements two functions: listing, for each directory, the people who have not submitted a Word document, and merging the Word documents found across the directories.
Viewing the unsubmitted list:
The first step is to read an Excel file that holds each person's name and related information; the file must follow a fixed format. The script collects every name from the Excel data, then walks through each directory and checks whether a file containing that name exists. If none is found, the name is reported as not submitted.
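The check described above reduces to a substring match between each name and the file names in a directory. A minimal sketch, assuming the names and file names are already loaded into lists; `find_missing` and the sample data are hypothetical and not part of the original script:

```python
def find_missing(names, filenames):
    """Return the names that do not appear in any file name, in roster order."""
    return [
        name
        for name in names
        if not any(name in filename for filename in filenames)
    ]


# Hypothetical sample data: Bob has not submitted anything.
roster = ["Alice", "Bob", "Carol"]
files = ["1001-Alice-report.docx", "1003-Carol.docx"]
print(find_missing(roster, files))
```

Because the match is a plain substring test, a name that is contained in another name (for example "Ann" and "Anna") would be counted as submitted whenever the longer name has a file.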
Merging Word files:
Merging works much like the previous function. The script again reads the Excel file to get the names, then collects the path of the Word file each person submitted in each directory, merges those files, and writes the result to the specified output directory.
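When one person has several matching files in the same directory, the complete script keeps the one with the most recent modification time. A minimal sketch of that selection, assuming the candidate paths are already collected; `newest_match` is a hypothetical helper name, and the temporary files exist only for the demonstration:

```python
import os
import tempfile
import time


def newest_match(name, paths):
    """Return the path whose file name contains `name` and has the latest mtime."""
    matches = [p for p in paths if name in os.path.basename(p)]
    return max(matches, key=os.path.getmtime) if matches else None


with tempfile.TemporaryDirectory() as folder:
    old = os.path.join(folder, "Alice-v1.docx")
    new = os.path.join(folder, "Alice-v2.docx")
    for path in (old, new):
        open(path, "w").close()
    now = time.time()
    os.utime(old, (now - 60, now - 60))  # pretend v1 was saved a minute earlier
    os.utime(new, (now, now))
    print(os.path.basename(newest_match("Alice", [old, new])))
```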
Description of the script environment:
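The script depends on third-party packages, which must be installed before it is run. Besides the packages in the original command, the code imports docxcompose, so it is included here:
pip install python-docx docxcompose pywin32 xlrd
Note that xlrd 2.0 and later read only .xls files, so opening an .xlsx roster as in the example below requires xlrd 1.2.0.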
First, the directory structure must be as shown in the following figure. Every directory to be traversed must be named 实训 ("training") followed by a number, for example 实训1, because the script relies on several regular-expression matches against these names.
Second, the Excel file must follow the format shown in the following figure. The first row is the header row and is skipped automatically. Only columns C and D are read: column C holds the person's number and column D holds the person's name.
Next, the Python script must be placed in the root directory, alongside the training directories.
Finally, the script must be run with one argument: the Excel file.
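The column layout described above can be sketched without opening a workbook. xlrd returns numeric cells as floats, so a person number such as 1001 arrives as 1001.0; the complete script converts such values with int() before turning them into strings. The sketch below assumes the two columns have already been read into lists with the header cell included; `build_roster` and the sample values are hypothetical:

```python
def build_roster(number_column, name_column):
    """Pair person numbers with names, skipping the header row."""
    roster = []
    for number, name in zip(number_column[1:], name_column[1:]):
        try:
            number = str(int(number))  # 1001.0 -> "1001"
        except (TypeError, ValueError):
            number = str(number)  # keep non-numeric IDs such as "A-17" as text
        roster.append({"Sno": number, "Sname": name})
    return roster


# Hypothetical column C and column D values, header cell first.
column_c = ["Number", 1001.0, "A-17"]
column_d = ["Name", "Alice", "Bob"]
print(build_roster(column_c, column_d))
```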
Microsoft Windows [Version 10.0.19043.1415]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\system32>python tools.py roster.xlsx

Complete code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Power By Python3
Author Task138
"""
import os
import re
import sys

import xlrd
from docx import Document
from docxcompose.composer import Composer
from win32com import client as wc


# Read the Excel sheet to get each student's number and name
def read_excel(excel_file):
    workbook = xlrd.open_workbook(excel_file)
    sheet = workbook.sheet_by_index(0)
    name_list = []
    name_dict = []
    # Columns C and D, skipping the header row
    Sno_list = sheet.col_values(2)[1:]
    Sname_list = sheet.col_values(3)[1:]
    for i in range(len(Sno_list)):
        try:
            # Numeric cells are read as floats, e.g. 1001.0 -> '1001'
            Sno = str(int(Sno_list[i]))
        except (TypeError, ValueError):
            Sno = str(Sno_list[i])
        student = {}
        student['Sno'] = Sno
        student['Sname'] = Sname_list[i]
        name_list.append(Sname_list[i])
        name_dict.append(student)
    return name_list, name_dict


# Merge documents
def merge_doc(source_file_path_list, target_file_path):
    # A document holding only a page break, inserted between source files
    page_break_doc = Document()
    page_break_doc.add_page_break()
    # The first file serves as the template for the new document
    target_doc = Document(source_file_path_list[0])
    target_composer = Composer(target_doc)
    # Skip the first file, which is already the template
    for f in source_file_path_list[1:]:
        # Page break, then the next document's content
        target_composer.append(page_break_doc)
        target_composer.append(Document(f))
    # Save the target document
    target_composer.save(target_file_path)
    print('[%s] saved successfully' % target_file_path)


# Collect the training directories: 实训 ("training") followed by a number
def list_training_dirs():
    file_list = []
    for i in os.listdir():
        if os.path.isdir(i):
            if re.match(r'实训\d', i):
                file_list.append(i)
    return file_list


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Missing required argument: pass the student Excel sheet as the argument')
        print('Program terminated')
        sys.exit()
    excel_file = sys.argv[1]
    print('Select the function to run:')
    print('[ 0 ] List the students who have not submitted in each training directory')
    print('[ 1 ] Merge the training files')
    cmd = input('Your choice: ')
    while cmd not in ['0', '1']:
        print('Invalid input, please try again (press Ctrl+C to quit)')
        print('Select the function to run:')
        print('[ 0 ] List the students who have not submitted in each training directory')
        print('[ 1 ] Merge the training files')
        cmd = input('Your choice: ')
    try:
        name_list, name_dict = read_excel(excel_file)
    except Exception as e:
        print('Failed to read the Excel file, program terminated. Error:')
        print(e)
        print()
        sys.exit()
    else:
        if cmd == '0':
            file_list = list_training_dirs()
            for i in range(1, len(file_list) + 1):
                dir_name = '实训%s' % i
                # Enter the training directory
                os.chdir(dir_name)
                submit_list = []
                for x in os.listdir():
                    for j in name_list:
                        if j in x and j not in submit_list:
                            submit_list.append(j)
                # Names on the roster with no matching file
                result = list(set(submit_list) ^ set(name_list))
                if result:
                    print(dir_name, result)
                os.chdir('../')
        if cmd == '1':
            # 实训汇总 ("training summary") is the output directory
            if not os.path.exists('实训汇总'):
                os.mkdir('实训汇总')
                print('Directory [ 实训汇总 ] created')
            file_list = list_training_dirs()
            for i in name_dict:
                doc_list = []
                for j in range(1, len(file_list) + 1):
                    dir_name = '实训%s' % j
                    # Enter the training directory
                    os.chdir(dir_name)
                    # Convert .doc files to .docx first, so the converted
                    # files appear in the listing scanned below
                    for x in os.listdir():
                        fname, fext = os.path.splitext(x)
                        if fext == '.doc' and not x.startswith('~$'):
                            w = wc.Dispatch('Word.Application')
                            doc = w.Documents.Open(os.path.abspath(x))
                            # 16 is wdFormatDocumentDefault (.docx)
                            doc.SaveAs(os.path.join(os.getcwd(), '%s.docx' % fname), 16)
                            doc.Close()
                            os.remove(x)
                            print('Converted file [ %s ] to .docx' % x)
                    tmp = []
                    for x in os.listdir():
                        # Check the file extension
                        fname, fext = os.path.splitext(x)
                        if fext == '.docx':
                            if (i['Sname'] in x) and (len(tmp) == 0):
                                # Only one file so far
                                tmp.append(x)
                            elif (i['Sname'] in x) and (len(tmp) != 0):
                                # Several files: keep the most recently modified one
                                tmp_file = tmp.pop()
                                old_file_mtime = os.path.getmtime(tmp_file)
                                new_file_mtime = os.path.getmtime(x)
                                if new_file_mtime > old_file_mtime:
                                    # The new file is more recent, keep it
                                    tmp.append(x)
                                else:
                                    # The old file is more recent, keep it
                                    tmp.append(tmp_file)
                        else:
                            # Other file types are skipped
                            continue
                    if tmp:
                        # This student has a file in this training directory
                        doc_list.append(os.path.join(dir_name, tmp.pop()))
                    # Return to the parent directory
                    os.chdir('../')
                if doc_list:
                    # There is content, so merge the documents
                    try:
                        merge_file_name = i['Sno'] + '-' + i['Sname'] + '-' + '实训汇总' + '.docx'
                        merge_doc(doc_list, './实训汇总/' + merge_file_name)
                    except Exception as e:
                        print()
                        print('[%s] student information is incorrect, program interrupted' % i['Sname'])
                        print(e)
                        print()

Execution result:
That covers how to merge Word documents with Python. Thank you for reading, and I hope the walkthrough helps.