How to Batch-Convert Word Files with Python

2025-07-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report

This article shows how to use Python to batch-process and batch-convert Word files. The content is straightforward and clearly laid out, and I hope it clears up any questions you have. Let's work through it together.

Word is one of the most frequently used office applications. If you need to bring 100 Word documents into a uniform format, or convert all 100 of them to PDF, Python can help.

A brief introduction to the python-docx library

python-docx is a third-party library for reading and writing Word documents. It can read a document's content; add paragraphs, tables, pictures, and headings; and apply paragraph styles, bold and italic, and character styles.

Install it with the following command:

    pip install python-docx

Official documentation: https://python-docx.readthedocs.io/

Read Word

First, I created a sample Word document containing a title, body text, and a table.

The code to read the Word content is as follows:

    from docx import Document


    def view_docs(docx_file):
        # Open the document
        doc = Document(docx_file)
        # Read the text of each paragraph
        pl = [paragraph.text for paragraph in doc.paragraphs]
        # Print what was read
        for i in pl:
            print(i)


    def view_docs_table(docx_file):
        # Open the document
        doc = Document(docx_file)
        # Read every table in the document
        tables = [table for table in doc.tables]
        for table in tables:
            for row in table.rows:
                for cell in row.cells:
                    # Print the cells of one row on the same line
                    print(cell.text, end='  ')
                print()
            print('\n')


    if __name__ == '__main__':
        view_docs("Python Automation Office practice course.docx")
        view_docs_table("Python Automation Office practice course.docx")
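Printing is fine for a quick look, but for batch work it is usually handier to get the table back as data. The following is a minimal sketch of that idea. The names table_to_rows and read_tables are hypothetical helpers, not part of python-docx; the only library features assumed are the table.rows, row.cells, and cell.text attributes already used above.

```python
def table_to_rows(table):
    """Return the table's cell text as a list of rows (each row is a list of str)."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def read_tables(docx_file):
    """Open a .docx file and return every table in it as a list of rows."""
    # Imported here so that table_to_rows can be used without python-docx installed.
    from docx import Document

    doc = Document(docx_file)
    return [table_to_rows(table) for table in doc.tables]
```

Calling read_tables("some file.docx") should give one nested list per table, which can then be written to CSV or compared across documents.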

The running results are as follows:

Write to Word

Now let's use Python to create the same Word document as the sample above:

    from docx import Document
    from docx.shared import Pt, RGBColor
    from docx.oxml.ns import qn
    from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
    from docx.table import _Cell
    from docx.oxml import OxmlElement


    def set_cell_border(cell: _Cell, **kwargs):
        """
        Set cell`s border
        Usage:
        set_cell_border(
            cell,
            top={"sz": 12, "val": "single", "color": "#FF0000", "space": "0"},
            bottom={"sz": 12, "color": "#00FF00", "val": "single"},
            start={"sz": 24, "val": "dashed", "shadow": "true"},
            end={"sz": 12, "val": "dashed"},
        )
        """
        tc = cell._tc
        tcPr = tc.get_or_add_tcPr()

        # check for tag existence, if none found, then create one
        tcBorders = tcPr.first_child_found_in("w:tcBorders")
        if tcBorders is None:
            tcBorders = OxmlElement('w:tcBorders')
            tcPr.append(tcBorders)

        # list over all available tags
        for edge in ('start', 'top', 'end', 'bottom', 'insideH', 'insideV'):
            edge_data = kwargs.get(edge)
            if edge_data:
                tag = 'w:{}'.format(edge)

                # check for tag existence, if none found, then create one
                element = tcBorders.find(qn(tag))
                if element is None:
                    element = OxmlElement(tag)
                    tcBorders.append(element)

                # looks like order of attributes is important
                for key in ["sz", "val", "color", "space", "shadow"]:
                    if key in edge_data:
                        element.set(qn('w:{}'.format(key)), str(edge_data[key]))


    document = Document()
    # Default font for the document, including the East Asian font slot
    document.styles['Normal'].font.name = 'Verdana'
    document.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), 'Verdana')


    # title
    def add_header(text, level, align='center'):
        title_ = document.add_heading(level=level)
        if align == 'center':
            title_.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # center the title
        elif align == 'right':
            title_.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT  # right-align the title
        title_run = title_.add_run(text)  # add the title content
        # title_run.font.size = Pt(24)  # set the title font size
        title_run.font.name = 'Times New Roman'  # set the title's Western font
        title_run.font.color.rgb = RGBColor(0, 0, 0)  # font color
        title_run.element.rPr.rFonts.set(qn('w:eastAsia'), 'Microsoft Yahei')  # set the title's Chinese font


    add_header(text='Python Automation Office practice', level=1)
    add_header(text='Python Foundation', level=2, align='left')
    document.add_paragraph('Python is an object-oriented high-level programming language. Easy to learn and use, it is the preferred tool for office automation.')
    add_header('Python playing with pictures', level=2, align='left')
    document.add_paragraph('Pictures are among the media files you handle most often at work. You may need to compress them, add watermarks, recognize text in them, and so on.')

    records = (
        ('Python basis', '00:30', '2021-08-01', ''),
        ('Python play with pictures', '01:00', '2021-08-01', ''),
        ('Python play with Word', '01:00', '2021-08-01', ''),
    )

    table = document.add_table(rows=1, cols=4)
    hdr_cells = table.rows[0].cells
    hdr_cells[0].text = 'chapter'
    hdr_cells[1].text = 'duration'
    hdr_cells[2].text = 'date'
    hdr_cells[3].text = 'remarks'

    # The same red single-line border is used on all four sides of every cell
    border = {"sz": 12, "val": "single", "color": "#FF0000", "space": "0"}

    for cell in hdr_cells:
        set_cell_border(cell, top=border, bottom=border, start=border, end=border)

    for chapter, time, date, note in records:
        row_cells = table.add_row().cells
        row_cells[0].text = chapter
        row_cells[1].text = time
        row_cells[2].text = date
        row_cells[3].text = note
        for cell in row_cells:
            set_cell_border(cell, top=border, bottom=border, start=border, end=border)

    document.save('Python Automation Office practice.docx')

The code that adds borders to the table is fairly complex, so it is wrapped in a function and called from there.

In the resulting Word document, the table border color, the title color, the font size, and the style can all be set.

Other operations

Add a page break:

    document.add_page_break()

Add a picture:

    from docx.shared import Inches

    document.add_picture('monty-truth.png', width=Inches(1.25))
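The Inches, Cm, and Pt helpers used throughout this article all produce the same kind of value: python-docx stores lengths as integers in English Metric Units (EMU), with 914400 EMU per inch, 360000 per centimeter, and 12700 per point. The sketch below reproduces that arithmetic in plain Python so the numbers are easy to check. The function names are hypothetical and only illustrate the conversion; in real code you would keep using the docx.shared classes.

```python
EMUS_PER_INCH = 914400
EMUS_PER_CM = 360000
EMUS_PER_PT = 12700


def inches_to_emu(inches):
    """Length in EMU for a value given in inches."""
    return int(inches * EMUS_PER_INCH)


def cm_to_emu(cm):
    """Length in EMU for a value given in centimeters."""
    return int(cm * EMUS_PER_CM)


def pt_to_emu(points):
    """Length in EMU for a value given in points."""
    return int(points * EMUS_PER_PT)


print(inches_to_emu(1.25))  # 1143000, the picture width used above
print(cm_to_emu(10))        # 3600000
print(pt_to_emu(18))        # 228600
```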

Set the column width and row height of the table

To set a column's width, set the width of its cells. All cells in the same column share one width; if you assign different widths, the largest value is used.

    from docx.shared import Cm

    # set column width
    table.cell(0, 0).width = Cm(10)
    # set row height
    table.rows[0].height = Cm(2)
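Since the width is assigned cell by cell, one way to avoid mismatched values within a column is to set every cell of that column in a loop. The following is a minimal sketch under that assumption. set_column_width is a hypothetical helper, not a python-docx function; it relies only on table.columns, the cells property of a column, and the cell width attribute.

```python
def set_column_width(table, col_index, width):
    """Assign the same width to every cell in one column; return the cell count."""
    cells = table.columns[col_index].cells
    for cell in cells:
        cell.width = width
    return len(cells)
```

With python-docx installed, a call such as set_column_width(table, 0, Cm(10)) would apply the 10 cm width from the example above to the whole first column.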

Set the table font:

    from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
    from docx.shared import Pt, RGBColor

    # set the font properties of the entire table
    table.style.font.size = Pt(18)
    table.style.font.color.rgb = RGBColor(255, 0, 0)
    table.style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

Merge cells:

    cell_1 = table.cell(1, 0)
    cell_2 = table.cell(2, 1)
    cell_1.merge(cell_2)

Modify the document font:

    from docx import Document
    from docx.shared import Pt  # point sizes, indents, etc.
    from docx.shared import RGBColor  # font color
    from docx.oxml.ns import qn

    doc = Document("xxx.docx")
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            run.font.bold = True
            run.font.italic = True
            run.font.underline = True
            run.font.strike = True
            run.font.shadow = True
            run.font.size = Pt(18)
            run.font.color.rgb = RGBColor(255, 0, 0)  # red, as in the table example
            run.font.name = "SimHei"
            # To set a Chinese font such as SimHei, you must add the following two lines
            r = run._element.rPr.rFonts
            r.set(qn("w:eastAsia"), "SimHei")
    doc.save("xxx.docx")

Adjust the line spacing:

    paragraph.paragraph_format.line_spacing = 5.0

Adjust the spacing before and after a paragraph:

    # before the paragraph
    paragraph.paragraph_format.space_before = Pt(12)
    # after the paragraph
    paragraph.paragraph_format.space_after = Pt(10)

Word to PDF

Converting Word to PDF takes only two lines of code. This uses the third-party library docx2pdf, which you need to install first with pip install docx2pdf.

The specific code is as follows:

    from docx2pdf import convert

    convert("Python Automation Office practice.docx", "Python Automation Office practice.docx.pdf")

To batch-convert every Word file in a directory to PDF, pass the directory instead:

    from docx2pdf import convert

    convert("directory path/")
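If you need more control than the one-line directory call gives you, for example a separate output folder or a list of what was produced, you can loop over the files yourself. The following is a rough sketch of that approach, not the way docx2pdf does it internally. The names find_word_files and batch_convert, and the convert_one parameter, are hypothetical; the only docx2pdf call assumed is convert(input_path, output_path) as used above, which in turn needs Microsoft Word installed on the machine.

```python
from pathlib import Path


def find_word_files(folder):
    """Return the .docx files directly inside folder, sorted by name.

    Names starting with "~$" are skipped: Word creates such lock files
    next to a document while it is open, and they are not real documents.
    """
    return sorted(
        p for p in Path(folder).glob("*.docx") if not p.name.startswith("~$")
    )


def batch_convert(folder, out_dir, convert_one=None):
    """Convert each .docx in folder to a PDF in out_dir; return the PDF paths."""
    if convert_one is None:
        # Imported lazily so the file listing above works without docx2pdf.
        from docx2pdf import convert as convert_one

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    results = []
    for src in find_word_files(folder):
        dst = out_dir / (src.stem + ".pdf")
        convert_one(str(src), str(dst))
        results.append(dst)
    return results
```

Passing the converter in as a parameter is a design choice for this sketch: it lets you try the loop with a stand-in function before running the real conversion on a large folder.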

Isn't batch conversion to PDF convenient?

Once you know these small operations, you can combine them into bigger ones. For example, you can use Python to convert a Word file to PDF and then email it to others as an attachment.
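As a rough illustration of the email step, the sketch below uses only the Python standard library (email.message and smtplib). The function names, addresses, and server settings are all placeholders chosen for this example, and the send step assumes an SMTP server that accepts SSL connections with a username and password; adjust it to whatever your mail provider requires.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_pdf_email(sender, recipient, subject, body, pdf_path):
    """Build an email message with the given PDF file attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    pdf_path = Path(pdf_path)
    msg.add_attachment(
        pdf_path.read_bytes(),
        maintype="application",
        subtype="pdf",
        filename=pdf_path.name,
    )
    return msg


def send_email(msg, host, port, user, password):
    """Send the message through an SMTP server over SSL (placeholder settings)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)
```

Building the message and sending it are kept separate so that the attachment can be checked before anything leaves your machine.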

That is everything in this article on batch-converting Word files with Python. Thank you for reading! I hope it has given you a working understanding of the topic and that you find it useful.
