How to use Python to read the contents of a Word file

Updated: 2025-04-01. From: SLTechnology News&Howtos (Development)

Shulou (Shulou.com), 05/31

Today the editor shares how to use Python to read the contents of Word files. The content is detailed and the logic is clear. Many people are still not very familiar with this topic, so this article is offered as a reference. Hopefully you will get something out of it; let's learn about it together.

To read Word files in batches with Python, python-docx is the go-to tool.

python-docx is a Python library for creating and modifying Microsoft Word (.docx) documents. It provides a broad set of Word operations and is one of the most commonly used Word tools in Python.

Before using it, first understand a few concepts:

Document: a Word document object. Unlike the Worksheet concept in VBA, each Document is independent: opening different Word files yields different Document objects, and they do not affect one another.

Paragraph: a paragraph. A Word document consists of multiple paragraphs. Pressing Enter in the document starts a new paragraph; pressing Shift + Enter inserts a line break without starting a new paragraph.

Run: a segment of text within a paragraph. Each paragraph is composed of one or more runs, and a run is a stretch of contiguous text that shares the same style, so a Paragraph object has a list of Run objects.

For example, consider the Word document in the schematic below:

[Figure: schematic of a Word document]

The structure of the Word document is divided as follows:

[Figure: structure of the Word document]

python-docx installation

Installation:

pip install python-docx

If the installation is too slow, you can switch to a domestic mirror (as follows):

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx

Import:

import docx

from docx import …

python-docx Document

Import package:

from docx import Document

How to use:

Document(word)

Return value:

A Word document object

python-docx paragraph content reading

Reading a Word document mainly means reading its paragraphs and its tables. Whether it is a paragraph or a table, the content inside is ultimately strings, and our goal is to read those strings.

Let's look at how the paragraph content is read:

Source:

document_obj.paragraphs: the paragraphs property of the document object returns a list of paragraphs. If the Word file contains several paragraphs, the list holds one paragraph object for each of them.

How to use:

Loop over the list to get each paragraph object and read its text attribute.

The demo script is as follows:

# coding: utf-8
import os
from docx import Document

path = os.path.join(os.getcwd(), 'test_file/text.docx')
print("'text.docx' path is:", path)  # debug: show the path

doc = Document(path)
for p in doc.paragraphs:
    print(p.text)

The results of the run are as follows (PS: the text is just a demo, I am not a training institution!):

[Figure: output of the script]

python-docx table content reading

Next, let's look at how to read the table contents of a Word file:

Source:

document_obj.tables: the tables property of the document object returns a list of table objects, one for each table in the file.

How to use:

Again, loop over the rows (or columns) of each table and read the cells.

Return value:

The text of each table cell (a string)

The demo code is as follows:

# coding: utf-8
import os
from docx import Document

path = os.path.join(os.getcwd(), 'test_file/text.docx')
print("'text.docx' path is:", path)  # debug: show the path

doc = Document(path)

# for p in doc.paragraphs:
#     print(p.text)

for t in doc.tables:              # loop over each table object
    for row in t.rows:            # get every row
        row_str = []
        for cell in row.cells:    # collect the text of each cell in the row
            row_str.append(cell.text)
        print(row_str)            # print the row once its cells are collected

# You can also get the contents of the columns in the table through "columns"; try it yourself.

The results are as follows:

[Figure: output of the script]

That is all for "How to use Python to read the contents of Word files". Thank you for reading! Hopefully you have gained something from this article. The editor updates new content every day; if you want to learn more, please follow the industry information channel.
