2025-04-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Today the editor will share how to use Python to read the contents of Word files. The explanation is detailed and the logic is clear. Many people are not yet familiar with this topic, so this article is offered as a reference. I hope you get something out of it; let's learn about it together.
To read Word files in batches with Python, python-docx is the tool of choice.
python-docx is a Python library for creating and modifying Microsoft Word (.docx) files. It covers a broad set of Word operations and is one of the most commonly used Word libraries for Python.
Before using it, first understand a few concepts:
Document: a Word document object. Unlike the concept of a Worksheet in VBA, each Document is independent: opening different Word files yields different Document objects, and they do not affect each other.
Paragraph: a paragraph. A Word document consists of multiple paragraphs. Pressing Enter in the document starts a new paragraph; pressing Shift + Enter inserts a line break without starting a new paragraph.
Run: a run of text. Each paragraph is made up of one or more runs, where a run is a stretch of consecutive text that shares the same style. A Paragraph object therefore holds a list of Run objects.
(Figure: schematic diagram of a Word document.)
(Figure: how the structure of the Word document is divided.)
python-docx installation
Installation:
pip install python-docx
If the installation is too slow, you can switch to a mirror inside China, as follows:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx
Import:
import docx
from docx import …
python-docx Document
Import package:
from docx import Document
How to use:
Document(word), where word is the Word file to open
Return value:
A Document object representing the Word file
python-docx paragraph content reading
In fact, reading a Word document mainly means reading its paragraphs and its tables. Whether in a paragraph or in a table, the content is ultimately a string, and our goal is to read those strings.
Let's look at how paragraph content is read:
Source:
document_obj.paragraphs
The paragraphs property of the document object returns a list of paragraph objects, one for each paragraph in the Word file.
How to use:
Loop over the list and read the text attribute of each paragraph object.
The demo script is as follows:
# coding: utf-8
import os
from docx import Document

path = os.path.join(os.getcwd(), 'test_file/text.docx')
print("'text.docx' path is: ", path)  # debug path
doc = Document(path)
for p in doc.paragraphs:
    print(p.text)
Running the script prints the text of each paragraph, one per line. (PS: the text in the sample file is only a demo; I am not a training institution!)
python-docx table content reading
Next, let's look at how to read the table contents in a Word file:
Source:
document_obj.tables
The tables property of the document object returns a list of table objects, one for each table in the Word file.
How to use:
Again, loop over the tables, then over the rows and cells of each table
Return value:
The text of each table cell (string)
The demo code is as follows:
# coding: utf-8
import os
from docx import Document

path = os.path.join(os.getcwd(), 'test_file/text.docx')
print("'text.docx' path is: ", path)  # debug path
doc = Document(path)
# for p in doc.paragraphs:
#     print(p.text)
for t in doc.tables:  # loop over the table objects
    for row in t.rows:  # get each row
        row_str = []
        for cell in row.cells:  # get each cell in the row and collect its text
            row_str.append(cell.text)
        print(row_str)  # print the row once all of its cells have been collected
# You can also get the contents of the table's columns through "columns"; try it yourself
Running the script prints one list per table row, each containing the text of that row's cells.
That is all for "how to use Python to read the contents of Word files". Thank you for reading! I hope you found it useful.