How to read tables in a PDF with Java

Updated: 2025-02-28. Source: SLTechnology News&Howtos (Development)


This article explains how to read tables in a PDF document with Java. The method introduced here is simple, fast, and practical.

Contents

I. Overview

II. Environment configuration

1. Manual import

2. Import from the Maven repository

III. Read tables in PDF

I. Overview

This article uses a Java example to show how to read a table in a PDF. The example imports the jar package of Spire.PDF for Java and uses the classes and methods it provides to get the text content of the table. The main classes and methods used in the code are listed in the following table for reference:

| Name | Type | Description |
| --- | --- | --- |
| PdfDocument | Class | Represents a PDF document model. |
| PdfDocument.loadFromFile(String filename) | Method | Loads a PDF document. |
| PdfTableExtractor | Class | Represents the PDF table extractor. |
| PdfTable | Class | Defines a PDF table. |
| PdfTableExtractor.extractTable(int pageIndex) | Method | Extracts tables from a page. |
| PdfTable.getText(int rowIndex, int columnIndex) | Method | Gets the text in a cell. |
| FileWriter.write() | Method | Saves the text extracted from the table to a .txt file. |

II. Environment configuration

IntelliJ IDEA 2018 (JDK 1.8.0)

A PDF test document

PDF jar package: Spire.PDF for Java, version 4.10.2

There are two ways to import the jar package:

1. Manual import

Download the jar package locally and decompress it, then import the jar into the project manually.

2. Import from the Maven repository

If you use Maven, configure the repository in pom.xml and specify the dependency, as follows:

    <repositories>
        <repository>
            <id>com.e-iceblue</id>
            <url>https://repo.e-iceblue.cn/repository/maven-public/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>e-iceblue</groupId>
            <artifactId>spire.pdf</artifactId>
            <version>4.10.2</version>
        </dependency>
    </dependencies>

III. Read tables in PDF

    import com.spire.pdf.*;
    import com.spire.pdf.utilities.PdfTable;
    import com.spire.pdf.utilities.PdfTableExtractor;

    import java.io.FileWriter;
    import java.io.IOException;

    public class ExtractTable {
        public static void main(String[] args) throws IOException {
            // Load the PDF document
            PdfDocument pdf = new PdfDocument();
            pdf.loadFromFile("test.pdf");

            // Create a StringBuilder to collect the extracted text
            StringBuilder builder = new StringBuilder();

            // Extract the tables page by page
            PdfTableExtractor extractor = new PdfTableExtractor(pdf);
            PdfTable[] tableLists;
            for (int page = 0; page < pdf.getPages().getCount(); page++) {
                tableLists = extractor.extractTable(page);
                if (tableLists != null && tableLists.length > 0) {
                    for (PdfTable table : tableLists) {
                        int row = table.getRowCount();
                        int column = table.getColumnCount();
                        for (int i = 0; i < row; i++) {
                            for (int j = 0; j < column; j++) {
                                // Append the cell text followed by a space separator
                                String text = table.getText(i, j);
                                builder.append(text + " ");
                            }
                            // End each table row with a line break
                            builder.append("\r\n");
                        }
                    }
                }
            }

            // Write the extracted table contents to a .txt file
            FileWriter fileWriter = new FileWriter("ExtractedTable.txt");
            fileWriter.write(builder.toString());
            fileWriter.flush();
            fileWriter.close();
        }
    }
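The nested loops above append the text of every cell to a StringBuilder, row by row. If a cell itself contains spaces or line breaks, a delimiter-separated format such as CSV keeps the cell boundaries recoverable. The following is a minimal sketch of that idea, not part of Spire.PDF: the class name TableTextFormatter and its methods are hypothetical, and the cell text is assumed to have already been copied into a String[][] (a stand-in for the values returned by table.getText(i, j)).

```java
import java.util.StringJoiner;

public class TableTextFormatter {

    // Quote a cell when it contains a comma, a double quote, or a line break.
    // A null cell is treated as empty (an assumption for this sketch).
    public static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the cells of each row with commas and end every row with CRLF.
    public static String toCsv(String[][] cells) {
        StringBuilder builder = new StringBuilder();
        for (String[] row : cells) {
            StringJoiner line = new StringJoiner(",");
            for (String cell : row) {
                line.add(escape(cell));
            }
            builder.append(line).append("\r\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for extracted cell text
        String[][] cells = {
                {"Name", "Unit price"},
                {"Pen, blue", "1.50"}
        };
        System.out.print(toCsv(cells));
    }
}
```

In the extraction loop, the same effect could be had by collecting each row's cell text into an array and passing the rows to toCsv instead of appending a space after every cell.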

Result of reading the table content: the extracted cell text is written to ExtractedTable.txt, one table row per line.

Notes:

1. The Spire.PDF for Java jar package used here is version 4.10.2. Jar packages earlier than this version do not support reading tables.

2. The file paths in the code are F:\IDEAProject\Table_PDF\test.pdf and F:\IDEAProject\Table_PDF\ExtractedTable.txt. They can be changed to any other paths.
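To avoid hard-coding the output location, the path can be resolved against a folder passed on the command line, and the file can be written with try-with-resources so the writer is closed even if writing fails. The following is a minimal sketch using only the JDK (it compiles on JDK 1.8). The class name ExtractedTextWriter, its methods, and the sample text are assumptions for illustration; the extracted text is passed in as a plain string rather than produced by Spire.PDF.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractedTextWriter {

    // Resolve a file name against a base folder.
    // An absolute file name is returned unchanged (apart from normalization).
    public static Path resolve(String baseDir, String fileName) {
        return Paths.get(baseDir).resolve(fileName).normalize();
    }

    // Write the text as UTF-8, creating missing parent folders first.
    // try-with-resources closes the writer even if write() throws.
    public static void write(Path target, String text) throws IOException {
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: output folder; defaults to the current working directory
        String baseDir = args.length > 0 ? args[0] : ".";
        Path output = resolve(baseDir, "ExtractedTable.txt");
        // Placeholder text standing in for builder.toString()
        write(output, "cell 1 cell 2\r\n");
        System.out.println("Wrote " + output.toAbsolutePath());
    }
}
```

Passing an explicit charset also makes the output independent of the platform default encoding, which FileWriter would otherwise use on JDK 1.8.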

