How to use POI to parse data in Word documents with Java


2025-07-01 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to use Apache POI in Java to parse the data in Word documents. The steps are laid out in order, and I hope the article resolves your questions. Follow along as we work through it step by step.

Apache POI is an open source project of the Apache Software Foundation. It provides APIs that let Java programs read and write files in Microsoft Office formats. .NET developers can use NPOI (POI for .NET) to work with Microsoft Office documents.

The methods are as follows:

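1. Add the dependencies in Maven

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.17</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.17</version>
    </dependency>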

2. Parse the data in doc

Get the file: copy the contents of the MultipartFile object to a local file, then choose the parser from the file extension.

    // FileUtils is org.apache.commons.io.FileUtils (Apache Commons IO)
    File file = new File(FileUtils.getUserDirectoryPath() + "/" + multipartFile.getOriginalFilename());
    FileUtils.copyInputStreamToFile(multipartFile.getInputStream(), file);
    String fileName = file.getName().toLowerCase();
    // try-with-resources closes the stream once parsing is done
    try (FileInputStream in = new FileInputStream(file)) {
        if (fileName.endsWith(".doc")) {
            // .doc is the Office 2003 binary format
            handlerDoc(in);
        }
        if (fileName.endsWith(".docx")) {
            handlerDocx(in);
        }
    }
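The check above trusts the file extension. As an alternative, the format can be guessed from the first bytes of the file: binary .doc files are OLE2 compound files, and .docx files are ZIP packages. The following is only a minimal sketch using the standard library; the class name WordFormatSniffer and its Format enum are hypothetical and not part of POI. Note that the same signatures also match other OLE2 files (such as .xls) and any other ZIP archive, so this only narrows the choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the Word format from the leading bytes. */
final class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound-file signature, used by binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4", used by .docx packages.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a header of up to 8 bytes. */
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return Format.DOC;
        }
        if (startsWith(header, ZIP)) {
            return Format.DOCX;
        }
        return Format.UNKNOWN;
    }

    /** Reads the first 8 bytes of the file and classifies them. */
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(detect(Path.of(args[0])));
        }
    }
}
```

A caller could combine both checks, for example accepting a file only when the extension and the sniffed format agree.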

Parse the paragraphs and the first table in .doc format

    /**
     * Parse the .doc format.
     *
     * @param in input stream of the .doc file
     * @throws IOException if the file cannot be read
     */
    private void handlerDoc(FileInputStream in) throws IOException {
        POIFSFileSystem pfs = new POIFSFileSystem(in);
        HWPFDocument hwpf = new HWPFDocument(pfs);
        // Get the readable range of the document
        Range range = hwpf.getRange();
        for (int i = 0; i < range.numParagraphs(); i++) {
            // Paragraph
            Paragraph p = range.getParagraph(i);
            // Paragraph text. The character being replaced was lost when this page was
            // extracted; stripping the trailing "\r" paragraph mark is an assumption.
            String paragraphText = p.text().replace("\r", "");
            log.info("paragraphText = {}", paragraphText);
            // VALUE_YLYC and analyze are members of the enclosing class (not shown)
            if (paragraphText.contains(VALUE_YLYC)) {
                analyze = false;
            }
        }
        // Iterate over the tables in the document.
        // To read only one of several tables: set is the 1-based number of the table
        // to read, total is the total number of tables in the file.
        TableIterator it = new TableIterator(range);
        int set = 1, total = 1;
        int num = set;
        // Skip the tables before the one we want
        for (int i = 0; i < set - 1; i++) {
            it.hasNext();
            it.next();
        }
        // Note: as written, this loop reads every remaining table;
        // replace "while" with "if" to read only table number set.
        while (it.hasNext()) {
            Map<String, List<String>> tabelText = DocUtils.getTabelDocText((Table) it.next());
            log.info("tabelText = {}", tabelText);
        }
        // Skip the remaining tables
        while (num < total) {
            it.hasNext();
            it.next();
            num += 1;
        }
    }
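The set/total bookkeeping above amounts to "skip the first set - 1 tables, then read one". That pattern can be sketched with a plain java.util.Iterator, as below. The class and method names are hypothetical. As far as I know, POI's TableIterator exposes hasNext() and next() but is not declared as a java.util.Iterator, so it would need a small adapter before being passed to a helper like this.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: picks the n-th element (1-based) from an iterator. */
final class NthElement {

    /** Returns the element at 1-based position n, or empty if there are fewer than n elements. */
    static <T> Optional<T> nth(Iterator<T> it, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Skip the n - 1 elements in front of the one we want.
        for (int skipped = 0; skipped < n - 1 && it.hasNext(); skipped++) {
            it.next();
        }
        return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> tables = List.of("table-1", "table-2", "table-3");
        System.out.println(nth(tables.iterator(), 2)); // prints Optional[table-2]
    }
}
```

Returning an Optional avoids the exception that calling next() on an exhausted iterator would otherwise raise when the document has fewer tables than expected.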

3. Parse the data in docx

Parse the paragraphs and tables in .docx format

    /**
     * Parse the .docx format.
     *
     * @param in input stream of the .docx file
     * @throws IOException if the file cannot be read
     */
    private void handlerDocx(FileInputStream in) throws IOException {
        XWPFDocument xwpf = new XWPFDocument(in);
        // Get all paragraphs and tables in the Word document, in document order
        List<IBodyElement> elements = xwpf.getBodyElements();
        for (IBodyElement element : elements) {
            if (element instanceof XWPFParagraph) {
                // Paragraph
                String paragraphText = DocUtils.getParagraphText((XWPFParagraph) element);
                log.info("paragraphText = {}", paragraphText);
            } else if (element instanceof XWPFTable) {
                // Table
                Map<String, List<String>> tabelText = DocUtils.getTabelText((XWPFTable) element);
                log.info("tabelText = {}", tabelText);
            } else {
                log.info("other content");
            }
        }
    }
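The table maps logged above come from DocUtils.getTabelText, which appears to treat the first cell of each row as the key and the remaining cells as the values. The sketch below shows that convention on plain string lists, without POI. The class name RowMapSketch is hypothetical, and the handling of empty cells and commas is my reading of the utility class rather than a confirmed detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "first cell is the key" row convention. */
final class RowMapSketch {

    /** Maps each row's first cell to the list of its remaining cells. */
    static Map<String, List<String>> toMap(List<List<String>> rows) {
        // LinkedHashMap keeps the rows in document order.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (List<String> row : rows) {
            if (row.isEmpty()) {
                continue;
            }
            List<String> values = new ArrayList<>();
            for (String cell : row.subList(1, row.size())) {
                // Assumption: empty cells become null, and commas are stripped.
                values.add(cell.isEmpty() ? null : cell.replace(",", ""));
            }
            result.put(row.get(0), values);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
            List.of("Revenue", "1,200", ""),
            List.of("Cost", "800"));
        System.out.println(toMap(rows)); // prints {Revenue=[1200, null], Cost=[800]}
    }
}
```

One consequence of this layout is that two rows with the same first cell overwrite each other, so it suits tables whose first column is unique.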

Utility class

    package com.hundsun.fais.innerreport.utils;

    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Table;
    import org.apache.poi.hwpf.usermodel.TableCell;
    import org.apache.poi.hwpf.usermodel.TableRow;
    import org.apache.poi.xwpf.usermodel.*;

    import java.util.*;

    /**
     * @author lvbaolin
     * @date 10:39 on 2021-4-2
     */
    public class DocUtils {

        /**
         * docx format: get the table contents.
         *
         * @param table the table to read
         * @return first cell of each row mapped to the remaining cells of that row
         */
        public static Map<String, List<String>> getTabelText(XWPFTable table) {
            Map<String, List<String>> result = new LinkedHashMap<>();
            List<XWPFTableRow> rows = table.getRows();
            for (XWPFTableRow row : rows) {
                String key = null;
                List<String> list = new ArrayList<>(16);
                int i = 0;
                List<XWPFTableCell> cells = row.getTableCells();
                for (XWPFTableCell cell : cells) {
                    // Simple way to get the content (formatting such as font and
                    // alignment is not available this way)
                    StringBuilder sb = new StringBuilder();
                    // A cell can be thought of as a small Word document:
                    // it can contain its own paragraphs and tables
                    List<XWPFParagraph> paragraphs = cell.getParagraphs();
                    for (XWPFParagraph paragraph : paragraphs) {
                        sb.append(DocUtils.getParagraphText(paragraph));
                    }
                    if (i == 0) {
                        key = sb.toString();
                    } else {
                        String value = sb.toString();
                        list.add(value.isEmpty() ? null : value.replace(",", ""));
                    }
                    i++;
                }
                result.put(key, list);
            }
            return result;
        }

        /**
         * docx format: get the paragraph content as a string.
         *
         * @param paragraph the paragraph to read
         * @return the concatenated text of all runs in the paragraph
         */
        public static String getParagraphText(XWPFParagraph paragraph) {
            StringBuilder runText = new StringBuilder();
            // Get all the runs in the paragraph
            List<XWPFRun> runs = paragraph.getRuns();
            if (runs.size() == 0) {
                return runText.toString();
            }
            for (XWPFRun run : runs) {
                runText.append(run.text());
            }
            return runText.toString();
        }

        /**
         * doc format: get the table contents.
         *
         * @param tb the table to read
         * @return first cell of each row mapped to the remaining cells of that row
         */
        public static Map<String, List<String>> getTabelDocText(Table tb) {
            Map<String, List<String>> result = new HashMap<>(16);
            // Iterate over the rows, starting from 0 by default. Change the initial
            // value of i to start at a different row or to read a specific row.
            for (int i = 0; i < tb.numRows(); i++) {
                List<String> list = new ArrayList<>(16);
                int x = 0;
                TableRow tr = tb.getRow(i);
                String key = null;
                // Iterate over the columns, starting from 0 by default
                for (int j = 0; j < tr.numCells(); j++) {
                    // Get the cell
                    TableCell td = tr.getCell(j);
                    StringBuilder sb = new StringBuilder();
                    // Get the contents of the cell
                    for (int k = 0; k < td.numParagraphs(); k++) {
                        Paragraph paragraph = td.getParagraph(k);
                        String s = paragraph.text();
                        // Remove the trailing special (mark) character
                        if (null != s && !"".equals(s)) {
                            s = s.substring(0, s.length() - 1);
                        }
                        sb.append(s);
                    }
                    if (x == 0) {
                        key = sb.toString();
                    } else {
                        String value = sb.toString();
                        list.add(value.isEmpty() ? null : value.replace(",", ""));
                    }
                    // Advance the column counter so that only the first cell becomes the key
                    x++;
                }
                result.put(key, list);
            }
            return result;
        }
    }

That concludes this introduction to parsing the data in Word documents with POI in Java. To really master these points, you still need to practice and use them yourself.
