

How to Use Java to Find the Page Number and Coordinates of a Keyword in a PDF

Updated 2025-01-17. From: SLTechnology News&Howtos (shulou)


Shulou (Shulou.com), 06/03

This article shows how to use Java to find the page number and the coordinates of a keyword in a PDF. It is a practical technique, shared here for reference.

1. I needed this feature recently, so I am recording the approach here after using it.

2. The feature works like Ctrl+F in a PDF viewer: it locates a keyword in the text of the document. Text that appears only inside images in the PDF is not supported.
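The Ctrl+F-style lookup can be pictured as a plain string search over a page's extracted text, where every character carries the coordinates it was drawn at. The following is a minimal sketch of that idea using only the JDK; it is not the article's iText code. The class name `KeywordLocator`, the method `find`, and the sample coordinates are hypothetical. The per-character layout `{pageNum, x, y}` mirrors the `float[]` format that the article's code returns.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordLocator {

    /**
     * Returns the position of the first character of every occurrence of keyword.
     * content is the page text; charPositions.get(i) holds {pageNum, x, y}
     * for content.charAt(i).
     */
    public static List<float[]> find(String content, List<float[]> charPositions, String keyword) {
        List<float[]> result = new ArrayList<>();
        if (keyword.isEmpty()) {
            return result; // an empty keyword has no meaningful position
        }
        int from = 0;
        while (true) {
            int index = content.indexOf(keyword, from);
            if (index < 0) {
                break; // no further occurrences
            }
            result.add(charPositions.get(index));
            from = index + 1; // step by one so overlapping matches are also found
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "sign here: sign";
        List<float[]> charPositions = new ArrayList<>();
        for (int i = 0; i < content.length(); i++) {
            // Made-up layout: page 1, characters 10 units apart on a line at y = 700.
            charPositions.add(new float[] {1f, 50f + 10f * i, 700f});
        }
        for (float[] p : find(content, charPositions, "sign")) {
            System.out.println("pageNum:" + (int) p[0] + "\tx:" + p[1] + "\ty:" + p[2]);
        }
    }
}
```

With the sample data, the keyword occurs at character indexes 0 and 11, so the two reported x values are 50.0 and 160.0. Because the search runs over extracted text only, a keyword that exists solely as pixels in an image never appears in `content`, which is why images are not supported.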

import com.itextpdf.awt.geom.Rectangle2D.Float;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.*;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * lost Sun
 */
public class MyTest {

    public static void main(String[] args) throws IOException {
        // 1. The given file
        File pdfFile = new File("D://test.pdf");
        // 2. Define a byte array whose length is the length of the file
        byte[] pdfData = new byte[(int) pdfFile.length()];
        // 3. Read the contents of the file into the byte array through an IO stream
        FileInputStream inputStream = null;
        try {
            inputStream = new FileInputStream(pdfFile);
            inputStream.read(pdfData);
        } catch (IOException e) {
            throw e;
        } finally {
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    // ignore a failure to close the stream
                }
            }
        }
        // 4. Specify the keyword
        String keyword = "lost sun:";
        // 5. Call the method with the file data and the keyword
        List<float[]> positions = findKeywordPostions(pdfData, keyword);
        // 6. The return value is a List<float[]>. Each element is one match:
        //    float[0] is the page number, float[1] is the x coordinate, float[2] is the y coordinate
        System.out.println("total:" + positions.size());
        if (positions != null && positions.size() > 0) {
            for (float[] position : positions) {
                System.out.print("pageNum:" + (int) position[0]);
                System.out.print("\tx:" + position[1]);
                System.out.println("\ty:" + position[2]);
            }
        }
    }

    /**
     * findKeywordPostions
     *
     * @param pdfData the PDF file as a byte array
     * @param keyword the keyword to search for
     * @return List of float[]: float[0]: pageNum, float[1]: x, float[2]: y
     * @throws IOException
     */
    public static List<float[]> findKeywordPostions(byte[] pdfData, String keyword) throws IOException {
        List<float[]> result = new ArrayList<float[]>();
        List<PdfPageContentPositions> pdfPageContentPositions = getPdfContentPostionsList(pdfData);
        for (PdfPageContentPositions pdfPageContentPosition : pdfPageContentPositions) {
            List<float[]> charPositions = findPositions(keyword, pdfPageContentPosition);
            if (charPositions == null || charPositions.size() < 1) {
                continue;
            }
            result.addAll(charPositions);
        }
        return result;
    }

    private static List<PdfPageContentPositions> getPdfContentPostionsList(byte[] pdfData) throws IOException {
        PdfReader reader = new PdfReader(pdfData);
        List<PdfPageContentPositions> result = new ArrayList<PdfPageContentPositions>();
        int pages = reader.getNumberOfPages();
        for (int pageNum = 1 ...

        // [The source text is cut off here. The rest of this loop, the findPositions method,
        //  the PdfPageContentPositions class, and the start of the listener class whose
        //  remaining methods follow are missing from the original.]

                ... 1) {
                    word = word.substring(word.length() - 1, word.length());
                }
                Float rectangle = textRenderInfo.getAscentLine().getBoundingRectange();
                float x = (float) rectangle.getX();
                float y = (float) rectangle.getY();
                // float x = (float) rectangle.getCenterX();
                // float y = (float) rectangle.getCenterY();
                // double x = rectangle.getMinX();
                // double y = rectangle.getMaxY();
                // The position of the keyword as a fraction of the page along the x and y axes
                float xPercent = Math.round(x / pageWidth * 10000) / 10000f;
                float yPercent = Math.round((1 - y / pageHeight) * 10000) / 10000f;
                // CharPosition charPosition = new CharPosition(pageNum, xPercent, yPercent);
                CharPosition charPosition = new CharPosition(pageNum, x, y);
                charPositions.add(charPosition);
                contentBuilder.append(word);
            }

        public void endTextBlock() {
        }

        public void renderImage(ImageRenderInfo renderInfo) {
        }

        public String getContent() {
            return contentBuilder.toString();
        }

        public List<CharPosition> getcharPositions() {
            return charPositions;
        }
    }

    private static class CharPosition {
        private int pageNum = 0;
        private float x = 0;
        private float y = 0;

        public CharPosition(int pageNum, float x, float y) {
            this.pageNum = pageNum;
            this.x = x;
            this.y = y;
        }

        public int getPageNum() {
            return pageNum;
        }

        public float getX() {
            return x;
        }

        public float getY() {
            return y;
        }

        @Override
        public String toString() {
            return "[pageNum=" + this.pageNum + ", x=" + this.x + ", y=" + this.y + "]";
        }
    }
}
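The listing above computes `xPercent` and `yPercent` but passes the raw `x` and `y` to `CharPosition`. For callers that want page-relative positions instead, the conversion can be isolated as below. This is a sketch using only the JDK: the class name `PagePercent`, the helper `round4`, and the sample page size are hypothetical, and it assumes the default PDF coordinate system with the origin at the bottom-left corner, which is why the y value is subtracted from 1.

```java
public class PagePercent {

    /** Rounds to four decimal places, like Math.round(v * 10000) / 10000f in the listing. */
    static float round4(float value) {
        return Math.round(value * 10000) / 10000f;
    }

    /** Distance from the left edge as a fraction of the page width. */
    public static float xPercent(float x, float pageWidth) {
        return round4(x / pageWidth);
    }

    /**
     * Distance from the top edge as a fraction of the page height.
     * PDF y values grow upward from the bottom, so the ratio is flipped.
     */
    public static float yPercent(float y, float pageHeight) {
        return round4(1 - y / pageHeight);
    }

    public static void main(String[] args) {
        // Hypothetical page of 595 x 842 points with a keyword located at (148.75, 631.5).
        float pageWidth = 595f;
        float pageHeight = 842f;
        System.out.println("xPercent: " + xPercent(148.75f, pageWidth));
        System.out.println("yPercent: " + yPercent(631.5f, pageHeight));
    }
}
```

For the sample values both fractions come out as 0.25: the keyword sits a quarter of the page width from the left edge and a quarter of the page height from the top. Fractions of this kind stay valid when the page is rendered at a different size, which raw point coordinates do not.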
