How to extract tabular data from PDF documents with Python

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to extract table data from PDF documents with Python using the camelot module, and how to merge the resulting pandas DataFrames. It closes with a short note on converting SVG files to PDF with svglib.

Extract table data from PDF files

The camelot module is required to extract tabular data from PDF documents.

It can be installed directly with pip:

pip install "camelot-py[cv]"

The sample PDF file used here can be downloaded from the following link:

http://gstcouncil.gov.in/sites/default/files/gst-revenue-collection-march3020.pdf

The first step is to read the PDF file:

import camelot

tables = camelot.read_pdf('gst-revenue-collection-march3020.pdf', flavor='stream', pages='0-3')

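The flavor parameter selects the parsing method: 'lattice' (the default) is meant for tables with ruling lines between cells, and 'stream' is meant for tables whose cells are separated by whitespace.

If the tables are spread over several pages, specify the pages parameter. Camelot's documentation gives page numbers starting from 1 (for example '1,3,4', '1-3' or 'all'), so '1-3' may be what is intended here.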
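As a rough illustration of the flavor choice, the two flavors could be compared on the same pages. This is a sketch under stated assumptions: count_tables_by_flavor and its reader parameter are hypothetical names introduced here, and the code assumes only that camelot.read_pdf accepts flavor and pages as in the call above and that the returned table list supports len().

```python
def count_tables_by_flavor(pdf_path, pages="1", reader=None):
    """Parse the same pages once per Camelot flavor and count the tables found.

    `reader` is a hypothetical hook for supplying another parsing function;
    by default camelot.read_pdf is used.
    """
    if reader is None:
        # Imported lazily so the helper can be defined without Camelot installed.
        import camelot
        reader = camelot.read_pdf

    counts = {}
    for flavor in ("lattice", "stream"):
        tables = reader(pdf_path, flavor=flavor, pages=pages)
        counts[flavor] = len(tables)
    return counts
```

Calling count_tables_by_flavor('gst-revenue-collection-march3020.pdf', pages='1-3') would return one count per flavor, which can serve as a first hint about which flavor suits the document.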

tables

tables[2]

tables[2].df

Evaluating tables shows the number of tables obtained by parsing.

tables[2] gets the specified table.

tables[2].df returns the table data as a pandas DataFrame.
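To see at a glance what was parsed, the three expressions above can be wrapped in a small loop. The following is a minimal sketch, not part of the original article: summarize_tables is a hypothetical helper name, and it assumes each table exposes a df attribute and a parsing_report dictionary with "page" and "accuracy" entries, as Camelot tables do to my knowledge.

```python
def summarize_tables(tables):
    """Return one summary dictionary per parsed table."""
    summary = []
    for index, table in enumerate(tables):
        # parsing_report is assumed to be a dict; .get() avoids a KeyError
        # if a key is missing in a given Camelot version.
        report = table.parsing_report
        n_rows, n_columns = table.df.shape
        summary.append({
            "index": index,
            "page": report.get("page"),
            "rows": n_rows,
            "columns": n_columns,
            "accuracy": report.get("accuracy"),
        })
    return summary
```

Printing each entry of summarize_tables(tables) makes it easier to decide which index, such as tables[2], holds the table of interest.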

In pandas, the append() method merges two DataFrames row-wise. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is its replacement.
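On current pandas versions the same row-wise merge goes through pd.concat(). A minimal sketch using the aa and bb data from this section; passing ignore_index=True is a choice made here so that the result gets a fresh 0..3 index, whereas without it the original row labels are kept.

```python
import pandas as pd

aa = {"A": [1, 2, 3], "B": [4, 5, 6]}
bb = {"A": [4], "B": [7]}

a = pd.DataFrame(aa)
b = pd.DataFrame(bb)

# Stack b underneath a; columns are matched by name.
merged = pd.concat([a, b], ignore_index=True)
print(merged)
```

The same call accepts a list of any length, so the DataFrames of several parsed tables can be stacked in one step.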

import pandas as pd

aa = {"A": [1, 2, 3], "B": [4, 5, 6]}

bb = {"A": [4], "B": [7]}

a = pd.DataFrame(aa)

b = pd.DataFrame(bb)

a.append(b)  # requires pandas < 2.0

Convert SVG format to PDF format

Original text link:

https://www.tutorialexample.com/a-simple-guide-to-python-convert-svg-to-pdf-with-svglib-python-tutorial/

This requires the svglib library, which can be installed directly with pip:

pip install svglib

Code to convert SVG to PDF:

from svglib.svglib import svg2rlg

from reportlab.graphics import renderPDF

drawing = svg2rlg("home.svg")

renderPDF.drawToFile(drawing, "file.pdf")
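The same two calls can be applied to a whole folder of SVG files. The following is a sketch under assumptions rather than the original author's code: pdf_path_for, convert_svg and convert_directory are hypothetical helper names, and the None check reflects my understanding that svg2rlg returns None when a file cannot be read or parsed.

```python
from pathlib import Path


def pdf_path_for(svg_path):
    """Return the output path: same folder and name, with a .pdf suffix."""
    return Path(svg_path).with_suffix(".pdf")


def convert_svg(svg_path):
    """Convert one SVG file to PDF and return the path of the PDF written."""
    # Imported lazily so pdf_path_for() can be used without svglib installed.
    from svglib.svglib import svg2rlg
    from reportlab.graphics import renderPDF

    drawing = svg2rlg(str(svg_path))
    if drawing is None:
        # Assumption: svg2rlg signals an unreadable file by returning None.
        raise ValueError(f"could not parse {svg_path}")

    out_path = pdf_path_for(svg_path)
    renderPDF.drawToFile(drawing, str(out_path))
    return out_path


def convert_directory(directory):
    """Convert every *.svg file directly inside `directory`."""
    return [convert_svg(path) for path in sorted(Path(directory).glob("*.svg"))]
```

For example, convert_svg("home.svg") would write home.pdf next to the source file.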
