Updated 2025-01-18. From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 report
This article introduces how to extract table data from PDF documents with Python. Many people run into this problem when working through real cases, so let the editor walk you through how to handle it. I hope you read it carefully and get something out of it!
Extract table data from PDF files
Extracting table data from PDF documents requires the camelot module, which can be installed directly with pip:
pip install "camelot-py[cv]"
The sample PDF file used here can be downloaded directly from the following link:
http://gstcouncil.gov.in/sites/default/files/gst-revenue-collection-march3020.pdf
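The first step is to read the PDF file:
import camelot
tables = camelot.read_pdf('gst-revenue-collection-march3020.pdf', flavor='stream', pages='1-3')
The flavor parameter chooses the parsing method: 'lattice' (the default) detects tables from the ruling lines drawn between cells, while 'stream' infers cells from the whitespace between pieces of text, which suits tables without borders.
If a table spans several pages, list those pages in the pages parameter, for example '1-3' or '1,4-end'. Page numbers start at 1, and only page 1 is read by default.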
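When a table runs across several pages, camelot parses each page separately, so the pieces typically come back as separate entries in tables. The sketch below stitches such pieces back into one DataFrame with pd.concat. The helper name stitch_tables and its header_rows parameter are hypothetical (not part of camelot); it assumes every piece has the same columns and, when header_rows is set, that each later page repeats the header in its first rows.

```python
import pandas as pd


def stitch_tables(tables, header_rows=0):
    """Merge camelot tables that are pieces of one logical table (hypothetical helper).

    `tables` is any iterable of objects with a `.df` DataFrame, such as a
    camelot TableList or a list of selected tables. Assumes all pieces have
    the same columns. If each later page repeats the header, pass
    header_rows=1 (or more) to drop those repeated rows.
    """
    frames = []
    for i, table in enumerate(tables):
        df = table.df
        # Keep the first piece whole; skip repeated header rows on later pieces
        frames.append(df if i == 0 else df.iloc[header_rows:])
    if not frames:
        raise ValueError("no tables to stitch")
    # ignore_index=True renumbers rows 0..n-1 instead of restarting at 0 for each piece
    return pd.concat(frames, ignore_index=True)
```

For example, stitch_tables([tables[1], tables[2]], header_rows=1) would merge the second and third tables, assuming they are two halves of one table; which entries belong together depends on the PDF.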
tables
tables[2]
tables[2].df
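tables shows how many tables were parsed (it displays as <TableList n=...>, and len(tables) returns the count)
tables[2] gets a specific table; indexing starts at 0, so this is the third table
tables[2].df returns that table's data as a pandas DataFrame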
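camelot labels the columns of .df as 0, 1, 2, ... and stores every cell as text, so the header usually ends up as the first data row and numbers stay strings such as "1,234". A minimal cleanup sketch follows; promote_header and to_numbers are hypothetical helpers, and it assumes the header is a single row and that commas in numeric cells are only thousands separators.

```python
import pandas as pd


def promote_header(df, row=0):
    """Use one row of a camelot-style DataFrame as the column names.

    Rows up to and including `row` are dropped and the remaining rows are
    renumbered from 0.
    """
    header = [str(cell).strip() for cell in df.iloc[row]]
    body = df.iloc[row + 1:].reset_index(drop=True)
    body.columns = header
    return body


def to_numbers(df, columns):
    """Turn text columns such as '1,234' into numbers; unparsable cells become NaN."""
    out = df.copy()
    for col in columns:
        cleaned = out[col].str.replace(",", "", regex=False).str.strip()
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out
```

Usage would look like clean = to_numbers(promote_header(tables[2].df), ["<column name>"]), where the column name is whatever header text the table actually contains.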
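The append() method merges two pandas DataFrames row-wise: the rows of the second are added below the first. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() does the same job.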
import pandas as pd
aa = {"A": [1, 2, 3], "B": [4, 5, 6]}
bb = {"A": [4], "B": [7]}
a = pd.DataFrame(aa)
b = pd.DataFrame(bb)
pd.concat([a, b])  # row-wise merge; same result as a.append(b) in pandas < 2.0
Convert SVG format to PDF format (original article link):
https://www.tutorialexample.com/a-simple-guide-to-python-convert-svg-to-pdf-with-svglib-python-tutorial/
This requires the svglib library, which can be installed directly with pip (pip also installs reportlab, which svglib depends on and which is used below):
pip install svglib
Code to convert an SVG file to PDF:
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPDF
drawing = svg2rlg("home.svg")
renderPDF.drawToFile(drawing, "file.pdf")
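The same two calls can be looped over a folder to convert many files at once. A minimal sketch, assuming the SVG files sit in one folder; pdf_path_for and convert_folder are hypothetical helper names, and svglib/reportlab are imported inside convert_folder so the path helper can be used on its own.

```python
from pathlib import Path


def pdf_path_for(svg_path, out_dir=None):
    """Map an SVG path to its PDF path: same file name with a .pdf suffix."""
    target = Path(svg_path).with_suffix(".pdf")
    # Put the PDF next to the SVG unless an output folder is given
    return Path(out_dir) / target.name if out_dir else target


def convert_folder(folder, out_dir=None):
    """Convert every *.svg file in `folder` to PDF and return the PDF paths."""
    # Imported here so pdf_path_for() works even where svglib is not installed
    from svglib.svglib import svg2rlg
    from reportlab.graphics import renderPDF

    if out_dir:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for svg in sorted(Path(folder).glob("*.svg")):
        drawing = svg2rlg(str(svg))
        if drawing is None:  # svg2rlg may return None when the file cannot be loaded
            raise ValueError(f"could not read {svg}")
        pdf = pdf_path_for(svg, out_dir)
        renderPDF.drawToFile(drawing, str(pdf))
        written.append(pdf)
    return written
```

For example, convert_folder("icons", out_dir="pdf") would write icons/home.svg as pdf/home.pdf; the folder names here are only illustrative.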