How to quickly split PDF documents with PyPDF2 in Python

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01

This article explains how to use PyPDF2 in Python to quickly split PDF documents. The content is simple and easy to follow, so let's work through it step by step.

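The module name is strictly case-sensitive: the "y" is lowercase and the other letters are uppercase. The code in this article uses the older PdfFileReader / PdfFileWriter interface, which PyPDF2 3.0 removed in favor of PdfReader / PdfWriter, so pin a version below 3.0 to run it unchanged.

pip3 install "PyPDF2<3.0"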

After installation, create a dedicated folder for this project on the local hard disk. The path used here is F:\Python\PyPDF2: the F drive has a Python folder, and inside it a folder named after this module keeps the project separate from others.

Create files and prepare PDF documents

I downloaded the Django documentation as a PDF from the official Django website. At more than 1900 pages it is plenty for practice; download it from the official site if you need it. Then create a project file named PDFCF.py.

Everything is ready, so let's begin.

Start the program with the following two lines. The first specifies the program that runs this file, and the second is a description of the file. Their purpose may not be obvious yet, but it will be once you know how to run programs quickly in batches, so I won't go into it here.

#! python
# PDFCF.py - PDF file splitter

The idea behind the split

Rather than fixing how many parts the document is split into, we fix how many pages each part has and calculate the number of parts dynamically. With that idea settled, the next step is the formula.

Number of parts = total number of pages in the document / number of pages per split PDF

For example:

Suppose we split a PDF document with a total of 35 pages into new documents of 10 pages each. The number of parts works out as follows:

35 / 10 = 3.5

Notice that the division does not come out even: there is a fractional remainder of 0.5. What does that mean? In this example the document splits into three full parts with five pages left over. Whenever there is a remainder, however small, we round up by one so that the whole document is split. The result here is that the first three documents have 10 pages each and the fourth holds the last five pages, so the rounded-up value is the number of parts.
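The round-up rule described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name count_parts is hypothetical and does not appear in the original program. It also covers the case where the total divides evenly, in which no extra document is needed.

```python
def count_parts(total_pages, pages_per_part):
    """Return how many output documents are needed, rounding up."""
    # divmod returns (quotient, remainder) in one step.
    full_parts, leftover = divmod(total_pages, pages_per_part)
    # Any leftover pages need one extra, shorter document.
    return full_parts + 1 if leftover else full_parts


print(count_parts(35, 10))  # three full documents plus one of 5 pages
print(count_parts(30, 10))  # divides evenly, so no extra document
```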

Python split calculation formula:

if 35 % 10:  # check whether there is a remainder
    35 // 10 + 1  # the integer quotient plus 1
else:
    35 // 10  # if divisible, the quotient is already the answer

# The same logic written on one line
parts = 35 // 10 + 1 if 35 % 10 else 35 // 10  # 4

How to split?

Take this 35-page document split as an example:

Iterating with for num in range(35) visits the data for every page. To split the document, we then give each part its own range of pages:

The first document is from 0 to 10 and does not contain 10.

The second document is from 10 to 20 and does not contain 20.

The third document ranges from 20 to 30 and does not contain 30.

The fourth document is from 30 to 35 and does not contain 35.

The first number of each range follows a rule: it is the number of pages per document multiplied by the number of documents that come before it. The second number seems to follow no rule at first, but on closer inspection it does: if we number the splits, 1 to 4 in this example, the second number is the current split number multiplied by the number of pages per document (fixed at 10).

A document number that starts at 0 does not fit this rule, so we count the documents from 1 instead. Because range excludes its end value, range(1, 4) would miss the last document, so we add one and get

for num in range(1, 4 + 1)

The first document is from 10 * (1 - 1) to 10 * 1 and does not contain 10.

The second document is from 10 * (2 - 1) to 10 * 2 and does not contain 20.

The third document is from 10 * (3 - 1) to 10 * 3 and does not contain 30.

The fourth document is from 10 * (4 - 1) to 35 and does not contain 35.

The specific traversal code is as follows:

for num in range(1, 4 + 1):
    for i in range(10 * (num - 1), 10 * num if num != 4 else 35):
        pass

Note: when num reaches 4 (the number of the last document), use the total page count, 35, as the end of the range, and the traversal ends there. Why 35 rather than 35 + 1? Because page indices start from 0, the last page has index 34, and range(30, 35) already covers it, so there is no need to add 1.
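The traversal above can be generalized so that neither 4 nor 35 is hard-coded. The following is a minimal sketch under my own assumptions: the function name page_ranges is hypothetical, and min() stands in for the original check on the last document number.

```python
def page_ranges(total_pages, pages_per_part):
    """Yield (num, start, stop) for each output document, numbered from 1."""
    # Round up so that a partial last document is counted.
    parts = (total_pages + pages_per_part - 1) // pages_per_part
    for num in range(1, parts + 1):
        start = pages_per_part * (num - 1)
        # min() caps the last document at the total page count.
        stop = min(pages_per_part * num, total_pages)
        yield num, start, stop


for num, start, stop in page_ranges(35, 10):
    # stop is exclusive, matching range(start, stop).
    print(num, start, stop)
```

Each (start, stop) pair can be passed straight to range(), which is why the last document ends at 35 and not 36.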

Complete split program:

import PyPDF2

# Open a readable PDF object
pdfReader = PyPDF2.PdfFileReader("django.pdf")
# Get the total number of PDF pages
pdfnums = pdfReader.numPages
# How many pages each split document consists of
innumber = 10
# Calculate the number of split documents (round up if there is a remainder)
outnums = pdfnums // innumber + 1 if pdfnums % innumber else pdfnums // innumber

for num in range(1, outnums + 1):
    # Create a blank PDF
    pdfWriter = PyPDF2.PdfFileWriter()
    # Extract the specified page range; the last document ends at the total page count
    for pageNum in range(innumber * (num - 1), innumber * num if num != outnums else pdfnums):
        # Get the content of each page
        pageObj = pdfReader.getPage(pageNum)
        # Add the page to the blank document object created in the outer loop
        pdfWriter.addPage(pageObj)
    # Save to a local file, naming each document by its number
    with open("PDFREAD%s" % num + ".pdf", "wb") as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)
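The program above relies on the camelCase names PdfFileReader, PdfFileWriter, numPages, getPage and addPage, which were removed in PyPDF2 3.0. For readers on a newer release, here is a hedged sketch of the same split using PdfReader, PdfWriter, reader.pages and add_page. The function name split_pdf and its prefix parameter are hypothetical, chosen only for illustration, and the import sits inside the function so the definition loads even where PyPDF2 is not installed.

```python
def split_pdf(src_path, pages_per_part, prefix="PDFREAD"):
    """Split src_path into files of pages_per_part pages; return their names."""
    # Imported here so the function can be defined without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    total = len(reader.pages)
    names = []
    # range() with a step gives the first page index of each output file.
    for num, start in enumerate(range(0, total, pages_per_part), start=1):
        writer = PdfWriter()
        # min() keeps the last, shorter part inside the document.
        for page_index in range(start, min(start + pages_per_part, total)):
            writer.add_page(reader.pages[page_index])
        name = "%s%d.pdf" % (prefix, num)
        with open(name, "wb") as out_file:
            writer.write(out_file)
        names.append(name)
    return names
```

Calling split_pdf("django.pdf", 10) is intended to write PDFREAD1.pdf, PDFREAD2.pdf and so on into the current folder, mirroring the file names used above.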

Note: I personally find the approach above rather roundabout. If you are comfortable with slicing and step sizes on Python lists, it does not need to be this complicated. Build one large list of all the page numbers, use slicing to split it into several small lists, and the page range of each split PDF then runs from the first number of its small list to the last number + 1. The code I wrote with this list method is posted below for reference.

Splitting the PDF with the list-slicing method

#! python
# PDFCF.py - PDF file splitter
import PyPDF2
# import LISTCF

# Open a readable PDF object
pdfReader = PyPDF2.PdfFileReader("django.pdf")
# Get the total number of PDF pages
pdfnums = pdfReader.numPages
# Put all the page numbers into one list
pagenum_list = list(range(pdfnums))
n = 10
# Divide the page-number list into several small lists of n pages
page_list = [pagenum_list[i:i + n] for i in range(0, len(pagenum_list), n)]

for i in range(len(page_list)):
    # Create a blank PDF
    pdfWriter = PyPDF2.PdfFileWriter()
    # Extract the pages from the first number of the small list to the last number + 1
    for pageNum in range(page_list[i][0], page_list[i][-1] + 1):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)
    with open("PDFREAD%s" % i + ".pdf", "wb") as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)
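The slicing step in the list-based program can be tried on its own, without any PDF file. This is a small sketch that reuses the names pagenum_list, page_list and n from the code above; total_pages is a stand-in for pdfReader.numPages and is my own assumption.

```python
total_pages = 35  # stand-in for pdfReader.numPages
n = 10            # pages per output document

pagenum_list = list(range(total_pages))
# Step through the list n items at a time; the last slice may be shorter.
page_list = [pagenum_list[i:i + n] for i in range(0, len(pagenum_list), n)]

for chunk in page_list:
    # First page index and one past the last, ready to pass to range().
    print(chunk[0], chunk[-1] + 1)
```

Slicing never raises an error when the end index runs past the list, which is why the final, shorter chunk needs no special case.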

How do I use it?

Inside the project folder, hold down the Shift key, right-click, and choose "Open command window here". Type PDFCF.py and press Enter. Change the value of n to suit your own needs.

Thank you for reading. That covers how to use PyPDF2 in Python to quickly split PDF documents. After working through this article you should have a better understanding of the approach, though the details are best verified through practice.
