How to use the PyPDF2 Module to split PDF documents in Python

2025-12-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to split PDF documents with the PyPDF2 module in Python. Many people run into this problem in real work, so the steps below walk through how to handle it. Read them carefully and you should be able to put the approach to use.

Install the PyPDF2 module

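# The module name is case-sensitive: the y is lowercase and the other letters are uppercase.

pip3 install PyPDF2

The code in this article uses the PdfFileReader and PdfFileWriter classes, which PyPDF2 3.0 removed, so run it with a PyPDF2 release older than 3.0.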

After installation, create a dedicated folder for this project on the local hard disk. My path is F:\Python\PyPDF2: a Python folder on the F drive that contains a folder named after this module, which keeps this project separate from others.

Create files and prepare PDF documents

Find a fairly large PDF document to practice on. I downloaded the documentation from the official Django website; at more than 1900 pages it is more than big enough. If you need it, download it from the official website, or reply 'pdf' in my official account to get the download link. Then create a project file named PDFCF.py.

Start writing.

Begin the program with the following two lines. The first specifies the program that runs this file, and the second describes the file. Their purpose may not be obvious yet, but it becomes clear once you know how to run programs quickly in batches, so I won't go into it here.

#! python
# PDFCF.py - pdf file splitter

The idea of splitting documents

The number of parts is not fixed. Instead, fix the number of pages in each part and calculate the number of parts dynamically. With that idea settled, the next step is to write down the formula.

Number of parts = total number of pages in the document / number of pages per split pdf

For example:

If we split a PDF document with a total of 35 pages into new documents of 10 pages each, the number of parts is calculated as follows:

3.5 = 35 / 10

Note that the division is not exact: the 0.5 is a remainder. In this example the document splits into three parts with five pages left over. Whenever there is a remainder, however small, round up by 1 so that the split covers the whole document. Here the first three documents have 10 pages each, the fourth document holds the last five pages, and the rounded-up result, 4, is the number of parts.

Python split calculation formula:

if 35 % 10:        # is there a remainder?
    35 // 10 + 1   # yes: round up by adding 1
else:
    35 // 10       # divides evenly: use the quotient as it is

# the same logic written on one line; the result is 4
35 // 10 + 1 if 35 % 10 else 35 // 10
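The round-up rule can be wrapped in a small function. This is only a sketch: the name count_parts and its parameters are hypothetical, not part of PyPDF2 or of the original program. It returns the plain quotient when the division is exact and one more when pages are left over.

```python
def count_parts(total_pages, pages_per_part):
    """Return how many output documents are needed, rounding up."""
    if pages_per_part <= 0:
        raise ValueError("pages_per_part must be positive")
    quotient, remainder = divmod(total_pages, pages_per_part)
    # Any leftover pages need one extra document.
    return quotient + 1 if remainder else quotient


print(count_parts(35, 10))  # three full documents plus one for the last 5 pages
print(count_parts(30, 10))  # divides evenly, so no extra document
```

Using divmod keeps the quotient and remainder together, so the exact case and the leftover case are handled by one expression.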

How exactly is it split?

Take this 35-page document split as an example:

Iterate over the pages with for num in range(35) to get the data for each page, then give each split document its own range of pages:

The first document is from 0 to 10 and does not contain 10.

The second document is from 10 to 20 and does not contain 20.

The third document ranges from 20 to 30 and does not contain 30.

The fourth document is from 30 to 35 and does not contain 35.
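The four ranges listed above can also be generated directly. The sketch below uses the step argument of range, which is a different route from the numbering scheme the article develops next; the function name page_ranges is hypothetical.

```python
def page_ranges(total_pages, pages_per_part):
    """Return (start, stop) pairs; stop is exclusive, as with range()."""
    return [
        # min() clips the last range so it never runs past the final page.
        (start, min(start + pages_per_part, total_pages))
        for start in range(0, total_pages, pages_per_part)
    ]


for start, stop in page_ranges(35, 10):
    print(start, stop)
```

Each pair can be passed straight to range(start, stop) to visit the page numbers of one output document.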

The first number of each range follows a rule: it is the number of pages per document multiplied by the document's position, counting from 0. The second number seems to follow no rule, but careful observation shows that it does. If we number the split documents, 1 to 4 in this example, the second number is the document's number multiplied by the number of pages per document (fixed at 10).

Counting the documents from 0 means num cannot be used for both numbers, so number them from 1 instead. Because range does not include its end value, range(1, 4) would leave out the last document, so add one to the end value:

for num in range(1, 4 + 1)

The first document is from 10 * (1 - 1) to 10 * 1 and does not contain 10.

The second document is from 10 * (2 - 1) to 10 * 2 and does not contain 20.

The third document is from 10 * (3 - 1) to 10 * 3 and does not contain 30.

The fourth document is from 10 * (4 - 1) to 35 and does not contain 35.

The specific traversal code is as follows:

for num in range(1, 4 + 1):
    for i in range(10 * (num - 1), 10 * num if num != 4 else 35):
        pass

Note: when num reaches 4 (the number of the last document), use the total page count, 35, as the end of the range, and the traversal ends there. Why 35 rather than 35 + 1? Because page numbers start from 0, the last page is number 34, and range(30, 35) already includes it.
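One way to check the numbering scheme is to collect every page index it produces and compare the result with the full list of pages. This is a sketch only; part_bounds and its parameters are hypothetical names introduced for the check.

```python
def part_bounds(num, total_pages, pages_per_part, part_count):
    """Bounds of the num-th document, numbering documents from 1."""
    start = pages_per_part * (num - 1)
    # The last document stops at the total page count, not at a full multiple.
    stop = pages_per_part * num if num != part_count else total_pages
    return start, stop


covered = []
for num in range(1, 4 + 1):
    start, stop = part_bounds(num, 35, 10, 4)
    covered.extend(range(start, stop))

# Every page index from 0 to 34 should appear exactly once, in order.
print(covered == list(range(35)))
```

If the end value of the last document were 35 + 1, the collected list would contain index 35, which does not exist in a 35-page document.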

Complete split program:

import PyPDF2

# Open a readable pdf object
pdfReader = PyPDF2.PdfFileReader('django.pdf')
# Get the total number of pdf pages
pdfnums = pdfReader.numPages
# How many pages each split document consists of
innumber = 10
# Calculate the number of split documents, rounding up when pages are left over
outnums = pdfnums // innumber + 1 if pdfnums % innumber else pdfnums // innumber

for num in range(1, outnums + 1):
    # Create a blank pdf
    pdfWriter = PyPDF2.PdfFileWriter()
    # Extract the page range that belongs to this document
    for pageNum in range(innumber * (num - 1), innumber * num if num != outnums else pdfnums):
        # Get the content of each page
        pageObj = pdfReader.getPage(pageNum)
        # Add the page to the blank document created in the outer loop
        pdfWriter.addPage(pageObj)
    # Save to a local file, naming each document by its number
    with open('PDFREAD%s' % num + '.pdf', 'wb') as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)

Note: I personally find the approach above rather roundabout. If you understand slicing and step sizes for Python lists thoroughly, it does not need to be this complicated. Generate one large list of all the page numbers, split it into several small lists by slicing, and the page range of each split pdf then runs from the first number of its small list to the last number + 1. My implementation of the list method is posted below for reference.
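The slicing idea can be tried on plain page numbers before any PDF is involved. The sketch below assumes a 35-page document split every 10 pages; chunk_pages is a hypothetical helper name.

```python
def chunk_pages(total_pages, n):
    """Split the page numbers 0..total_pages-1 into lists of at most n."""
    page_numbers = list(range(total_pages))
    # Step through the list n items at a time and slice out each chunk.
    return [page_numbers[i:i + n] for i in range(0, len(page_numbers), n)]


chunks = chunk_pages(35, 10)
for chunk in chunks:
    # First number of the chunk, and last number + 1 as the exclusive end.
    print(chunk[0], chunk[-1] + 1)
```

A slice that runs past the end of the list is simply shortened, so the last chunk needs no special case.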

Split list method to split PDF:

#! python
# PDFCF.py - pdf file splitter

import PyPDF2
# import LISTCF

# Open a readable pdf object
pdfReader = PyPDF2.PdfFileReader('django.pdf')
# Get the total number of pdf pages
pdfnums = pdfReader.numPages

# Put all the page numbers into one list
pagenum_list = list(range(pdfnums))

# Number of pages in each split document
n = 100

# Divide the page numbers into small lists of n numbers each
page_list = [pagenum_list[i:i + n] for i in range(0, len(pagenum_list), n)]

for i in range(len(page_list)):
    # Create a blank pdf
    pdfWriter = PyPDF2.PdfFileWriter()
    # Extract the pages from the first number of the small list to the last number + 1
    for pageNum in range(page_list[i][0], page_list[i][-1] + 1):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

    # Save to a local file, naming each document by its index
    with open('PDFREAD%s' % i + '.pdf', 'wb') as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)

How do I use it?

Inside the project folder, hold down the Shift key, right-click, and choose to open a command window here. Type PDFCF.py and press Enter. Change the value of n in the file to suit your needs.

This concludes the introduction to "how to split PDF documents with the PyPDF2 module in Python". Thank you for reading.

