How to Split PDF Files in Python

Abdeladim Fadheli · 5 min read · Updated Jun 2023 · PDF File Handling

There are many scenarios where you want to split a PDF document into several files automatically, from invoices to official company reports.

In a previous tutorial, we saw how you can merge multiple PDF documents into one. In this tutorial, you will learn how to split PDF documents with Python using the pikepdf library.

To get started, let's install pikepdf:

$ pip install pikepdf

Open up a new Python file and let's import it:

import os
from pikepdf import Pdf

First of all, let's make a Python dictionary that maps each new PDF file's index to a page range of the original PDF file:

# a dictionary mapping each new PDF file index to a page range of the original PDF
file2pages = {
    0: [0, 9], # 1st split PDF file will contain pages 0 to 8 (9 is not included)
    1: [9, 11], # 2nd split PDF file will contain pages 9 and 10 (11 is not included)
    2: [11, 100], # 3rd split PDF file will contain pages 11 to 99, or to the last page if the document is shorter
}

With the above setting, we're going to split our PDF file into 3 new PDF documents. Page indices start at 0 and the end of each range is excluded, so the first file contains the first 9 pages (indices 0 to 8), the second contains pages 9 and 10, and the last contains everything from page 11 up to index 99, or up to the last page if the document is shorter.
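To make the half-open ranges concrete, here is a minimal sketch that expands each [start, stop] pair into the page indices it selects. The helper name expand_ranges is hypothetical, and the 16-page count is an assumption chosen to match the sample output shown later in this tutorial:

```python
# a dictionary mapping each new PDF file index to a page range (same values as above)
file2pages = {
    0: [0, 9],
    1: [9, 11],
    2: [11, 100],
}


def expand_ranges(file2pages, num_pages):
    """Return {file index: list of page indices}, clipped to the document length."""
    expanded = {}
    for index, (start, stop) in file2pages.items():
        # range() excludes `stop`; min() drops pages that do not exist
        expanded[index] = list(range(start, min(stop, num_pages)))
    return expanded


# hypothetical 16-page document
pages = expand_ranges(file2pages, num_pages=16)
for index, page_numbers in pages.items():
    print(f"file {index}: pages {page_numbers}")
```

The first file gets 9 pages, the second gets 2, and the third gets whatever is left, since its upper bound of 100 is larger than the document.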

This way, we ensure maximum flexibility, as each of you has your own use case. If you want to split each page into a new PDF document, you can simply replace [0, 9] with [0, 1], so the range covers exactly one page (the first), then use [1, 2] for the second file, and so on.
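For the one-page-per-file case, writing the mapping by hand gets tedious. Here is a minimal sketch that generates it, assuming you already know the page count. The helper name one_page_per_file is hypothetical. Note that each entry needs both a start and an end, because a single-element list such as [0] would unpack to range(0), which is empty:

```python
def one_page_per_file(num_pages):
    """Build a mapping where output file i holds only page i."""
    # each range is [i, i + 1] because the end of a range is not included
    return {i: [i, i + 1] for i in range(num_pages)}


# hypothetical 3-page document
file2pages = one_page_per_file(3)
print(file2pages)  # {0: [0, 1], 1: [1, 2], 2: [2, 3]}
```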

This is the file we're going to split (you can get it here if you want to follow along):

# the target PDF document to split
filename = "bert-paper.pdf"

Loading the file:

# load the PDF file
pdf = Pdf.open(filename)

Next, we make the resulting PDF files (3 in this case) as a list:

# make the new split PDF files
new_pdf_files = [ Pdf.new() for _ in file2pages ]
# the current pdf file index
new_pdf_index = 0

To make a new PDF file, you simply call the Pdf.new() method. The new_pdf_index variable is the index of the file currently being filled; it is only incremented once we're done with the previous file. Diving into the main loop:

# iterate over all PDF pages
for n, page in enumerate(pdf.pages):
    if n in range(*file2pages[new_pdf_index]):
        # add the `n` page to the `new_pdf_index` file
        new_pdf_files[new_pdf_index].pages.append(page)
        print(f"[*] Assigning Page {n} to the file {new_pdf_index}")
    else:
        # make a unique filename based on original file name plus the index
        name, ext = os.path.splitext(filename)
        output_filename = f"{name}-{new_pdf_index}.pdf"
        # save the PDF file
        new_pdf_files[new_pdf_index].save(output_filename)
        print(f"[+] File: {output_filename} saved.")
        # go to the next file
        new_pdf_index += 1
        # add the `n` page to the `new_pdf_index` file
        new_pdf_files[new_pdf_index].pages.append(page)
        print(f"[*] Assigning Page {n} to the file {new_pdf_index}")

# save the last PDF file
name, ext = os.path.splitext(filename)
output_filename = f"{name}-{new_pdf_index}.pdf"
new_pdf_files[new_pdf_index].save(output_filename)
print(f"[+] File: {output_filename} saved.")

First, we iterate over all the pages of the PDF file using the pdf.pages attribute. If the page index is within the current file's page range in the file2pages dictionary, we simply add the page to our new file. Otherwise, we know we're done with the previous file, so we save it to disk using the save() method, move on to the next file, and add the page to it. We continue the loop until all pages are assigned to their files, and finally we save the last file outside the loop.
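The loop above relies on the ranges being contiguous and in order, and it would raise an IndexError if the document had pages beyond the last range, because new_pdf_index would move past the last entry of new_pdf_files. As a hedged alternative, here is a sketch that builds each output file directly from its own range. The function names split_by_ranges and output_name are hypothetical, and pikepdf is imported inside the function so that the filename helper can run without it:

```python
import os


def output_name(filename, index):
    """Derive e.g. 'bert-paper-1.pdf' from 'bert-paper.pdf' and index 1."""
    name, _ext = os.path.splitext(filename)
    return f"{name}-{index}.pdf"


def split_by_ranges(filename, file2pages):
    """Save one PDF per [start, stop] range and return the saved filenames."""
    # imported here so that output_name() can be used without pikepdf installed
    from pikepdf import Pdf

    saved = []
    with Pdf.open(filename) as pdf:
        num_pages = len(pdf.pages)
        for index, (start, stop) in file2pages.items():
            # clip the range to the pages that actually exist
            page_numbers = range(start, min(stop, num_pages))
            if not page_numbers:
                continue  # nothing to write for this range
            new_pdf = Pdf.new()
            for n in page_numbers:
                new_pdf.pages.append(pdf.pages[n])
            new_pdf.save(output_name(filename, index))
            saved.append(output_name(filename, index))
    return saved


print(output_name("bert-paper.pdf", 1))  # bert-paper-1.pdf
```

Calling split_by_ranges("bert-paper.pdf", file2pages) with the mapping defined earlier should produce the same three files as the original loop. Pages that fall outside every range are simply left out instead of causing an error.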

Here's the output when I run the code:

[*] Assigning Page 0 to the file 0
[*] Assigning Page 1 to the file 0
[*] Assigning Page 2 to the file 0
[*] Assigning Page 3 to the file 0
[*] Assigning Page 4 to the file 0
[*] Assigning Page 5 to the file 0
[*] Assigning Page 6 to the file 0
[*] Assigning Page 7 to the file 0
[*] Assigning Page 8 to the file 0
[+] File: bert-paper-0.pdf saved.
[*] Assigning Page 9 to the file 1
[*] Assigning Page 10 to the file 1
[+] File: bert-paper-1.pdf saved.
[*] Assigning Page 11 to the file 2
[*] Assigning Page 12 to the file 2
[*] Assigning Page 13 to the file 2
[*] Assigning Page 14 to the file 2
[*] Assigning Page 15 to the file 2
[+] File: bert-paper-2.pdf saved.

And indeed, the three new PDF files (bert-paper-0.pdf, bert-paper-1.pdf and bert-paper-2.pdf) are created.

Conclusion

And there you go! I hope this quick guide helped you split your PDF file into several documents. You can check the full code here. If you want to merge several PDF files into one, then this tutorial will definitely help you.

Happy coding ♥
