Merging PDF files helps with file management: archiving, bulk printing, or combining datasheets, e-books, and reports into one document. An efficient tool for merging several small PDF files into a single PDF is useful for all of these tasks.
This tutorial shows you how to merge a list of PDF files into a single PDF using the Python programming language. The combined PDF may include bookmarks to improve navigation, with each bookmark linked to the content of one of the input PDF files.
We'll be using the PyPDF4 library for this purpose. PyPDF4 is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and retrieve text and metadata from them.
Let's install it:
$ pip install PyPDF4==1.27.0
Importing the libraries:
# Import libraries
from PyPDF4 import PdfFileMerger
import os
import argparse
Let's define our core function:
def merge_pdfs(input_files: list, page_range: tuple, output_file: str, bookmark: bool = True):
    """
    Merge a list of PDF files and save the combined result into the `output_file`.
    `page_range` selects a range of pages (behaving like Python's range() function) from the input files
        e.g. (0,2) -> First 2 pages
        e.g. (0,6,2) -> pages 1,3,5
    bookmark -> add bookmarks to the output file to navigate directly to the input file section within the output file.
    """
    # strict=False -> ignore PdfReadError (illegal character errors)
    merger = PdfFileMerger(strict=False)
    handles = []  # keep the input file handles so they can be closed after writing
    for input_file in input_files:
        bookmark_name = os.path.splitext(os.path.basename(input_file))[0] if bookmark else None
        handle = open(input_file, 'rb')
        handles.append(handle)
        # pages controls which pages are appended from a particular file
        merger.append(fileobj=handle, pages=page_range, import_bookmarks=False, bookmark=bookmark_name)
    # Write the merged result to the output file
    with open(output_file, 'wb') as output:
        merger.write(fileobj=output)
    merger.close()
    for handle in handles:
        handle.close()
We first create a PdfFileMerger object and then iterate over input_files. For each input PDF file, we derive a bookmark name from its file name if the bookmark variable is set; otherwise the bookmark name is None.
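To illustrate how the bookmark name is derived, the sketch below applies the same os.path calls to a few sample paths. The helper name bookmark_name_for and the sample paths are made up for this example and are not part of the tutorial's script.

```python
import os


def bookmark_name_for(path, bookmark=True):
    """Return the file name without directory or extension, or None if bookmarks are off.

    Hypothetical helper that mirrors the expression used inside merge_pdfs().
    """
    return os.path.splitext(os.path.basename(path))[0] if bookmark else None


print(bookmark_name_for("reports/bert-paper.pdf"))      # bert-paper
print(bookmark_name_for("archive.v2.pdf"))              # archive.v2
print(bookmark_name_for("letter.pdf", bookmark=False))  # None
```

Only the last extension is removed, so a name such as archive.v2.pdf keeps its inner dot.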
Next, we use the merger's append() method to add the PDF file, restricted to the chosen page_range.
Finally, we write the output PDF file and close the merger object.
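Since page_range behaves like the arguments to Python's range(), it can help to see which pages a given tuple selects. The following is a minimal sketch that does not touch PyPDF4 at all; selected_pages is a hypothetical helper, and the page counts are invented for the demonstration.

```python
def selected_pages(page_range, total_pages):
    """Return the zero-based page indices chosen by a (start, stop[, step]) tuple.

    None stands for "all pages". Illustrative helper only, not a PyPDF4 function.
    """
    if page_range is None:
        return list(range(total_pages))
    return list(range(*page_range))


print(selected_pages((0, 2), 10))     # [0, 1]
print(selected_pages((0, 6, 2), 10))  # [0, 2, 4]
print(selected_pages(None, 3))        # [0, 1, 2]
```

The indices are zero-based, so [0, 2, 4] corresponds to pages 1, 3, and 5 of the document, matching the docstring of merge_pdfs().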
Let's now add a function to parse command-line arguments:
def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="Available Options")
    parser.add_argument('-i', '--input_files', dest='input_files', nargs='*',
                        type=str, required=True, help="Enter the path of the files to process")
    parser.add_argument('-p', '--page_range', dest='page_range', nargs='*',
                        help="Enter the pages to consider e.g.: (0,2) -> First 2 pages")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        required=True, type=str, help="Enter a valid output file")
    parser.add_argument('-b', '--bookmark', dest='bookmark', default=True, type=lambda x: (
        str(x).lower() in ['true', '1', 'yes']), help="Bookmark resulting file")
    # Parse the command-line arguments
    args = vars(parser.parse_args())
    # Display the command-line arguments
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in args.items()))
    print("######################################################################")
    return args
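To see what the parser produces without running the script from a shell, you can hand argparse an explicit list of arguments. The sketch below rebuilds an equivalent parser under the assumed names build_parser and str_to_bool (neither appears in the tutorial's script) and parses a sample command line.

```python
import argparse


def str_to_bool(value):
    """Treat 'true', '1' and 'yes' (any case) as True; everything else is False."""
    return str(value).lower() in ("true", "1", "yes")


def build_parser():
    """Hypothetical factory that mirrors the options defined in parse_args()."""
    parser = argparse.ArgumentParser(description="Available Options")
    parser.add_argument("-i", "--input_files", nargs="*", type=str, required=True)
    parser.add_argument("-p", "--page_range", nargs="*")
    parser.add_argument("-o", "--output_file", required=True, type=str)
    parser.add_argument("-b", "--bookmark", default=True, type=str_to_bool)
    return parser


# Passing a list instead of reading sys.argv makes the parser easy to try out.
args = vars(build_parser().parse_args(
    ["-i", "a.pdf", "b.pdf", "-o", "out.pdf", "-p", "0,2", "-b", "no"]
))
print(args)
```

Note that -p arrives as a list of strings (here ['0,2']), and that any -b value other than true, 1, or yes turns bookmarks off.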
Now let's use the previously defined functions in our main code:
if __name__ == "__main__":
    # Parsing command line arguments entered by user
    args = parse_args()
    page_range = None
    if args['page_range']:
        page_range = tuple(int(x) for x in args['page_range'][0].split(','))
    # call the main function
    merge_pdfs(
        input_files=args['input_files'], page_range=page_range,
        output_file=args['output_file'], bookmark=args['bookmark']
    )
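The main code converts the first -p value into a tuple by splitting it on commas and calling int() on each piece, so the value has to be passed as plain comma-separated integers such as -p 0,2; with the parentheses included, int() would raise a ValueError. As a sketch of a slightly more forgiving variant, the hypothetical helper parse_page_range below also strips surrounding parentheses and rejects tuples of the wrong length. Both behaviors are additions of this example, not part of the original script.

```python
def parse_page_range(raw):
    """Turn the list argparse yields for -p (e.g. ['0,6,2']) into a tuple of ints.

    Returns None when no page range was given. Hypothetical helper: the
    parenthesis stripping and the length check are assumptions of this sketch.
    """
    if not raw:
        return None
    parts = raw[0].strip("()").split(",")
    if len(parts) not in (2, 3):
        raise ValueError("expected start,stop or start,stop,step")
    return tuple(int(part) for part in parts)


print(parse_page_range(["0,2"]))      # (0, 2)
print(parse_page_range(["(0,6,2)"]))  # (0, 6, 2)
print(parse_page_range(None))         # None
```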
Alright, we're done with the coding. Let's test it out:
$ python pdf_merger.py --help
Output:
usage: pdf_merger.py [-h] -i [INPUT_FILES [INPUT_FILES ...]] [-p [PAGE_RANGE [PAGE_RANGE ...]]] -o OUTPUT_FILE [-b BOOKMARK]

Available Options

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILES [INPUT_FILES ...]], --input_files [INPUT_FILES [INPUT_FILES ...]]
                        Enter the path of the files to process
  -p [PAGE_RANGE [PAGE_RANGE ...]], --page_range [PAGE_RANGE [PAGE_RANGE ...]]
                        Enter the pages to consider e.g.: (0,2) -> First 2 pages
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file
  -b BOOKMARK, --bookmark BOOKMARK
                        Bookmark resulting file
Here is an example of merging two PDF files into one:
$ python pdf_merger.py -i bert-paper.pdf letter.pdf -o combined.pdf
Separate the input PDF files with spaces in the -i argument, as in the command above.
A new combined.pdf file appears in the current directory, containing both input PDF files. The script prints:
## Command Arguments #################################################
input_files:['bert-paper.pdf', 'letter.pdf']
page_range:None
output_file:combined.pdf
bookmark:True
######################################################################
Make sure you pass the input files in the right order in the -i argument, since they are appended to the output in the order given.
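When the input list comes from a directory listing or a shell wildcard, plain alphabetical order typically puts a file like ch10.pdf before ch2.pdf. One way to get the intended order is to sort the names with a number-aware key before passing them on. The sketch below is one possible approach; natural_key and the file names are hypothetical.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'ch2' sorts before 'ch10'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


files = ["ch10.pdf", "ch2.pdf", "ch1.pdf"]
print(sorted(files))                   # ['ch1.pdf', 'ch10.pdf', 'ch2.pdf']
print(sorted(files, key=natural_key))  # ['ch1.pdf', 'ch2.pdf', 'ch10.pdf']
```

The sorted list can then be used as the input_files argument of merge_pdfs().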
I hope this code helps you merge PDF files easily without relying on third-party applications or online tools; doing it in Python is often more convenient.
If you want to split PDF documents instead, this tutorial will certainly help you.
Happy coding ♥