There are many reasons to encrypt a PDF file. One is to ensure that anyone who copies the file to their computer cannot use it without a decryption key. Encrypting a PDF prevents unwanted parties from viewing the personal information or credentials it contains.
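In this tutorial, you will learn how to encrypt PDF files by applying two protection levels:
Level 1: adding a password that is required to open the PDF file (using PyPDF4).
Level 2: encrypting the whole file as an additional layer of security (using pyAesCrypt).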
The goal of this tutorial is to build a lightweight command-line utility that secures PDF files using only Python modules, without relying on utilities outside the Python ecosystem (e.g. qpdf).
Before getting started, let's install the required libraries:
$ pip install PyPDF4==1.27.0 pyAesCrypt==6.0.0
Let's import the necessary libraries in our Python file:
# Import Libraries
from PyPDF4 import PdfFileReader, PdfFileWriter, utils
import os
import argparse
import getpass
from io import BytesIO
import pyAesCrypt
First, let's define a function that checks whether the PDF file is encrypted:
# Size of chunk
BUFFER_SIZE = 64*1024

def is_encrypted(input_file: str) -> bool:
    """Checks if the inputted file is encrypted using PyPDF4 library"""
    with open(input_file, 'rb') as pdf_file:
        pdf_reader = PdfFileReader(pdf_file, strict=False)
        return pdf_reader.isEncrypted
Second, let's make the core function, which is encrypting the PDF file:
def encrypt_pdf(input_file: str, password: str):
    """
    Encrypts a file using PyPDF4 library.
    Precondition: File is not encrypted.
    """
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if pdf_reader.isEncrypted:
        print(f"PDF File {input_file} already encrypted")
        return False, None, None
    try:
        # To encrypt all the pages of the input file, you need to loop over all of them
        # and to add them to the writer.
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        return False, None, None
    # The default is 128 bit encryption (if false then 40 bit encryption).
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, use_128bit=True)
    return True, pdf_reader, pdf_writer
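The encrypt_pdf() function performs the following:
It validates that the input PDF file is not already encrypted.
It loops over the pages of the input file and adds them to a pdf_writer object.
It encrypts the pdf_writer object using a given password.
Now that we have the function that is responsible for encryption, let's write its opposite, decryption: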
def decrypt_pdf(input_file: str, password: str):
    """
    Decrypts a file using PyPDF4 library.
    Precondition: A file is already encrypted
    """
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if not pdf_reader.isEncrypted:
        print(f"PDF File {input_file} not encrypted")
        return False, None, None
    pdf_reader.decrypt(password=password)
    pdf_writer = PdfFileWriter()
    try:
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        return False, None, None
    return True, pdf_reader, pdf_writer
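This function performs the following:
It validates that the input PDF file is encrypted.
It decrypts the pdf_reader object using the password (must be the correct one).
It adds the decrypted pages to a pdf_writer object.
Let's head to level 2, encrypting the actual file: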
def cipher_stream(inp_buffer: BytesIO, password: str):
    """Ciphers an input memory buffer and returns a ciphered output memory buffer"""
    # Initialize output ciphered binary stream
    out_buffer = BytesIO()
    inp_buffer.seek(0)
    # Encrypt Stream
    pyAesCrypt.encryptStream(inp_buffer, out_buffer, password, BUFFER_SIZE)
    out_buffer.seek(0)
    return out_buffer
By using the pyAesCrypt library, the above function encrypts an input memory buffer and returns an encrypted memory buffer as output.
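The two seek(0) calls in cipher_stream() matter because a BytesIO buffer keeps a single read/write position. Here is a small standard-library illustration of that behavior; the payload is a made-up placeholder, not a real PDF:

```python
from io import BytesIO

# Hypothetical payload standing in for the bytes produced by pdf_writer.write()
buffer = BytesIO()
buffer.write(b"%PDF-1.4 placeholder bytes")

# After writing, the position sits at the end, so a read returns nothing
at_end = buffer.read()

# Rewinding makes the whole content readable again
buffer.seek(0)
from_start = buffer.read()

print(at_end)      # b''
print(from_start)  # b'%PDF-1.4 placeholder bytes'
```

Without the rewind, a consumer that reads from the current position would see an empty stream.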
Let's make the file decryption function now:
def decipher_file(input_file: str, output_file: str, password: str):
    """
    Deciphers an input file and writes the deciphered content to the output file.
    Returns True on success, False otherwise.
    """
    inpFileSize = os.stat(input_file).st_size
    out_buffer = BytesIO()
    with open(input_file, mode='rb') as inp_buffer:
        try:
            # Decrypt Stream
            pyAesCrypt.decryptStream(
                inp_buffer, out_buffer, password, BUFFER_SIZE, inpFileSize)
        except Exception as e:
            print("Exception", str(e))
            return False
    # Both files are closed automatically by their with blocks
    with open(output_file, mode='wb') as f:
        f.write(out_buffer.getbuffer())
    return True
In decipher_file(), we use the decryptStream() method from the pyAesCrypt module, which accepts the input and output buffers, the password, the buffer size, and the file size as parameters, and writes the decrypted stream to the output buffer.
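To make the role of the buffer size more concrete, here is a minimal sketch of processing a stream in fixed-size chunks. It does not reproduce pyAesCrypt's internals; copy_in_chunks is a hypothetical helper that only shows what reading BUFFER_SIZE bytes at a time looks like:

```python
from io import BytesIO

BUFFER_SIZE = 64 * 1024  # same chunk size as in the tutorial


def copy_in_chunks(src, dst, buffer_size=BUFFER_SIZE):
    """Copies src to dst chunk by chunk and returns the number of chunks read."""
    chunks = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        dst.write(chunk)
        chunks += 1
    return chunks


source = BytesIO(b"x" * (150 * 1024))  # 150 KiB of dummy data
target = BytesIO()
chunk_count = copy_in_chunks(source, target)
print(chunk_count)  # 3 (64 KiB + 64 KiB + 22 KiB)
```

Working in chunks keeps memory usage bounded by the buffer size rather than by the size of the file being read.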
For more convenient use of encryption and decryption of files, I suggest you read this tutorial which uses the cryptography module that is more friendly to Python developers.
Now let's combine our functions into a single one:
def encrypt_decrypt_file(**kwargs):
    """Encrypts or decrypts a file"""
    input_file = kwargs.get('input_file')
    password = kwargs.get('password')
    output_file = kwargs.get('output_file')
    action = kwargs.get('action')
    # Protection Level
    # Level 1 --> Encryption / Decryption using PyPDF4
    # Level 2 --> Encryption and Ciphering / Deciphering and Decryption
    level = kwargs.get('level')
    if not output_file:
        output_file = input_file
    if action == "encrypt":
        result, pdf_reader, pdf_writer = encrypt_pdf(
            input_file=input_file, password=password)
        # Encryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            if level == 2:
                output_buffer = cipher_stream(output_buffer, password=password)
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())
    elif action == "decrypt":
        if level == 2:
            decipher_file(input_file=input_file,
                          output_file=output_file, password=password)
        result, pdf_reader, pdf_writer = decrypt_pdf(
            input_file=input_file, password=password)
        # Decryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())
The above function accepts 5 keyword arguments:
input_file: The input PDF file.
output_file: The output PDF file. If it is omitted, the input file is overwritten.
password: The password string you want to encrypt with.
action: Accepts "encrypt" or "decrypt" as a string.
level: The level of encryption you want to use. Setting it to 1 means only adding a password that is required to open the PDF file; 2 adds file encryption as another layer of security.
Now, let's create a new class that inherits from argparse.Action to enter a password securely:
class Password(argparse.Action):
    """
    Hides the password entry
    """
    def __call__(self, parser, namespace, values, option_string):
        if values is None:
            values = getpass.getpass()
        setattr(namespace, self.dest, values)
It overrides the __call__() method and sets the dest variable of the namespace object to the password that the user enters using the getpass module.
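As a quick illustration of how such an action behaves, here is a self-contained sketch that repeats the Password class and wires it to a throwaway parser. The build_parser helper and the sample value are assumptions made for this example only:

```python
import argparse
import getpass


class Password(argparse.Action):
    """Prompts for the password when -p is given without a value."""

    def __call__(self, parser, namespace, values, option_string=None):
        if values is None:
            values = getpass.getpass()
        setattr(namespace, self.dest, values)


def build_parser():
    # Hypothetical helper: a parser containing only the password option
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--password", dest="password",
                        action=Password, nargs="?", required=True)
    return parser


# Value given inline: no prompt, the value is stored as-is
args = build_parser().parse_args(["-p", "s3cret"])
print(args.password)  # s3cret
```

Passing -p on its own makes values arrive as None, which triggers the hidden getpass prompt. That is usually preferable, since a password typed inline may end up in the shell history.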
Next, let's define functions for parsing command-line arguments:
def is_valid_path(path):
    """Validates the path inputted and checks whether it is a file path or a folder path"""
    if not path:
        raise ValueError("Invalid Path")
    if os.path.isfile(path):
        return path
    elif os.path.isdir(path):
        return path
    else:
        raise ValueError(f"Invalid Path {path}")

def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="These options are available")
    parser.add_argument("file", help="Input PDF file you want to encrypt", type=is_valid_path)
    # parser.add_argument('-i', '--input_path', dest='input_path', type=is_valid_path,
    #                     required=True, help="Enter the path of the file or the folder to process")
    parser.add_argument('-a', '--action', dest='action', choices=[
                        'encrypt', 'decrypt'], type=str, default='encrypt', help="Choose whether to encrypt or to decrypt")
    parser.add_argument('-l', '--level', dest='level', choices=[
                        1, 2], type=int, default=1, help="Choose which protection level to apply")
    parser.add_argument('-p', '--password', dest='password', action=Password,
                        nargs='?', type=str, required=True, help="Enter a valid password")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        type=str, help="Enter a valid output file")
    args = vars(parser.parse_args())
    # To Display Command Arguments Except Password
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j)
                    for i, j in args.items() if i != 'password'))
    print("######################################################################")
    return args
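The is_valid_path() validator above is plugged into argparse through the type parameter. As an optional variation, a validator can raise argparse.ArgumentTypeError instead of ValueError so that the custom message appears in the error output. The sketch below assumes a hypothetical existing_path validator and uses a temporary file so it does not depend on any real PDF:

```python
import argparse
import os
import tempfile


def existing_path(path):
    """Hypothetical validator: accepts only paths that exist on disk."""
    if not path or not os.path.exists(path):
        raise argparse.ArgumentTypeError(f"Invalid path: {path!r}")
    return path


parser = argparse.ArgumentParser()
parser.add_argument("file", type=existing_path)

# A temporary file stands in for the input PDF
with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
    args = parser.parse_args([tmp.name])
    accepted = args.file == tmp.name

print(accepted)  # True
```

When the path does not exist, argparse prints the usage line together with the validator's message and exits.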
Finally, let's write the main code:
if __name__ == '__main__':
    # Parsing command line arguments entered by user
    args = parse_args()
    # Encrypting or Decrypting File
    encrypt_decrypt_file(
        input_file=args['file'], password=args['password'],
        action=args['action'], level=args['level'], output_file=args['output_file']
    )
Alright, let's test our program. First, let's pass --help to see the arguments:
$ python encrypt_pdf.py --help
Output:
usage: encrypt_pdf.py [-h] [-a {encrypt,decrypt}] [-l {1,2}] -p [PASSWORD] [-o OUTPUT_FILE] file

These options are available

positional arguments:
  file                  Input PDF file you want to encrypt

optional arguments:
  -h, --help            show this help message and exit
  -a {encrypt,decrypt}, --action {encrypt,decrypt}
                        Choose whether to encrypt or to decrypt
  -l {1,2}, --level {1,2}
                        Choose which protection level to apply
  -p [PASSWORD], --password [PASSWORD]
                        Enter a valid password
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file
Awesome, let's encrypt an example PDF file (get it here):
$ python encrypt_pdf.py bert-paper.pdf -a encrypt -l 1 -p -o bert-paper-encrypted1.pdf
This will prompt for a password twice:
Password:
Password:
## Command Arguments #################################################
file:bert-paper.pdf
action:encrypt
level:1
output_file:bert-paper-encrypted1.pdf
######################################################################
A new password-protected PDF file will appear in the current working directory. If you try to open it with any PDF reader, you'll be prompted for a password, as shown in the image below:
Obviously, if you enter a wrong password, you won't be able to access the PDF file.
Next, let's decrypt it now:
$ python encrypt_pdf.py bert-paper-encrypted1.pdf -a decrypt -p -l 1 -o bert-paper-decrypted1.pdf
Output:
Password:
## Command Arguments #################################################
file:bert-paper-encrypted1.pdf
action:decrypt
level:1
output_file:bert-paper-decrypted1.pdf
######################################################################
Awesome, you'll notice that bert-paper-decrypted1.pdf appears in your directory and is equivalent to the original (unencrypted) file.
Notice that if you choose level 2, the entire file will be encrypted, so you need to decrypt it twice, first using level 2 and then level 1.
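When you are unsure which layer a given file currently has, peeking at its first bytes can help. The following is a rough sketch, assuming that pyAesCrypt writes the AES Crypt file format (whose files begin with the ASCII bytes AES) and relying on the fact that PDF files begin with %PDF-. The detect_container helper and the sample headers are hypothetical:

```python
import os
import tempfile


def detect_container(path):
    """Hypothetical helper: guesses the outer format from the first bytes."""
    with open(path, "rb") as f:
        header = f.read(5)
    if header.startswith(b"%PDF-"):
        return "pdf"       # plain or password-protected PDF (level 1)
    if header.startswith(b"AES"):
        return "aescrypt"  # assumed AES Crypt container (level 2)
    return "unknown"


# Fabricated file headers, used only to exercise the helper
samples = {
    "level1.pdf": b"%PDF-1.4\nplaceholder",
    "level2.pdf": b"AES\x02\x00placeholder",
}

results = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, content in samples.items():
        path = os.path.join(tmp_dir, name)
        with open(path, "wb") as f:
            f.write(content)
        results[name] = detect_container(path)

print(results)  # {'level1.pdf': 'pdf', 'level2.pdf': 'aescrypt'}
```

A result of "aescrypt" suggests running the level 2 decryption first, while "pdf" suggests that only the level 1 password remains.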
Be aware that locking a PDF file with a Document Open Password can be bypassed in a variety of ways, one of which is cracking the PDF password. Check this tutorial to see how to do it.
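Since password cracking typically relies on guessing, a long random password makes that approach considerably harder. Below is a minimal sketch using Python's standard secrets module; the generate_password helper, the alphabet, and the default length are assumptions for illustration, not part of the tutorial's script:

```python
import secrets
import string

# Letters and digits only, to avoid characters that are awkward to type in a shell
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=20):
    """Hypothetical helper: returns a random alphanumeric password."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
print(len(password))  # 20
```

The generated value can then be entered at the script's password prompt and stored in a password manager.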
You can check the full code of this tutorial here.
Here are some related PDF tutorials:
Finally, for more PDF handling guides in Python, you can check our Practical Python PDF Processing EBook, where we dive deeper into PDF document manipulation with Python. Make sure to check it out here if you're interested!
Happy coding ♥