How to Crack PDF Files in Python

Learn how you can use pikepdf, pdf2john, and other tools to crack password-protected PDF files in Python.
  · 6 min read · Updated apr 2023 · Ethical Hacking · PDF File Handling · Cryptography · Digital Forensics


Suppose you have a password-protected PDF file that you urgently need to open, but you have forgotten the password. In that situation, you want the quickest way to recover it. In this tutorial, you will learn how to crack PDF passwords using three methods: a brute-force script built on pikepdf, John the Ripper, and iSeePassword Dr.PDF.

To get started, install the required dependencies:

pip3 install pikepdf tqdm

Related: Build 35+ Ethical Hacking Scripts & Tools with Python EBook

Cracking PDF Password using pikepdf

pikepdf is a Python library that allows us to create, manipulate and repair PDF files. It provides a Pythonic wrapper around the C++ QPDF library.

We won't be using pikepdf for that, though. We only need to open the password-protected PDF file: if it opens, the password is correct; otherwise, pikepdf raises a PasswordError exception:

import pikepdf
from tqdm import tqdm

# load the password list, one candidate per line (the file is closed afterwards)
with open("wordlist.txt") as f:
    passwords = [line.strip() for line in f]

# iterate over passwords
for password in tqdm(passwords, "Decrypting PDF"):
    try:
        # try to open the PDF file with the current candidate
        with pikepdf.open("foo-protected.pdf", password=password):
            # the file opened, so this is the correct password
            print("[+] Password found:", password)
            break
    except pikepdf.PasswordError:
        # wrong password, move on to the next one
        continue

First, we load a password list from the wordlist.txt file in the current directory. You can also use the rockyou list or any other large wordlist, or generate your own custom wordlist with the Crunch tool.
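If you don't have Crunch at hand, a small custom wordlist can also be generated in Python. The following is a minimal sketch, not a replacement for Crunch: the function names generate_wordlist and write_wordlist and the file name custom-wordlist.txt are made up for illustration.

```python
import itertools
import string


def generate_wordlist(charset, min_len, max_len):
    """Yield every string over `charset` whose length is in [min_len, max_len]."""
    for length in range(min_len, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            yield "".join(combo)


def write_wordlist(path, words):
    """Write one candidate per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
            count += 1
    return count
```

For example, write_wordlist("custom-wordlist.txt", generate_wordlist(string.digits, 1, 3)) writes all numeric candidates of one to three digits, which is 10 + 100 + 1000 = 1110 lines. Keep in mind that the number of candidates grows exponentially with the maximum length.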

Next, we iterate over the list and try to open the file with each password by passing the password argument to the pikepdf.open() method, which raises pikepdf.PasswordError if the password is incorrect.

We used tqdm here just to print the progress on how many words are remaining. Check out my result:

Decrypting PDF:  43%|████████████████████████████████████████▏                                                   | 2137/5000 [00:06<00:08, 320.70it/s]
[+] Password found: abc123

The password was found after 2137 attempts, which took about 6 seconds. As you can see, the script tests about 320 passwords per second; we'll see how to boost this rate.
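Before moving on, note that the script above loads the whole wordlist into memory, which gets expensive with lists as large as rockyou. Here is one possible way to restructure it, as a sketch only: the helper names iter_wordlist, make_pdf_checker, and crack are hypothetical, and the wordlist is read with errors="ignore" on the assumption that it may contain bytes that are not valid UTF-8.

```python
def iter_wordlist(path):
    """Yield candidates one line at a time, so the file is never fully loaded."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            candidate = line.rstrip("\r\n")
            if candidate:  # skip blank lines
                yield candidate


def make_pdf_checker(pdf_path):
    """Return a function that tells whether a password opens `pdf_path`."""
    import pikepdf  # imported lazily so the other helpers work without it

    def check(password):
        try:
            with pikepdf.open(pdf_path, password=password):
                return True
        except pikepdf.PasswordError:
            return False

    return check


def crack(candidates, check):
    """Return the first candidate accepted by `check`, or None if none works."""
    for candidate in candidates:
        if check(candidate):
            return candidate
    return None


# Example usage with the file names from this tutorial:
# password = crack(iter_wordlist("wordlist.txt"), make_pdf_checker("foo-protected.pdf"))
# print("[+] Password found:", password)
```

Separating the check from the loop also makes it easy to try the same loop on other file types by swapping in a different checker. This does not make pikepdf itself any faster, it only keeps memory usage flat.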


Cracking PDF Password using John The Ripper

John the Ripper is a free and fast password-cracking tool that is available on many platforms. We'll be using the Kali Linux operating system here, since John the Ripper comes pre-installed on it.

First, we need a way to extract the password hash from the PDF file in a format that the john utility can crack. Luckily for us, there is a Python script, pdf2john.py, that does that. Let's download it:

Downloading pdf2john.py

Put your password-protected PDF in the current directory (mine is called foo-protected.pdf) and run the following command:

root@rockikz:~/pdf-cracking# python3 pdf2john.py foo-protected.pdf | sed "s/::.*$//" | sed "s/^.*://" | sed -r 's/^.{2}//' | sed 's/.\{1\}$//' > hash

This will extract the PDF password hash into a new file named hash. Here is my result:

Extracting PDF password hash using pdf2john

After saving the password hash into the hash file, I used the cat command to print it to the screen.
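If the chain of sed commands above looks opaque, here is what each step does, written as an equivalent Python sketch. The function name extract_hash_field is made up, and the sample line is synthetic: it only exercises the four substitutions and is not real pdf2john.py output.

```python
import re


def extract_hash_field(line):
    """Apply the same four substitutions as the sed pipeline to one line."""
    line = line.rstrip("\n")
    # sed "s/::.*$//": drop everything from the first "::" to the end
    line = re.sub(r"::.*$", "", line, count=1)
    # sed "s/^.*://": drop everything up to and including the last ":"
    line = re.sub(r"^.*:", "", line, count=1)
    # sed -r 's/^.{2}//': drop the first two characters
    line = re.sub(r"^.{2}", "", line, count=1)
    # sed 's/.\{1\}$//': drop the last character
    line = re.sub(r".$", "", line, count=1)
    return line


# Synthetic input for illustration only
sample = "file.pdf:XX$pdf$placeholder-hashY::::file.pdf"
print(extract_hash_field(sample))  # prints: $pdf$placeholder-hash
```

In other words, the pipeline keeps only the hash field between the file name and the trailing fields, then trims two characters from its start and one from its end.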

Finally, we use this hash file to crack the password:

Password cracked successfully using john the ripper

We simply use the command "john [hashfile]". As you can see, the password is 012345 and was found at a speed of 4503 p/s.
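If you would rather drive john from a Python script, something along these lines could work. This is a rough sketch that assumes john is on your PATH; build_john_command and run_john are hypothetical helper names, while --wordlist and --show are John the Ripper options (the latter prints passwords that were already cracked).

```python
import shutil
import subprocess


def build_john_command(hash_file, wordlist=None):
    """Build the argument list for `john [hashfile]`, optionally with a wordlist."""
    cmd = ["john"]
    if wordlist:
        cmd.append(f"--wordlist={wordlist}")
    cmd.append(hash_file)
    return cmd


def run_john(hash_file, wordlist=None):
    """Run john on `hash_file`, then return the text printed by `john --show`."""
    if shutil.which("john") is None:
        raise FileNotFoundError("john was not found on PATH")
    # crack the hash; john prints its progress to the terminal
    subprocess.run(build_john_command(hash_file, wordlist), check=True)
    # ask john to display what it has cracked so far
    result = subprocess.run(
        ["john", "--show", hash_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling run_john("hash") would correspond to the command used above, and passing wordlist="wordlist.txt" would reuse the same wordlist as the pikepdf script.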

For more information about cracking PDF documents with Linux, check this guide.

Related: How to Use Hashing Algorithms in Python using hashlib.

Cracking PDF Password using iSeePassword Dr.PDF

Not all users are comfortable with coding in Python or using commands in Linux. So, if you're looking for an effective PDF password cracking program on Windows, then iSeePassword Dr.PDF is one of the best choices.

Importing PDF file

This PDF password cracking program has an easy-to-understand UI, so even novices can use it. It also offers three password cracking modes: Dictionary, Brute-force, and Brute-force with Mask. You're free to set several parameters to boost the performance.

Password found image

Currently, the password cracking speed is up to 100K passwords per second, making it one of the fastest programs for cracking PDF passwords.

Conclusion

That's it, our job is done. We have successfully cracked the PDF password using three methods: pikepdf, John the Ripper, and iSeePassword Dr.PDF. The first method is slow but quite intuitive for Python programmers, whereas the other two recover the password in a much shorter time. Make sure you only use these techniques ethically and on your own files.

Finally, in our Ethical hacking with Python book, we have built 24 hacking tools (including password crackers) from scratch using Python. Make sure to check it out here if you're interested!

Learn also: How to Brute Force ZIP File Passwords in Python.

Happy Cracking ♥
