Let's assume you've received a password-protected PDF file, opening it is your top priority, but unfortunately, you've forgotten the password. At this stage, you want the quickest way to get it back. In this tutorial, you will learn how to crack a PDF password using three methods: a dictionary attack in Python with pikepdf, John the Ripper, and iSeePassword Dr.PDF.
To get started, install the required dependencies:
pip3 install pikepdf tqdm
pikepdf is a Python library that allows us to create, manipulate and repair PDF files. It provides a Pythonic wrapper around the C++ QPDF library.
We won't use pikepdf for any of that, though. We only need to open the password-protected PDF file: if it opens, the password is correct; otherwise, pikepdf raises a PasswordError exception:
import pikepdf
from tqdm import tqdm

# load password list (one candidate per line)
with open("wordlist.txt", encoding="utf-8", errors="ignore") as f:
    passwords = [line.strip() for line in f]

# iterate over passwords
for password in tqdm(passwords, "Decrypting PDF"):
    try:
        # open PDF file; raises pikepdf.PasswordError if the password is wrong
        with pikepdf.open("foo-protected.pdf", password=password):
            # password decrypted successfully, break out of the loop
            print("[+] Password found:", password)
            break
    except pikepdf.PasswordError:
        # wrong password, just continue in the loop
        continue
else:
    # the loop finished without a break, so no word in the list matched
    print("[-] Password not found in the wordlist")
First, we load a password list from the wordlist.txt file in the current directory. You can use the rockyou list or any other large wordlist as well. You can also use the Crunch tool to generate your own custom wordlist.
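If you'd rather stay in Python than install Crunch, the idea behind a simple Crunch run (every combination of a character set between a minimum and maximum length) can be sketched with itertools.product. This is not Crunch's interface, just an equivalent idea; the function names generate_wordlist and write_wordlist are hypothetical.

```python
import itertools
from typing import Iterator


def generate_wordlist(charset: str, min_len: int, max_len: int) -> Iterator[str]:
    """Yield every string over `charset` with length min_len..max_len, shortest first."""
    for length in range(min_len, max_len + 1):
        # itertools.product enumerates all `length`-long tuples, in the order of `charset`
        for combo in itertools.product(charset, repeat=length):
            yield "".join(combo)


def write_wordlist(path: str, charset: str, min_len: int, max_len: int) -> int:
    """Write the candidates to `path`, one per line, and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for word in generate_wordlist(charset, min_len, max_len):
            f.write(word + "\n")
            count += 1
    return count


if __name__ == "__main__":
    words = list(generate_wordlist("0123456789", 1, 3))
    print(len(words))            # 10 + 100 + 1000 = 1110
    print(words[:3], words[-1])  # ['0', '1', '2'] 999
```

Keep the character set and lengths small: the number of candidates grows as (charset size) raised to the password length, so calling write_wordlist("wordlist.txt", ...) with a large range can quickly produce a huge file.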
Next, we iterate over the list and try to open the file with each password by passing it as the password argument to pikepdf.open(). This raises pikepdf.PasswordError if the password is incorrect, so we catch it and move on to the next word.
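Loading the whole wordlist into a Python list is fine for small files, but a large list such as rockyou can also contain bytes that are not valid UTF-8. Here is a minimal sketch, assuming a plain one-word-per-line file, that streams candidates lazily instead; clean_candidates and iter_wordlist are hypothetical helper names.

```python
import os
import tempfile
from typing import Iterable, Iterator


def clean_candidates(lines: Iterable[str]) -> Iterator[str]:
    """Yield one candidate per line, removing only the line ending and skipping blank lines."""
    for line in lines:
        # strip just "\r\n" so a password with a leading/trailing space stays intact
        word = line.rstrip("\r\n")
        if word:
            yield word


def iter_wordlist(path: str) -> Iterator[str]:
    """Stream candidates from a wordlist file without loading it all into memory."""
    # errors="ignore" silently drops bytes that are not valid UTF-8 instead of raising
    with open(path, encoding="utf-8", errors="ignore") as f:
        yield from clean_candidates(f)


if __name__ == "__main__":
    # tiny throwaway wordlist: a blank line, a Windows line ending, and an invalid byte
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"123456\n\nabc123\r\npass\xffword\n")
    try:
        print(list(iter_wordlist(tmp.name)))  # ['123456', 'abc123', 'password']
    finally:
        os.remove(tmp.name)
```

You could then loop over iter_wordlist("wordlist.txt") instead of the prebuilt list. Two trade-offs: tqdm can't show a percentage for a generator unless you pass total=, and errors="ignore" alters any line containing invalid bytes, so such a candidate is no longer tried exactly as written.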
We use tqdm here just to show the progress and how many words remain. Here is my result:
Decrypting PDF: 43%|████████████████████████████████████████▏ | 2137/5000 [00:06<00:08, 320.70it/s]
[+] Password found: abc123
The password was found after 2137 trials, which took about 6 seconds. As you can see, it's testing about 320 words per second; next, we'll see how to boost this rate.
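As a quick sanity check, dividing the word counts from the tqdm line by its rate reproduces the timings it printed. tqdm displays whole seconds and its rate is a running estimate, so treat these as approximations; the one-million-word list below is a hypothetical size, not a measurement.

```python
def seconds_needed(words: int, rate_per_second: float) -> float:
    """Time to try `words` candidates at a constant rate."""
    return words / rate_per_second


# numbers read off the tqdm progress line above
tried, total, rate = 2137, 5000, 320.70

print(f"elapsed    ~ {seconds_needed(tried, rate):.1f} s")          # ~ 6.7 s
print(f"remaining  ~ {seconds_needed(total - tried, rate):.1f} s")  # ~ 8.9 s
print(f"whole list ~ {seconds_needed(total, rate):.1f} s")          # ~ 15.6 s
# a hypothetical one-million-word list at the same speed
print(f"1M words   ~ {seconds_needed(1_000_000, rate) / 60:.0f} min")  # ~ 52 min
```

Dropping the fractional seconds gives the 00:06 elapsed and 00:08 remaining that tqdm showed, and the last line shows why a faster tool matters once the wordlist gets large.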
John the Ripper is a free and fast password-cracking tool available on many platforms. We'll use Kali Linux here, as it comes with John the Ripper preinstalled.
First, we need a way to extract the password hash from the PDF file in a format the john utility can crack. Luckily for us, there is a Python script, pdf2john.py, that does exactly that. Let's download it:
Put your password-protected PDF in the current directory (mine is called foo-protected.pdf) and run the following command:
root@rockikz:~/pdf-cracking# python3 pdf2john.py foo-protected.pdf | sed "s/::.*$//" | sed "s/^.*://" | sed -r 's/^.{2}//' | sed 's/.\{1\}$//' > hash
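If the chain of sed commands looks cryptic, here is a Python function that performs the same four substitutions in the same order, with one comment per sed expression. It makes no claim about what pdf2john.py prints (that depends on the script version); the sample lines in the demo are made up purely to exercise each step, and clean_pdf2john_line is a hypothetical name.

```python
import re


def clean_pdf2john_line(line: str) -> str:
    """Apply the same four edits as the sed pipeline, in the same order."""
    line = line.rstrip("\n")
    # sed "s/::.*$//": drop the first "::" and everything after it
    line = re.sub(r"::.*$", "", line, count=1)
    # sed "s/^.*://": drop everything up to and including the LAST colon (greedy .*)
    line = re.sub(r"^.*:", "", line, count=1)
    # sed -r 's/^.{2}//': drop the first two characters (sed only matches if there are two)
    if len(line) >= 2:
        line = line[2:]
    # sed 's/.\{1\}$//': drop the last character
    return line[:-1]


if __name__ == "__main__":
    # made-up lines, NOT real pdf2john.py output
    print(clean_pdf2john_line("foo-protected.pdf:XXhashY\n"))  # hash
    print(clean_pdf2john_line("a:XXhashY:::extra\n"))          # hash
```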
This extracts the PDF password hash into a new file named hash. After saving it, I used the cat command to print the hash to the screen.
Finally, we use this hash file to crack the password:
We simply run the command "john [hashfile]". As you can see, the password is 012345, and it was found at a speed of 4503p/s (candidate passwords tested per second).
For more information about cracking PDF documents with Linux, check this guide.
Not all users are comfortable coding in Python or running Linux commands. So, if you're looking for an effective PDF password-cracking program on Windows, iSeePassword Dr.PDF is one of the best choices.
This PDF password cracker has an easy-to-understand UI, so even novices can use it. It also offers three password-cracking methods: Dictionary, Brute-force, and Brute-force with Mask. You can tune several parameters to boost performance.
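To make the difference between plain brute force and brute force with a mask concrete, here is a small sketch of mask-based candidate generation. The placeholder syntax (?l, ?u, ?d) is borrowed from hashcat's built-in charsets as an assumption; it is not iSeePassword Dr.PDF's actual mask format, which we don't know. If you remember part of the password's shape, a mask fixes each position's character set, which shrinks the search space dramatically.

```python
import itertools
import math
from typing import Iterator, List

# hashcat-style placeholders (an assumption, not Dr.PDF's syntax)
CHARSETS = {
    "?l": "abcdefghijklmnopqrstuvwxyz",
    "?u": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "?d": "0123456789",
}


def parse_mask(mask: str) -> List[str]:
    """Turn a mask such as 'abc?d?d' into one character set per position."""
    positions = []
    i = 0
    while i < len(mask):
        token = mask[i:i + 2]
        if token in CHARSETS:
            positions.append(CHARSETS[token])
            i += 2
        else:
            # any other character is a literal that must appear as-is
            positions.append(mask[i])
            i += 1
    return positions


def mask_candidates(mask: str) -> Iterator[str]:
    """Yield every password that matches the mask."""
    for combo in itertools.product(*parse_mask(mask)):
        yield "".join(combo)


def keyspace(mask: str) -> int:
    """Number of candidates: the product of the per-position set sizes."""
    return math.prod(len(chars) for chars in parse_mask(mask))


if __name__ == "__main__":
    print(keyspace("abc?d?d"))                   # 100
    print(list(mask_candidates("abc?d?d"))[:3])  # ['abc00', 'abc01', 'abc02']
    print(keyspace("?u?l?l?l?d?d"))              # 45697600
```

For example, knowing only that the password is one uppercase letter, three lowercase letters, and two digits leaves 26 x 26^3 x 10^2 = 45,697,600 candidates, far fewer than trying every mix of letters and digits at length six.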
Its advertised cracking speed is up to 100K passwords per second, which would make it one of the fastest programs for cracking PDF passwords.
That's it! We have successfully cracked the PDF password using three methods: pikepdf, John the Ripper, and iSeePassword Dr.PDF. The first method is the slowest (about 320 words per second in my test, versus 4503p/s for John the Ripper) but is quite intuitive for Python programmers, whereas the other two recover the password of a PDF file much faster. Make sure you use this ethically and only on your own files.
Happy Cracking ♥