How to Verify File Integrity in Python

Learn to protect your downloads from corruption and cyberattacks. This guide teaches you how to verify file integrity using Python, ensuring that your files remain authentic and untampered.
9 min read · Updated Nov 2023 · Ethical Hacking · Python Standard Library · Cryptography


In this tutorial, we will build a tool you can use in everyday life: a program that verifies the integrity and authenticity of a file after you download it.

When you download a file from the internet, it can be subject to data corruption during transmission. Verifying the hash ensures that the file is intact and hasn't been altered. Even a minor change to the file will result in a completely different hash value, alerting you to potential problems.
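
To make the last point concrete, here is a minimal sketch that hashes two byte strings differing by a single character. The sample strings are arbitrary and chosen only for illustration:

```python
import hashlib

# Two inputs that differ by a single character ("dog" vs "cog").
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

original_digest = hashlib.sha256(original).hexdigest()
altered_digest = hashlib.sha256(altered).hexdigest()

print(original_digest)
print(altered_digest)

# Count how many of the hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(original_digest, altered_digest))
print(f"{differing} of {len(original_digest)} hex characters differ")
```

Running it should show two digests that have almost nothing in common, even though the inputs are nearly identical.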

You may already know what a man-in-the-middle attack is. If not, a man-in-the-middle (MITM) attack is a type of cyberattack in which an attacker intercepts, and possibly alters, the communication between two parties without their knowledge or consent. The attacker secretly positions themselves between the sender and the receiver, eavesdropping on the communication and potentially manipulating the transmitted data. MITM attacks can be launched in various contexts, including over networks, websites, or other digital channels.


A typical scenario is when you’re connected to a compromised Wi-Fi network (at the airport, a coffee shop, etc.). The attackers who compromised the network can replace your downloads with malware. And believe me, just by looking, you wouldn’t be able to tell that the file you downloaded isn’t what was intended. You need a tool to check that the file you are about to install is the one the vendor published.

Verifying a file with its hash involves comparing the calculated hash value of the downloaded file with the provided hash value (by the vendors) to ensure its integrity and authenticity.

Now, let’s get into the implementation in Python. First off, we are going to install colorama. We can achieve this by running:

$ pip install colorama

Colorama is a Python library that simplifies adding colored output and text formatting to the command line or terminal. Next up, we import the necessary libraries:

# Import necessary libraries.
import argparse, hashlib, sys
# Import functions init and Fore from the colorama library.
from colorama import init, Fore

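# Initialize colorama to enable colored terminal text.
# autoreset=True restores the default color after each print, so colors don't bleed into later output.
init(autoreset=True)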

- argparse: is a Python library for parsing command-line arguments and options.

- hashlib: is a Python library for secure hash and message digest algorithms. You can check this tutorial for more information on how to use it.

- sys: is a Python library for providing access to system-specific parameters and functions.

Then, we create a function to calculate the hash of the downloaded file. This hash is what we’re going to use to compare with the hash that the vendors provide to check if the file is authentic or not:

# Define a function to calculate the SHA-256 hash of a file.
def calculate_hash(file_path):
   # Create a SHA-256 hash object.
   sha256_hash = hashlib.sha256()
   # Open the file in binary mode for reading (rb).
   with open(file_path, "rb") as file:
       # Read the file in 64KB chunks to efficiently handle large files.
       while True:
           data = file.read(65536)  # Read the file in 64KB chunks.
           if not data:
               break
           # Update the hash object with the data read from the file.
           sha256_hash.update(data)
   # Return the hexadecimal representation of the calculated hash.
   return sha256_hash.hexdigest()
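
Some vendors publish checksums in other algorithms, such as SHA-512. As a rough sketch, the same chunked approach can take the algorithm name as a parameter. The function name calculate_hash_with and its parameters are hypothetical and not part of the script in this tutorial:

```python
import hashlib
import os
import tempfile

def calculate_hash_with(file_path, algorithm="sha256", chunk_size=65536):
    """Hypothetical variant of calculate_hash() with a selectable algorithm."""
    # hashlib.new() raises ValueError if the algorithm name is not supported.
    # This sketch assumes a fixed-length algorithm (e.g. sha256, sha512, md5).
    hash_object = hashlib.new(algorithm)
    with open(file_path, "rb") as file:
        # iter() with a sentinel keeps calling file.read() until it returns b"".
        for chunk in iter(lambda: file.read(chunk_size), b""):
            hash_object.update(chunk)
    return hash_object.hexdigest()

# Small demonstration on a temporary file that stands in for a download.
with tempfile.NamedTemporaryFile(delete=False) as temp_file:
    temp_file.write(b"example data")
    temp_path = temp_file.name

try:
    sha256_digest = calculate_hash_with(temp_path)
    sha512_digest = calculate_hash_with(temp_path, algorithm="sha512")
    print(len(sha256_digest), len(sha512_digest))  # 64 128
finally:
    os.remove(temp_path)
```

The chunk size stays at 64KB by default, matching the function above, so memory use does not grow with the file size.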

Next, we create a function that verifies the calculated hash against the expected one. If the hash provided by the vendor differs from the hash we calculated, we know there’s a problem somewhere.

# Define a function to verify the calculated hash against an expected hash.
def verify_hash(downloaded_file, expected_hash):
   # Calculate the hash of the downloaded file.
   calculated_hash = calculate_hash(downloaded_file)
   # Compare the calculated hash with the expected hash and return the result.
   return calculated_hash == expected_hash
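
One caveat: the == comparison is case-sensitive, while hexdigest() always returns lowercase. If a vendor lists the hash in uppercase, or you paste it with a trailing space, the check fails even though the file is fine. A possible workaround, sketched here with a hypothetical helper named hashes_match, is to normalize both values before comparing:

```python
import hmac

def hashes_match(calculated_hash, expected_hash):
    """Hypothetical helper: compare two hex digests, ignoring case and surrounding whitespace."""
    normalized_calculated = calculated_hash.strip().lower().encode("utf-8")
    normalized_expected = expected_hash.strip().lower().encode("utf-8")
    # hmac.compare_digest() is the standard library's constant-time comparison.
    return hmac.compare_digest(normalized_calculated, normalized_expected)

print(hashes_match("ABCDEF0123", " abcdef0123\n"))  # True
print(hashes_match("abcdef0123", "abcdef0124"))     # False
```

The constant-time comparison is not strictly needed for a local file check, but it is a harmless habit when comparing digests.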

This is a CLI-based program. So, in this next part of the code, we will accept user input from the terminal. And you guessed right! We’re going to use argparse for that:

# Create a parser for handling command-line arguments.
parser = argparse.ArgumentParser(description="Verify the hash of a downloaded software file.")
# Define two command-line arguments:
# -f or --file: Path to the downloaded software file (required).
# --hash: Expected hash value (required).
parser.add_argument("-f", "--file", dest="downloaded_file", required=True, help="Path to the downloaded software file")
parser.add_argument("--hash", dest="expected_hash", required=True, help="Expected hash value")
# Parse the command-line arguments provided when running the script.
args = parser.parse_args()
# argparse already exits with a usage message when a required argument is missing,
# so this check only guards against empty values such as --hash "".
if not args.downloaded_file or not args.expected_hash:
   # Print an error message in red using colorama.
   print(f"{Fore.RED}[-] Please specify the file to validate and its hash.")
   # Exit the script with a non-zero status to signal an error.
   sys.exit(1)

What we did here was add flags to our program. The user can use -f or --file to specify the file to be validated and --hash to specify the expected hash value of the file. By the way, -f and --hash are what we call flags.
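
To see how these flags map to attributes, here is a small sketch that feeds a list of arguments to parse_args() instead of reading the real command line. The parser is rebuilt under the name demo_parser so the snippet runs on its own, and the values installer.exe and abc123 are placeholders:

```python
import argparse

# Same two flags as in the script, rebuilt here so the snippet is self-contained.
demo_parser = argparse.ArgumentParser(description="Verify the hash of a downloaded software file.")
demo_parser.add_argument("-f", "--file", dest="downloaded_file", required=True, help="Path to the downloaded software file")
demo_parser.add_argument("--hash", dest="expected_hash", required=True, help="Expected hash value")

# Passing a list to parse_args() simulates a command line without touching sys.argv.
demo_args = demo_parser.parse_args(["-f", "installer.exe", "--hash", "abc123"])

# The dest= names become attributes on the returned namespace.
print(demo_args.downloaded_file)  # installer.exe
print(demo_args.expected_hash)    # abc123
```

Because both flags use dest=, the values are read as args.downloaded_file and args.expected_hash rather than args.file and args.hash.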

Finally:

# Check if the hash of the file is accurate by calling the verify_hash function.
if verify_hash(args.downloaded_file, args.expected_hash):
   # If the hash is accurate, print a success message in green.
   print(f"{Fore.GREEN}[+] Hash verification successful. The software is authentic.")
else:
   # If the hash does not match, print an error message in red.
   print(f"{Fore.RED}[-] Hash verification failed. The software may have been tampered with or is not authentic.")

Here we checked (using the verify_hash() function) if the downloaded file is what we’re expecting. If it is, we say it is. If it’s not, we say it’s not (obviously).

There you have it! We’ve successfully built a simple but powerful script that we can use to verify our downloads to ensure integrity.
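
If you want a quick sanity check before downloading anything, the sketch below exercises the two functions on a temporary file whose contents we control. The functions are repeated so the snippet runs on its own, and the sample bytes are made up for illustration:

```python
import hashlib
import os
import tempfile

def calculate_hash(file_path):
    # Same chunked SHA-256 calculation as in the script.
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
        while True:
            data = file.read(65536)
            if not data:
                break
            sha256_hash.update(data)
    return sha256_hash.hexdigest()

def verify_hash(downloaded_file, expected_hash):
    return calculate_hash(downloaded_file) == expected_hash

# Write known bytes to a temporary file that stands in for a download.
sample_bytes = b"pretend this is an installer"
with tempfile.NamedTemporaryFile(delete=False) as temp_file:
    temp_file.write(sample_bytes)
    sample_path = temp_file.name

try:
    correct_hash = hashlib.sha256(sample_bytes).hexdigest()
    # Change the last hex character to simulate a hash that does not match.
    wrong_hash = correct_hash[:-1] + ("0" if correct_hash[-1] != "0" else "1")
    matches_correct = verify_hash(sample_path, correct_hash)
    matches_wrong = verify_hash(sample_path, wrong_hash)
    print(matches_correct)  # True
    print(matches_wrong)    # False
finally:
    os.remove(sample_path)
```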

Now, let’s test our code. For this demonstration, I'll use the VLC media player. I’m using this because VLC is quite a popular media player. Even if you have it, it’s okay to download it for this demonstration, as you don’t have to install it to achieve what we want to do.

So head on to their website and click on the Download VLC button. On the download page, click the Display checksum button to reveal the checksum for the file.

By the way, a checksum, in computing and data validation, is a value calculated from a data set that is used to check the integrity of the data. So, it’s literally what we will use to verify our download after completion.
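
Some vendors publish checksums in a text file instead of on the page, often with one "digest  filename" pair per line (the layout the sha256sum tool produces). Assuming that layout, a line can be split with a small helper. The name parse_checksum_line is hypothetical, and the example line is simply built from the hash and file name used later in this tutorial:

```python
def parse_checksum_line(line):
    """Hypothetical helper: split a 'digest  filename' line into (digest, filename)."""
    digest, _, filename = line.strip().partition(" ")
    # After the first space, sha256sum-style lines have either a second space
    # (text mode) or an asterisk (binary mode) before the file name.
    if filename[:1] in (" ", "*"):
        filename = filename[1:]
    return digest.lower(), filename

example_line = "e197583514fa600f24a3b88cf6b24102c5c09dc39bad6ac9626bd55f23ff9def  vlc-3.0.20-win32.exe\n"
parsed_digest, parsed_name = parse_checksum_line(example_line)
print(parsed_digest)
print(parsed_name)
```

The returned digest can then be passed to the script through --hash.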

Now that we have downloaded the software to test (VLC), let’s run our program. Please note that I already have my downloaded file in the same working directory as my Python file. You don’t need to do this. Just make sure when referencing the file to test, you specify the full path:

$ python file_integrity_verifier.py -f E:\Downloads\vlc-3.0.20-win32.exe --hash e197583514fa600f24a3b88cf6b24102c5c09dc39bad6ac9626bd55f23ff9def
[+] Hash verification successful. The software is authentic.

Here's a run where it fails (I just modified the hash value):

$ python file_integrity_verifier.py -f E:\Downloads\vlc-3.0.20-win32.exe --hash e197583514fa600f24a3b88cf6b24102c5c09dc39bad6ac9626bd55f23ff9dee
[-] Hash verification failed. The software may have been tampered with or is not authentic.

One more thing to note: to run Python from the terminal the way I did (on Windows), Python must be added to your computer’s PATH. If it’s not, all you need to do is specify the full path to the python.exe file on your computer.

Other security measures to take to prevent data modification include:

1. Use HTTPS: Ensure that websites you visit use HTTPS for secure communication. Most modern browsers display a padlock symbol in the address bar to indicate a secure connection.

2. Verify Certificates: When visiting secure websites, pay attention to the SSL/TLS certificates. Check that the certificate's details match the website's domain. Be cautious if you receive browser warnings about certificate issues.

3. Be Careful on Public Wi-Fi: Avoid sensitive transactions or logging into accounts on public Wi-Fi networks. If you must use public Wi-Fi, consider using a virtual private network (VPN) to encrypt your connection.

4. Keep Software Updated: Regularly update your operating system, browser, and security software. These updates often include patches for security vulnerabilities.

5. Educate Yourself: Continuously educate yourself about common online threats and best practices for online security. Staying informed is crucial.

That’s it! In this tutorial, we were able to build a beneficial tool. I hope you enjoyed it. Check the complete code here.


Learn also: How to Use Hashing Algorithms in Python using hashlib.

Happy coding ♥
