A compressed file is a sort of archive that contains one or more files that have been reduced in size. Compressing files in modern operating systems is usually pretty simple; in this tutorial, you will learn how to compress and decompress files using the Python programming language.
You may ask: why learn to compress files in Python when tools for it already exist? Well, decompressing files programmatically, without any manual clicks, is extremely useful. For example, when working with machine learning datasets, you want a piece of code that downloads, extracts, and loads them into memory automatically.
You may also want to add a compression/decompression feature to your application, or you may have thousands of compressed files that you want to decompress in one go. In either case, this tutorial can help.
Let's get started. We will be using the built-in tarfile module, so there is nothing required to install; tqdm is used only for printing progress bars, and you can install it with:
```
pip3 install tqdm
```
Open up a new Python file and import the modules:
```python
import tarfile
from tqdm import tqdm  # pip3 install tqdm
```
Let's start with compression. The following function is responsible for compressing a file/folder or a list of files/folders:
```python
def compress(tar_file, members):
    """
    Adds files (`members`) to a tar_file and compresses it
    """
    # open file for gzip compressed writing; the with block closes it
    with tarfile.open(tar_file, mode="w:gz") as tar:
        # wrap the members with tqdm to show a progress bar
        progress = tqdm(members)
        for member in progress:
            # set the progress description of the progress bar
            progress.set_description(f"Compressing {member}")
            # add file/folder/link to the tar file (compress);
            # folders are added recursively
            tar.add(member)
```
I call these files/folders members, since that's what the tarfile documentation calls them.
First, we open a new tar file for gzip-compressed writing (that's what mode="w:gz" stands for), creating it or overwriting an existing file with the same name. Then we add each member to the archive, and finally we close the tar file.
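The "w:gz" string combines a mode ("w" for write) with a compression method, and tarfile supports a few others. As a hedged sketch, the hypothetical helper compress_with() below (not part of the original code, and without the progress bar) lets you pick "gz", "bz2", "xz", or "" for a plain uncompressed tar:

```python
import tarfile


def compress_with(tar_file, members, compression="gz"):
    """Hypothetical helper: like compress(), but with a selectable method.

    `compression` is "gz", "bz2", "xz", or "" for an uncompressed .tar.
    """
    # "w:gz", "w:bz2", "w:xz", or plain "w" for no compression
    mode = f"w:{compression}" if compression else "w"
    with tarfile.open(tar_file, mode=mode) as tar:
        for member in members:
            tar.add(member)  # folders are added recursively
```

Reading works the other way around: mode "r:*" (which is also what the default mode "r" means) detects the compression automatically, so tarfile.open("compressed.tar.xz", mode="r:*") would open an xz-compressed archive without you naming the method.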
Wrapping members with tqdm is optional; it prints a progress bar, which is useful when compressing a lot of files in one go.
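If you'd rather not make tqdm a hard dependency, one option (a sketch, not part of the original tutorial) is to fall back to a do-nothing stand-in that supports the only two things compress() uses: iterating over tqdm(members) and calling set_description(). The stand-in class is our own and is not part of the tqdm package:

```python
import tarfile

try:
    from tqdm import tqdm  # optional: pip3 install tqdm
except ImportError:
    class tqdm:
        """Stand-in used only when tqdm is missing: it iterates like
        tqdm(iterable) and accepts set_description(), but draws nothing."""

        def __init__(self, iterable):
            self.iterable = iterable

        def __iter__(self):
            return iter(self.iterable)

        def set_description(self, desc):
            pass  # no progress bar to update


def compress(tar_file, members):
    """Add files/folders (`members`) to a gzip-compressed tar_file."""
    with tarfile.open(tar_file, mode="w:gz") as tar:
        progress = tqdm(members)
        for member in progress:
            progress.set_description(f"Compressing {member}")
            tar.add(member)
```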
That's it for compression; now let's dive into decompression.
The following function decompresses a given archive file:
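```python
def decompress(tar_file, path, members=None):
    """
    Extracts `tar_file` and puts the `members` into `path`.
    If members is None, all members on `tar_file` will be extracted.
    Members may be given as TarInfo objects or as member names.
    """
    # open the archive for gzip-compressed reading; the with block closes it
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if members is None:
            members = tar.getmembers()
        else:
            # look up names so every member is a TarInfo (it has a .name)
            members = [tar.getmember(m) if isinstance(m, str) else m for m in members]
        # wrap the members with tqdm to show a progress bar
        progress = tqdm(members)
        for member in progress:
            # set the progress description of the progress bar
            progress.set_description(f"Extracting {member.name}")
            tar.extract(member, path=path)
        # or use this instead of the loop:
        # tar.extractall(members=members, path=path)
```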
First, we open the archive for reading with gzip decompression (that's what mode="r:gz" means). The optional members parameter lets us extract specific files instead of the whole archive; if members isn't specified, we get all files in the archive using the getmembers() method, which returns all the members of the archive as a Python list of TarInfo objects.
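Because getmembers() returns TarInfo objects, you can filter them on their attributes before extracting. A hedged sketch, using a hypothetical helper extract_matching() that keeps only regular files whose names end with a given suffix (for example ".txt"):

```python
import tarfile


def extract_matching(tar_file, path, suffix):
    """Hypothetical helper: extract only the regular files whose names
    end with `suffix`, and return the extracted member names."""
    with tarfile.open(tar_file, mode="r:gz") as tar:
        # getmembers() returns TarInfo objects; filter on their attributes
        selected = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.endswith(suffix)
        ]
        for member in selected:
            # parent folders are created as needed inside `path`
            tar.extract(member, path=path)
        return [m.name for m in selected]
```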
Then, for each member, we extract it using the extract() method, which extracts that member from the archive into the path directory we specified.
Alternatively, we can use the extractall() method (commented out in the code above), which the official documentation recommends over extract() in most cases.
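Here is what the extractall() route can look like on its own. One caveat worth knowing: archives from untrusted sources can contain members that would be written outside the target directory. Newer Python versions (3.12+, with backports to some earlier security releases) accept a filter argument, and filter="data" refuses such members. The sketch below, with a function name of our own choosing, passes it only when tarfile.data_filter exists, so it still runs on older interpreters:

```python
import tarfile


def decompress_all(tar_file, path):
    """Extract every member of `tar_file` into `path` in one call."""
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # the "data" filter refuses members that would land outside
            # `path`, links pointing outside it, and device files
            tar.extractall(path=path, filter="data")
        else:
            # older Pythons without extraction filters
            tar.extractall(path=path)
```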
Let's test this:
```python
compress("compressed.tar.gz", ["test.txt", "folder"])
```
This compresses the test.txt file and the folder directory, both in the current directory, into a new tar archive called compressed.tar.gz, as shown in the following example figure.
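To double-check what went into the archive without extracting it, you can list its members. A small sketch built on TarFile.getmembers(); the helper name list_archive is our own, and mode "r:*" lets it read any compression tarfile supports:

```python
import tarfile


def list_archive(tar_file):
    """Return (name, size in bytes, is_directory) for every member."""
    with tarfile.open(tar_file, mode="r:*") as tar:
        return [(m.name, m.size, m.isdir()) for m in tar.getmembers()]


# e.g. list_archive("compressed.tar.gz") after running compress() above
```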
To decompress the archive:
```python
decompress("compressed.tar.gz", "extracted")
```
This extracts the archive we just created into a new folder called extracted.
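Going back to the scenario from the introduction (thousands of archives to decompress in one go), here is a hedged sketch that extracts every .tar.gz file in a directory into its own sub-folder of an output directory. The function name and the folder-naming scheme are assumptions for illustration:

```python
import tarfile
from pathlib import Path


def decompress_directory(src_dir, out_dir):
    """Hypothetical helper: extract each *.tar.gz in `src_dir` into
    out_dir/<archive name without .tar.gz>. Returns the folders used."""
    # use the safer "data" filter where this Python supports it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    created = []
    for archive in sorted(Path(src_dir).glob("*.tar.gz")):
        # "data.tar.gz" -> "data"
        target = Path(out_dir) / archive.name[: -len(".tar.gz")]
        with tarfile.open(archive, mode="r:gz") as tar:
            tar.extractall(path=target, **kwargs)
        created.append(target)
    return created
```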
Okay, we are done! You can be creative with this; here are some ideas:
In this tutorial, we have explored compression and decompression using the tarfile module. You can also use the zipfile module to work with ZIP archives, the bz2 module for bzip2 compression, the gzip module for gzip files, or the zlib module for lower-level DEFLATE (zlib) compression.
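For comparison, a rough zipfile counterpart of compress() and decompress() might look like the sketch below (no progress bar; the function names are our own). One difference from tarfile: ZipFile.write() does not recurse into folders, so the sketch walks them with os.walk():

```python
import os
import zipfile


def zip_compress(zip_file, members):
    """Add files/folders to a ZIP archive using DEFLATE compression.
    Note: empty sub-folders are not recorded by this sketch."""
    with zipfile.ZipFile(zip_file, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            if os.path.isdir(member):
                # ZipFile.write() would add only the folder entry, so walk it
                for root, _dirs, files in os.walk(member):
                    for name in files:
                        zf.write(os.path.join(root, name))
            else:
                zf.write(member)


def zip_decompress(zip_file, path):
    """Extract every member of a ZIP archive into `path`."""
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(path=path)
```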
Happy Coding ♥