
Handling Exceptions in File Compression (zip, gzip)
Working with compressed files, such as those in ZIP or GZIP formats, is a common task for Python developers. Whether you’re bundling files for distribution, saving space, or processing logs, you’ll likely use Python’s built-in modules like zipfile or gzip. But what happens when things go wrong? From missing files to broken archives, plenty can go awry. That’s why exception handling is essential.
In this article, you’ll learn how to gracefully handle errors when compressing and decompressing files. We’ll cover both zipfile and gzip, give you practical code samples, and highlight common pitfalls so you can write robust file compression code.
Why Handle Exceptions in Compression?
File operations are inherently risky. The filesystem is a shared resource, and files can be moved, deleted, corrupted, or locked by other processes at any time. Without proper error handling, your script might crash unexpectedly or—even worse—corrupt data silently.
When dealing with compressed files, some typical issues you might face include:
- Trying to open a file that doesn’t exist.
- Attempting to read from or write to a file without proper permissions.
- Working with a corrupted archive.
- Running out of disk space during compression or extraction.
- Providing an incorrect file path or filename.
By anticipating these situations and catching exceptions, you ensure your program remains stable and provides helpful feedback to users.
Working with ZIP Files
The zipfile module is Python’s standard tool for working with ZIP archives. It allows you to create, read, write, and extract ZIP files. Let’s look at common exceptions and how to handle them.
Common Exceptions in zipfile
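When using zipfile, you might encounter these exceptions:
- FileNotFoundError: The specified file does not exist.
- PermissionError: You don’t have the required access rights.
- zipfile.BadZipFile: The file is not a valid ZIP archive or is corrupted.
- RuntimeError: Raised, for example, when an encrypted member is read with a missing or wrong password, or when the module for the requested compression method (such as zlib) is unavailable. Running out of disk space surfaces as OSError instead.
- ValueError: For invalid arguments, such as an unrecognized mode string, or for operations on an archive that has already been closed.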
Here’s a simple example of creating a ZIP file with exception handling:
import zipfile

def create_zip(zip_name, files_to_zip):
    try:
        with zipfile.ZipFile(zip_name, 'w') as zipf:
            for file in files_to_zip:
                zipf.write(file)
        print(f"Successfully created {zip_name}")
    except FileNotFoundError as e:
        print(f"File not found: {e.filename}")
    except PermissionError:
        print("Permission denied: cannot create the ZIP file.")
    except OSError as e:
        print(f"OS error occurred: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

files = ['data.txt', 'config.ini']
create_zip('archive.zip', files)
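In this example, we catch specific exceptions like FileNotFoundError and PermissionError, then handle more general errors with OSError and a fallback Exception. The order matters: FileNotFoundError and PermissionError are subclasses of OSError, so they must be listed before it or the OSError handler would catch them first. This ensures we cover a broad range of issues.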
| Exception Type | Common Cause |
|---|---|
| FileNotFoundError | The file to add or the ZIP path doesn’t exist. |
| PermissionError | Lack of write permissions in the directory. |
| zipfile.BadZipFile | The file is corrupted or not a ZIP. |
| OSError | Disk full or path too long. |
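Since the first two rows of the table are subclasses of OSError, a single handler can tell these causes apart by checking the exception type and its errno attribute. The following is a minimal sketch of that idea; describe_os_error is a hypothetical helper name, not something zipfile provides.

```python
import errno

def describe_os_error(exc):
    """Map an OSError to a short explanation (hypothetical helper)."""
    if isinstance(exc, FileNotFoundError):
        return "file or directory not found"
    if isinstance(exc, PermissionError):
        return "permission denied"
    # Plain OSError instances carry the platform error code in exc.errno.
    if exc.errno == errno.ENOSPC:
        return "no space left on device"
    if exc.errno == errno.ENAMETOOLONG:
        return "path or filename too long"
    return f"other OS error: {exc}"
```

A helper like this could be called from an except OSError block to produce a more specific message than printing the raw exception.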
Now, let’s look at extracting files from a ZIP archive:
def extract_zip(zip_name, extract_path):
    try:
        with zipfile.ZipFile(zip_name, 'r') as zipf:
            zipf.extractall(extract_path)
        print(f"Extracted {zip_name} to {extract_path}")
    except zipfile.BadZipFile:
        print("Error: The file is not a valid ZIP archive.")
    except FileNotFoundError:
        print("Error: ZIP file not found.")
    except PermissionError:
        print("Error: Permission denied for extraction.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

extract_zip('archive.zip', './extracted')
Notice how we specifically catch zipfile.BadZipFile. This is a common exception when dealing with user-provided or downloaded ZIP files that might be incomplete or tampered with.
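For downloaded archives it can help to verify the file before extracting anything. The sketch below uses zipfile.is_zipfile and ZipFile.testzip from the standard library; check_zip is a hypothetical helper name, and returning a description string is just one possible design.

```python
import zipfile

def check_zip(zip_name):
    """Return None if the archive looks healthy, else a short problem description."""
    # is_zipfile() also returns False when the file cannot be opened at all.
    if not zipfile.is_zipfile(zip_name):
        return "not a ZIP archive"
    try:
        with zipfile.ZipFile(zip_name, 'r') as zipf:
            # testzip() reads every member and returns the name of the first
            # one that fails its check, or None if all members are fine.
            bad_member = zipf.testzip()
    except zipfile.BadZipFile as e:
        return f"corrupted archive: {e}"
    if bad_member is not None:
        return f"corrupted member: {bad_member}"
    return None
```

Because testzip() reads the whole archive, this check costs roughly as much I/O as an extraction, so it is a trade-off for large files.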
Best Practices for ZIP Handling
When working with ZIP files, follow these guidelines to write safer code:
- Always use context managers (the with statement) to ensure files are properly closed, even if an error occurs.
- Check if the file exists before processing, especially if dealing with user input.
- Validate filenames and paths to avoid directory traversal attacks or invalid characters.
- Handle disk space issues by checking available space beforehand when dealing with large files.
Here’s an enhanced version that checks for disk space before extraction:
import shutil

def safe_extract_zip(zip_name, extract_path):
    try:
        # Check available disk space (extract_path must already exist)
        free = shutil.disk_usage(extract_path).free
        with zipfile.ZipFile(zip_name, 'r') as zipf:
            # Sum of the uncompressed sizes declared in the archive
            total_size = sum(f.file_size for f in zipf.infolist())
            if total_size > free:
                raise OSError("Not enough disk space for extraction.")
            zipf.extractall(extract_path)
        print("Extraction successful.")
    except OSError as e:
        print(f"OS error: {e}")
    except zipfile.BadZipFile:
        print("Invalid ZIP file.")
    except Exception as e:
        print(f"Error: {e}")

safe_extract_zip('large_archive.zip', '/data')
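This function uses shutil.disk_usage to check if there’s enough free space before extracting, which makes an OSError due to a full disk less likely. The check is only a snapshot: other processes can still consume space during extraction, so the OSError handler remains necessary.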
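The guideline about validating paths can be sketched in a similar way. zipfile already tries to sanitize member names during extraction, but an explicit check lets you reject a suspicious archive outright. This is a minimal sketch; extract_within is a hypothetical name, and raising ValueError is an assumption about how you want to report the problem.

```python
import os
import zipfile

def extract_within(zip_name, extract_path):
    """Extract only after confirming every member resolves inside extract_path."""
    root = os.path.realpath(extract_path)
    with zipfile.ZipFile(zip_name, 'r') as zipf:
        for info in zipf.infolist():
            # Resolve where the member would land if its name were used as-is.
            target = os.path.realpath(os.path.join(root, info.filename))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"Unsafe path in archive: {info.filename}")
        zipf.extractall(root)
```

Nothing is written until every member name has passed the check, so a single bad entry leaves the destination untouched.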
Working with GZIP Files
The gzip module provides a simple interface for compressing and decompressing single files using the GZIP format. Unlike ZIP, which can contain multiple files, GZIP is typically used for compressing individual files or streams.
Common Exceptions in gzip
While using gzip, you might run into:
- FileNotFoundError: The file to compress or decompress doesn’t exist.
- PermissionError: Lack of read/write permissions.
- OSError: Often related to disk full or invalid paths.
- gzip.BadGzipFile: The file is not a valid GZIP archive. This exception was added in Python 3.8 and is a subclass of OSError.
Here’s an example of compressing a file with gzip:
import gzip
import shutil

def compress_with_gzip(source_file, gzip_file):
    try:
        with open(source_file, 'rb') as src:
            with gzip.open(gzip_file, 'wb') as dest:
                shutil.copyfileobj(src, dest)
        print(f"Compressed {source_file} to {gzip_file}")
    except FileNotFoundError:
        print("Source file not found.")
    except PermissionError:
        print("Permission denied.")
    except OSError as e:
        print(f"System error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")

compress_with_gzip('data.log', 'data.log.gz')
And here’s how to decompress:
def decompress_gzip(gzip_file, output_file):
    try:
        with gzip.open(gzip_file, 'rb') as src:
            with open(output_file, 'wb') as dest:
                shutil.copyfileobj(src, dest)
        print(f"Decompressed {gzip_file} to {output_file}")
    except FileNotFoundError:
        print("GZIP file not found.")
    except PermissionError:
        print("Permission denied for writing output.")
    except gzip.BadGzipFile:
        print("Invalid or corrupted GZIP file.")
    except Exception as e:
        print(f"Error: {e}")

decompress_gzip('data.log.gz', 'restored.log')
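Notice that we catch gzip.BadGzipFile separately. This is important because it signals that the input file isn’t a valid GZIP file, which is a common issue when dealing with downloaded or user-provided content. Because BadGzipFile is a subclass of OSError, its handler must come before any except OSError clause, or it will never run.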
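An invalid header is not the only way a GZIP file can be damaged. A truncated file typically raises EOFError, and damaged compressed data can raise zlib.error. The sketch below handles all three; read_gzip is a hypothetical helper, and it reads the whole file into memory, so it is meant for small inputs only.

```python
import gzip
import zlib

def read_gzip(gzip_file):
    """Return the decompressed bytes, or None if the data is invalid or truncated."""
    try:
        with gzip.open(gzip_file, 'rb') as src:
            return src.read()
    except gzip.BadGzipFile as e:
        # Wrong magic number or a failed integrity check (Python 3.8+).
        print(f"Not a valid GZIP file: {e}")
    except EOFError as e:
        # The stream ended before the end-of-stream marker was reached.
        print(f"Truncated GZIP file: {e}")
    except zlib.error as e:
        # The compressed data itself could not be decoded.
        print(f"Corrupted GZIP data: {e}")
    return None
```

A missing file is deliberately not handled here, so FileNotFoundError still propagates to the caller.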
Handling Large Files and Memory Issues
When working with large files, it’s best to process them in chunks to avoid memory errors. Both zipfile and gzip support streaming, but you need to use them carefully.
For GZIP, you can read and write in chunks like this:
def compress_large_file(source, dest_gz, chunk_size=8192):
    try:
        with open(source, 'rb') as src:
            with gzip.open(dest_gz, 'wb') as dest:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    dest.write(chunk)
        print("Compression successful.")
    except OSError as e:  # IOError is just an alias of OSError in Python 3
        print(f"I/O error: {e}")
    except Exception as e:
        print(f"Error: {e}")
This method reads the file in small chunks, so it uses minimal memory even for very large files.
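The same idea applies to ZIP archives. ZipFile.open returns a file-like object for a single member, which can be copied in chunks instead of being read in one call. This is a sketch under that assumption; extract_member_streaming and its default chunk size are illustrative choices.

```python
import shutil
import zipfile

def extract_member_streaming(zip_name, member, dest_path, chunk_size=64 * 1024):
    """Copy one archive member to dest_path without loading it fully into memory."""
    try:
        with zipfile.ZipFile(zip_name, 'r') as zipf:
            # The member is decompressed lazily as copyfileobj reads from it.
            with zipf.open(member) as src, open(dest_path, 'wb') as dest:
                shutil.copyfileobj(src, dest, chunk_size)
        return True
    except KeyError:
        # ZipFile.open() raises KeyError when the member is not in the archive.
        print(f"{member} not found in {zip_name}")
    except (zipfile.BadZipFile, OSError) as e:
        print(f"Could not extract {member}: {e}")
    return False
```

The member is opened before the destination file, so a missing member does not leave an empty output file behind.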
Advanced Exception Handling Techniques
Sometimes, you need more granular control over exceptions. Here are a few advanced tips.
Retrying on Temporary Errors
Network timeouts or temporary locks might cause intermittent errors. You can use a retry mechanism for such cases.
import time
from zipfile import ZipFile, BadZipFile

def robust_extract(zip_path, extract_to, retries=3):
    for attempt in range(retries):
        try:
            with ZipFile(zip_path, 'r') as zipf:
                zipf.extractall(extract_to)
            print("Extraction done.")
            break
        except PermissionError:
            if attempt < retries - 1:
                print("Retrying after permission error...")
                time.sleep(2)
            else:
                print("Max retries reached. Permission denied.")
        except BadZipFile:
            print("Bad ZIP file. Aborting.")
            break
        except Exception as e:
            print(f"Attempt {attempt+1} failed: {e}")
            if attempt == retries - 1:
                print("All retries exhausted.")
            else:
                # Only wait when another attempt will follow
                time.sleep(1)
This function will retry on PermissionError (which might be temporary) but gives up immediately on BadZipFile (which is likely permanent).
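If several operations need the same treatment, the retry logic can be pulled out into a reusable helper. The sketch below is one possible shape, not an established API: retry, retry_on, and base_delay are hypothetical names, and the doubling delay is an assumed backoff policy.

```python
import time

def retry(operation, retry_on=(PermissionError,), retries=3, base_delay=0.5):
    """Call operation(); on a retryable exception, wait and try again.

    The delay doubles after each failed attempt. The final failure is re-raised,
    and exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(retries):
        try:
            return operation()
        except retry_on:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

The callable passed in must let exceptions propagate rather than print them, for example a small function that opens the ZipFile and calls extractall.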
Logging Exceptions
In production code, you’ll want to log exceptions for debugging. Here’s how you might do that:
import gzip
import logging

logging.basicConfig(filename='compression.log', level=logging.ERROR)

def logged_compress(source, target):
    try:
        with open(source, 'rb') as f_in:
            with gzip.open(target, 'wb') as f_out:
                f_out.writelines(f_in)
    except Exception as e:
        logging.error(f"Failed to compress {source} to {target}: {e}", exc_info=True)
        raise  # re-raise so the caller still sees the failure

logged_compress('app.log', 'app.log.gz')
Passing exc_info=True to logging.error includes the full traceback in the log entry, which is invaluable for debugging. The bare raise then re-raises the exception so callers can still react to it.
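A variation is to log through a named logger and report failure with a return value instead of re-raising. logger.exception logs at ERROR level and attaches the traceback automatically. This is a sketch only; the logger name "compression" and the function compress_and_log are hypothetical.

```python
import gzip
import logging
import shutil

logger = logging.getLogger("compression")  # hypothetical logger name

def compress_and_log(source, target):
    """Compress source to target; log failures with a traceback and return a bool."""
    try:
        with open(source, 'rb') as f_in, gzip.open(target, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    except OSError:
        # Equivalent to logger.error(..., exc_info=True) inside an except block.
        logger.exception("Failed to compress %s to %s", source, target)
        return False
    return True
```

Whether to swallow the exception like this or re-raise it depends on the caller: a batch job may prefer the boolean, while a library function usually should re-raise.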
Common Pitfalls and How to Avoid Them
Even with exception handling, you can run into issues if you’re not careful. Here are some common mistakes and how to sidestep them.
- Not closing files properly: Always use context managers (with blocks) to avoid resource leaks.
- Ignoring hidden exceptions: Catch specific exceptions first, then general ones. This prevents masking specific errors with broad catches.
- Overwriting files silently: Check if the output file exists and confirm overwrite if needed, especially in interactive tools.
- Assuming UTF-8 filenames in ZIP: ZIP archives can contain non-UTF-8 filenames. Use ZipFile.infolist() and handle encoding issues.
For example, to safely handle non-ASCII filenames in ZIP:
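with zipfile.ZipFile('archive.zip', 'r') as zipf:
    for info in zipf.infolist():
        try:
            # Names without the UTF-8 flag are decoded as CP437 by zipfile;
            # undo that and reinterpret the raw bytes as UTF-8.
            filename = info.filename.encode('cp437').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            filename = info.filename  # fallback
        print(f"Extracting {filename}")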
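zipfile decodes a member name as CP437 when the archive does not flag it as UTF-8. The snippet reverses that step by encoding the name back to CP437 bytes and decoding those bytes as UTF-8, and it falls back to the name as zipfile reported it if that fails.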
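The pitfall about silently overwriting files can be avoided without a separate existence check by opening the output in exclusive-creation mode. The sketch below uses mode 'xb' with gzip.open; compress_no_overwrite is a hypothetical name.

```python
import gzip
import shutil

def compress_no_overwrite(source_file, gzip_file):
    """Compress source_file, refusing to replace an existing gzip_file."""
    try:
        # Mode 'xb' creates the file exclusively and raises FileExistsError if
        # it already exists, so there is no gap between the check and the write.
        with open(source_file, 'rb') as src, gzip.open(gzip_file, 'xb') as dest:
            shutil.copyfileobj(src, dest)
        return True
    except FileExistsError:
        print(f"{gzip_file} already exists; not overwriting.")
        return False
```

zipfile.ZipFile accepts mode 'x' in the same way, so the approach carries over to ZIP archives.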
Summary of Key Points
Handling exceptions in file compression is all about anticipating what can go wrong and writing code that responds gracefully. Whether you’re using zipfile or gzip, the principles are similar: catch specific exceptions, provide helpful error messages, and ensure resources are cleaned up properly.
Remember these key takeaways:
- Use context managers for automatic resource cleanup.
- Catch specific exceptions like FileNotFoundError, PermissionError, and format-specific errors like BadZipFile or BadGzipFile.
- Validate inputs and check disk space when working with large files.
- Log errors for production applications to aid in debugging.
By following these practices, you’ll write more reliable and user-friendly compression utilities in Python.