
Using timeit for Benchmarking
When you're writing Python code, especially performance-critical applications, it's essential to know how long your operations take. Guessing based on intuition rarely works, and relying on a simple time.time() call before and after a block can be misleading due to system noise and other processes. That's where the timeit module comes in: it's a built-in Python module designed specifically for accurate and reliable benchmarking.
The timeit module runs your code many times (one million by default) and reports the total elapsed time, which helps smooth out variations caused by other system activity. This approach provides a much more realistic measure of your code's performance than a single run ever could.
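To see the difference in practice, here is a minimal sketch (the statement and loop count are arbitrary choices) that times the same expression both ways; the single measurement tends to jump around between runs, while the timeit figure stays comparatively stable:

import time
import timeit

# Naive approach: a single run, easily skewed by whatever else the
# machine happens to be doing at that instant.
start = time.time()
total = sum(x * x for x in range(1000))
naive = time.time() - start

# timeit approach: many runs, so transient noise averages out.
per_run = timeit.timeit("sum(x * x for x in range(1000))",
                        number=100_000) / 100_000

print(f"Single time.time() run: {naive:.8f} seconds")
print(f"timeit average:         {per_run:.8f} seconds")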
Getting Started with timeit
The easiest way to use timeit is through its command-line interface. Suppose you want to compare the speed of two ways to create a list of squares: a list comprehension versus map with a lambda. You can run:
python -m timeit "squares = [x*x for x in range(100)]"
And for the other method:
python -m timeit "squares = list(map(lambda x: x*x, range(100)))"
You'll likely see that the list comprehension is faster. On the command line, timeit automatically chooses a suitable number of loops to get a stable measurement.
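The command-line interface also takes a few handy options: -s runs setup code once before timing, while -n and -r override the loop and repeat counts. For example:

python -m timeit -s "data = list(range(1000))" "sum(data)"
python -m timeit -n 10000 -r 7 "sorted([5, 3, 1, 4, 2])"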
Using timeit in a Script
For more control, or to include benchmarking within a Python program, you can use the timeit module programmatically. The timeit.Timer class is your go-to tool here.
Let's say you want to time how long it takes to sort a list of random integers. Here's how you might do it:
import timeit

# The setup code runs once, before the timed loop starts.
setup_code = """
import random
n = 1000
arr = [random.randint(1, 1000) for _ in range(n)]
"""

# The statement whose execution time we want to measure.
test_code = "sorted_arr = sorted(arr)"

timer = timeit.Timer(test_code, setup=setup_code)
number_of_runs = 1000
time_taken = timer.timeit(number=number_of_runs)
print(f"Average time per sort: {time_taken / number_of_runs:.6f} seconds")
In this example, the setup_code runs once to create the initial conditions (a random list), and the test_code is executed repeatedly for timing. The timeit() method returns the total time for all runs, so we divide by the number of runs to get the average.
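If you don't want to guess a loop count, Timer.autorange() (available since Python 3.6) picks one for you by increasing the count until one full run takes at least 0.2 seconds. A quick sketch, reusing the timer object from above:

# Returns (number_of_loops, total_time) for the chosen loop count.
loops, total = timer.autorange()
print(f"autorange chose {loops} loops, {total / loops:.8f} seconds each")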
Important Parameters and Best Practices
When using timeit, you should be aware of a few key parameters:
- number: how many times to run the statement (the default is 1,000,000).
- repeat: runs the entire timing process multiple times and returns a list of results, so you can take the best one and avoid outliers.
- setup: code that runs once before the timed code; great for imports and initializations.
For the most accurate results, it's often recommended to call repeat() and take the minimum time, since the fastest run is the one least disturbed by other system activity:
# Five independent timing runs of 1000 sorts each; keep the fastest.
times = timer.repeat(repeat=5, number=1000)
best_time = min(times) / 1000
print(f"Best average time: {best_time:.6f} seconds")
Always run your benchmarks on an idle system to avoid interference from other applications. Also, be cautious when benchmarking very fast operations—sometimes the overhead of the timing loop can become significant.
Comparing Two Functions
A common use case is comparing two implementations. Let's compare a function that uses a loop to sum numbers versus the built-in sum:
import timeit

code_loop = """
total = 0
for i in range(1000):
    total += i
"""

code_sum = "total = sum(range(1000))"

time_loop = timeit.timeit(code_loop, number=10000)
time_sum = timeit.timeit(code_sum, number=10000)
print(f"Loop time: {time_loop:.5f}")
print(f"Sum time: {time_sum:.5f}")
You'll likely find that the built-in sum is faster, demonstrating the advantage of using Python's optimized built-ins where possible.
Common Pitfalls and How to Avoid Them
One common mistake is including slow setup code inside the timed section. For example, if you're testing a function that processes data, generate the test data in the setup block, not in the timed code; otherwise you'll be measuring the data generation time as well.
Another thing to know is that timeit disables garbage collection by default to avoid interference. This is usually what you want, but if your code's real-world performance depends on the collector, you can re-enable it by putting gc.enable() in the setup string.
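For example, to keep the collector running during the measurement (the list-building statement here is just an arbitrary allocation-heavy workload):

import timeit

# gc.enable() in the setup string re-enables collection for the
# duration of the timed runs.
with_gc = timeit.timeit(
    "lst = [str(i) for i in range(1000)]",
    setup="import gc; gc.enable()",
    number=1000,
)
print(f"With GC enabled: {with_gc:.5f} seconds")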
Also, be aware of which clock timeit uses: by default it measures with time.perf_counter(), a wall-clock timer, so any time your process spends waiting is included. If you want pure CPU time instead, you can supply a different clock, such as time.process_time, through the timer parameter.
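A minimal sketch of swapping in a CPU-only clock via the timer parameter (accepted by both timeit.timeit() and timeit.Timer):

import time
import timeit

# time.process_time counts CPU time only; time spent sleeping or
# blocked on I/O is excluded from the measurement.
cpu_time = timeit.timeit(
    "sum(range(10000))",
    timer=time.process_time,
    number=1000,
)
print(f"CPU time: {cpu_time:.5f} seconds")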
Advanced Example: Benchmarking Data Structures
Suppose you want to know whether a list or a set is faster for checking membership. Here's how you could benchmark it:
import timeit

list_setup = "my_list = list(range(10000))"
set_setup = "my_set = set(range(10000))"

# Look up the worst-case element for the list (it sits at the end,
# so the linear scan has to traverse everything).
list_test = "9999 in my_list"
set_test = "9999 in my_set"

list_time = timeit.timeit(list_test, setup=list_setup, number=10000)
set_time = timeit.timeit(set_test, setup=set_setup, number=10000)
print(f"List membership time: {list_time:.5f}")
print(f"Set membership time: {set_time:.5f}")
You'll see that sets are dramatically faster for membership tests, which is expected due to their hash-based implementation.
| Operation | Data Structure | Average Time (seconds) |
|---|---|---|
| Membership Check | List | 0.0012 |
| Membership Check | Set | 0.000005 |
| Iteration (10k items) | List | 0.00045 |
| Iteration (10k items) | Set | 0.00052 |
This table clearly shows the trade-offs: sets are superior for lookups, but lists can be slightly faster for iteration.
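The membership rows come straight from the benchmark above; the iteration rows can be reproduced with a sketch like this (the exact figures are illustrative and machine-dependent):

import timeit

iter_list = timeit.timeit("for x in my_list: pass",
                          setup="my_list = list(range(10000))",
                          number=1000)
iter_set = timeit.timeit("for x in my_set: pass",
                         setup="my_set = set(range(10000))",
                         number=1000)
print(f"List iteration time: {iter_list:.5f}")
print(f"Set iteration time:  {iter_set:.5f}")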
Using timeit with Functions
If you have a function you want to benchmark, you can reference it from the statement string (importing it in the setup) or pass a callable directly. For example, using the string approach:
import timeit

def factorial(n):
    # Base case n <= 1 so that factorial(0) also terminates.
    if n <= 1:
        return 1
    return n * factorial(n - 1)

setup = "from __main__ import factorial"
stmt = "factorial(20)"

time_taken = timeit.timeit(stmt, setup=setup, number=10000)
print(f"Average time: {time_taken / 10000:.7f}")
Note the use of from __main__ import factorial in the setup: timeit runs statements in its own namespace, so the function has to be imported there before it can be called.
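Two standard-library alternatives are worth knowing, continuing with the factorial function defined above: since Python 3.5, timeit accepts a globals argument so the statement can see your module's names directly, and you can also pass a zero-argument callable instead of a string (at the cost of a little extra call overhead):

# Option 1: let the statement see the current global namespace.
t1 = timeit.timeit("factorial(20)", globals=globals(), number=10000)

# Option 2: pass a callable; timeit invokes it with no arguments.
t2 = timeit.timeit(lambda: factorial(20), number=10000)

print(f"string: {t1 / 10000:.7f}  callable: {t2 / 10000:.7f}")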
When Not to Use timeit
While timeit is excellent for micro-benchmarks, it's not ideal for timing code that involves I/O (like reading files or network requests): those operations are dominated by factors outside your control, and repeating them thousands of times is rarely practical. For I/O-bound code, you might prefer a handful of manually timed runs using time.perf_counter().
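A minimal sketch of that manual approach; the file path is a placeholder:

import time

start = time.perf_counter()
with open("data.txt", "rb") as f:  # hypothetical input file
    contents = f.read()
elapsed = time.perf_counter() - start
print(f"Read {len(contents)} bytes in {elapsed:.4f} seconds")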
Also, be wary of over-optimizing. Sometimes, a slightly slower but more readable solution is better than a complex fast one, unless you've identified a genuine bottleneck.
Summary of timeit Methods
- timeit.timeit(stmt, setup, number=1000000): runs stmt number times and returns the total time.
- timeit.repeat(stmt, setup, repeat=5, number=1000000): repeats the whole timing process repeat times (5 by default on Python 3.7+) and returns a list of results.
- timeit.Timer(stmt, setup).timeit(number): same as timeit.timeit() but gives you a reusable Timer object and more control.
- timeit.default_timer(): the highest-resolution clock available (an alias for time.perf_counter() on modern Python).
In practice, using repeat() and taking the minimum value is often the most reliable method.
Final Tips
- Always run benchmarks multiple times and in a stable environment.
- Use setup to precompute anything that shouldn't be timed.
- Compare relative performance, not just absolute numbers.
- Remember that benchmarks are tools, not goals—write clear, maintainable code first, then optimize only where necessary.
By integrating timeit into your workflow, you can make informed decisions about performance and ensure your code runs as efficiently as possible. Happy benchmarking!