
Python Generators Cheatsheet
Generators are a powerful feature in Python that allow you to iterate over data without storing it all in memory at once. They are essential for handling large datasets, streaming data, or creating efficient pipelines. In this cheatsheet, I’ll walk you through everything you need to know about generators, from the basics to advanced use cases.
What Are Generators?
Generators are functions that return an iterator called a generator object. They are defined like regular functions but use the yield statement instead of return. When yield is encountered, the function's state is paused and saved, and the yielded value is handed to the caller. The next time a value is requested, the generator resumes from where it left off.
Here’s a simple example:
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

counter = count_up_to(5)
for num in counter:
    print(num)
This will output:
1
2
3
4
5
Each time the loop requests the next value, the generator function resumes execution until it hits yield again.
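To make the pause/resume behavior concrete, here is a minimal sketch that drives the same count_up_to generator by hand with the built-in next() instead of a for loop:

```python
def count_up_to(n):
    # Same generator as above: yields 1..n, one value per resume.
    count = 1
    while count <= n:
        yield count
        count += 1

gen = count_up_to(2)
print(next(gen))  # 1 — runs the body until the first yield
print(next(gen))  # 2 — resumes just after the previous yield
try:
    next(gen)     # the while condition is now false, so the generator ends
except StopIteration:
    print("exhausted")
```

A for loop does exactly this under the hood: it calls next() repeatedly and treats StopIteration as the end of the sequence.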
Generator Expressions
Just like list comprehensions, Python allows you to create generators using generator expressions. They are written similarly but with parentheses instead of square brackets.
squares = (x*x for x in range(5))
for sq in squares:
    print(sq)
Output:
0
1
4
9
16
Generator expressions are memory efficient because they generate values on the fly.
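Generator expressions shine when passed directly to a consuming function: no extra parentheses are needed, no intermediate list is built, and short-circuiting consumers stop pulling values early. A small sketch:

```python
# sum() consumes the generator expression one value at a time;
# the full list of squares is never materialized.
total = sum(x * x for x in range(10))
print(total)  # 285

# any() stops at the first True result — here the first multiple of 7 —
# so the rest of the range is never evaluated.
has_multiple = any(x % 7 == 0 for x in range(1, 1_000_000))
print(has_multiple)  # True
```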
Yielding From Another Generator
You can yield from another generator, which is useful for delegating to a sub-generator. This simplifies the code when working with nested generators.
def first_n_squares(n):
    for i in range(n):
        yield i * i

def first_n_cubes(n):
    yield from first_n_squares(n)
    for i in range(n):
        yield i * i * i

for value in first_n_cubes(3):
    print(value)
Output:
0
1
4
0
1
8
This approach makes your code cleaner and more modular.
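One detail worth knowing: yield from also captures the sub-generator's return value (PEP 380). The value after a sub-generator's return statement becomes the result of the yield from expression. A minimal sketch, where partial_sum and report are illustrative names:

```python
def partial_sum(numbers):
    total = 0
    for n in numbers:
        yield n
        total += n
    return total  # becomes the value of the 'yield from' expression

def report(numbers):
    # Re-yields every item from the sub-generator, then captures its return.
    total = yield from partial_sum(numbers)
    yield f"total={total}"

print(list(report([1, 2, 3])))  # [1, 2, 3, 'total=6']
```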
Sending Data to Generators
Generators can also receive data using the send() method. This allows two-way communication between the caller and the generator.
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)            # Start the generator: advances to the first yield (which produces 0)
print(acc.send(10))  # Output: 10
print(acc.send(20))  # Output: 30
print(acc.send(5))   # Output: 35
acc.close()
This is particularly useful for coroutines and more advanced generator patterns.
Generator Methods
Generators have several built-in methods that you can use to control their execution:
- next(gen): Retrieves the next value (calls the generator's __next__() method).
- send(value): Resumes the generator and sends in a value, which becomes the result of the paused yield expression.
- throw(type[, value[, traceback]]): Raises an exception inside the generator at the paused yield.
- close(): Raises GeneratorExit inside the generator to stop it.
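Here is a small sketch of close() in action. The ticker generator below (a hypothetical name) records its shutdown in a list: close() raises GeneratorExit at the paused yield, so the finally block runs before the generator exits.

```python
def ticker(log):
    # 'log' records lifecycle events so the shutdown is observable.
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        # close() raises GeneratorExit at the paused yield;
        # this finally block runs before the generator is discarded.
        log.append("shut down")

log = []
t = ticker(log)
print(next(t))  # 0
print(next(t))  # 1
t.close()
print(log)      # ['shut down']
```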
Here’s an example using throw():
def simple_generator():
    try:
        yield "Hello"
    except ValueError:
        yield "Caught an exception!"

gen = simple_generator()
print(next(gen))              # Output: Hello
print(gen.throw(ValueError))  # Output: Caught an exception!
This allows you to handle exceptions within the generator gracefully.
Common Use Cases
Generators are incredibly versatile. Here are some common scenarios where they shine:
- Reading large files: Instead of loading the entire file into memory, you can read it line by line.
- Generating infinite sequences: Such as an infinite counter or Fibonacci sequence.
- Data processing pipelines: Chaining generators to process data efficiently.
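An infinite sequence is only safe as a generator, because values are produced lazily. A sketch of an infinite Fibonacci generator, combined with itertools.islice to take a finite slice:

```python
from itertools import islice

def fibonacci():
    # Infinite sequence: never terminates on its own,
    # but only computes values as they are requested.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice() pulls exactly 8 values and stops; the generator is simply
# left paused rather than exhausted.
print(list(islice(fibonacci(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
```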
Example of reading a large file:
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

for line in read_large_file('large_data.txt'):
    process(line)  # assume process() handles each line
This approach ensures that only one line is in memory at a time.
Performance Benefits
Generators are memory efficient because they generate items one at a time and do not store the entire sequence in memory. This makes them ideal for working with large datasets or streams.
Compare a list comprehension vs. a generator expression:
import sys

list_comp = [x for x in range(10000)]
gen_exp = (x for x in range(10000))

print(sys.getsizeof(list_comp))  # roughly 85176 bytes on 64-bit CPython
print(sys.getsizeof(gen_exp))    # around 100-200 bytes, regardless of length
As you can see, the generator expression uses significantly less memory.
Generator Best Practices
When working with generators, keep the following in mind:
- Use generators when dealing with large or infinite sequences.
- Avoid using generators if you need random access to elements; lists are better for that.
- Remember that generators are exhausted after iteration; you cannot iterate over them again without recreating.
- Use yield from to delegate to sub-generators for cleaner code.
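The single-use behavior is a common source of bugs, so it is worth seeing directly. A second iteration over an exhausted generator silently yields nothing; wrapping the expression in a small factory function makes it cheap to recreate:

```python
squares = (x * x for x in range(3))
print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] — the generator is already exhausted

def make_squares():
    # A factory returns a fresh generator on every call.
    return (x * x for x in range(3))

print(list(make_squares()))  # [0, 1, 4]
print(list(make_squares()))  # [0, 1, 4]
```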
Aspect | List Comprehension | Generator Expression
---|---|---
Memory usage | High (stores all items) | Low (one item at a time)
Per-item access | Faster (random access) | Slower (sequential only)
Reusability | Reusable | Single-use
Use case | Small to medium data | Large or streaming data
Advanced Example: Pipeline Processing
You can chain generators to create data processing pipelines. Each generator handles a specific step, and the data flows through them efficiently.
def read_lines(file_name):
    with open(file_name) as f:
        for line in f:
            yield line.strip()

def filter_lines(lines, pattern):
    for line in lines:
        if pattern in line:
            yield line

def uppercase_lines(lines):
    for line in lines:
        yield line.upper()

lines = read_lines('data.txt')
filtered = filter_lines(lines, 'error')
uppercased = uppercase_lines(filtered)
for line in uppercased:
    print(line)
This pipeline reads lines, filters for those containing 'error', and converts them to uppercase—all without loading the entire file into memory.
Generator vs. Iterator
It’s important to understand the difference between generators and iterators. All generators are iterators, but not all iterators are generators. Generators are a convenient way to create iterators without having to implement a class with __iter__() and __next__() methods.
Example of a custom iterator:
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        self.current += 1
        return self.current - 1

for num in Counter(1, 5):
    print(num)
The generator version is much simpler:
def counter(low, high):
    current = low
    while current <= high:
        yield current
        current += 1

for num in counter(1, 5):
    print(num)
Both output:
1
2
3
4
5
But the generator is more concise and easier to read.
Debugging Generators
Debugging generators can be tricky because of their stateful nature. Use print statements or logging to track the flow, or leverage debugging tools like pdb.
Example with debugging:
def debug_generator(n):
    for i in range(n):
        print(f"Yielding {i}")
        yield i
        print(f"Resumed after yielding {i}")

gen = debug_generator(3)
next(gen)
next(gen)
next(gen)
Output:
Yielding 0
Resumed after yielding 0
Yielding 1
Resumed after yielding 1
Yielding 2
Note that "Resumed after yielding 2" never prints: after the third next() call, the generator is still paused at its final yield and only resumes if next() is called again.
This helps you understand the pause and resume behavior.
Conclusion
Generators are a fundamental tool in Python for creating efficient, memory-friendly iterators. Whether you're processing large files, building data pipelines, or generating sequences, generators offer a clean and powerful solution. Remember to use them when memory efficiency is crucial, and leverage their ability to handle data lazily. Happy coding!