Python Generators Cheatsheet

Generators are a powerful feature in Python that allow you to iterate over data without storing it all in memory at once. They are essential for handling large datasets, streaming data, or creating efficient pipelines. In this cheatsheet, I’ll walk you through everything you need to know about generators, from the basics to advanced use cases.

What Are Generators?

A generator function is defined like a regular function but uses the yield statement instead of return; calling it returns a generator object, which is iterable. When yield is encountered, the function’s state is paused and saved, and the yielded value is returned to the caller. The next time a value is requested, execution resumes right after the yield.

Here’s a simple example:

def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

counter = count_up_to(5)
for num in counter:
    print(num)

This will output:

1
2
3
4
5

Each time the loop requests the next value, the generator function resumes execution until it hits yield again.
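The same pause-and-resume behavior can be driven by hand with the built-in next(), which also shows what happens when the generator runs out of values; a minimal sketch reusing count_up_to from above:

```python
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

gen = count_up_to(2)
print(next(gen))  # 1 -- runs the body until the first yield
print(next(gen))  # 2 -- resumes after the yield and loops once more
try:
    next(gen)     # the function body finishes, so StopIteration is raised
except StopIteration:
    print("exhausted")
```

A for loop does exactly this under the hood: it calls next() repeatedly and stops cleanly when StopIteration is raised.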

Generator Expressions

Just like list comprehensions, Python allows you to create generators using generator expressions. They are written similarly but with parentheses instead of square brackets.

squares = (x*x for x in range(5))
for sq in squares:
    print(sq)

Output:

0
1
4
9
16

Generator expressions are memory efficient because they generate values on the fly.
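A handy consequence: a generator expression can be passed directly to any function that consumes an iterable, and when it is the sole argument the extra parentheses can be dropped.

```python
# The generator expression feeds sum() lazily; no intermediate list is built.
total = sum(x * x for x in range(5))
print(total)                         # 30
print(max(x * x for x in range(5)))  # 16
```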

Yielding From Another Generator

You can yield from another generator, which is useful for delegating to a sub-generator. This simplifies the code when working with nested generators.

def first_n_squares(n):
    for i in range(n):
        yield i * i

def first_n_cubes(n):
    yield from first_n_squares(n)
    for i in range(n):
        yield i * i * i

for value in first_n_cubes(3):
    print(value)

Output:

0
1
4
0
1
8

This approach makes your code cleaner and more modular.
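Note that yield from works with any iterable, not just generators, which makes it handy for tasks like flattening; a small sketch assuming one level of nesting:

```python
def flatten(list_of_lists):
    # yield from delegates to each sub-iterable in turn
    for sub in list_of_lists:
        yield from sub

print(list(flatten([[1, 2], [3], [4, 5]])))  # [1, 2, 3, 4, 5]
```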

Sending Data to Generators

Generators can also receive data using the send() method. This allows two-way communication between the caller and the generator.

def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)  # Prime the generator: advance it to the first yield
print(acc.send(10))  # Output: 10
print(acc.send(20))  # Output: 30
print(acc.send(5))   # Output: 35
acc.close()

This is particularly useful for coroutines and more advanced generator patterns.
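As a variation on the accumulator, the same send() pattern can maintain a running average; a sketch (running_average is an illustrative name, not a standard API):

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # receive a number, emit the current average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the coroutine up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(6))   # 12.0
```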

Generator Methods

Generators have several built-in methods that you can use to control their execution:

  • next(gen): Retrieves the next value (calls the generator’s __next__() method).
  • send(value): Sends a value into the generator; it becomes the result of the paused yield expression.
  • throw(exception): Raises an exception inside the generator at the paused yield (the older three-argument form is deprecated since Python 3.12).
  • close(): Stops the generator by raising GeneratorExit inside it.

Here’s an example using throw():

def simple_generator():
    try:
        yield "Hello"
    except ValueError:
        yield "Caught an exception!"

gen = simple_generator()
print(next(gen))  # Output: Hello
print(gen.throw(ValueError))  # Output: Caught an exception!

This allows you to handle exceptions within the generator gracefully.
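Similarly, close() works by raising GeneratorExit at the paused yield, which the generator can catch to run cleanup before finishing; a minimal sketch (cleanup_log is just an illustrative stand-in for real cleanup work):

```python
cleanup_log = []

def tail_log():
    try:
        while True:
            yield "line"
    except GeneratorExit:
        # Runs when close() is called; release resources here.
        # The generator must finish now -- yielding again would be an error.
        cleanup_log.append("closed")

gen = tail_log()
print(next(gen))    # line
gen.close()
print(cleanup_log)  # ['closed']
```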

Common Use Cases

Generators are incredibly versatile. Here are some common scenarios where they shine:

  • Reading large files: Instead of loading the entire file into memory, you can read it line by line.
  • Generating infinite sequences: Such as an infinite counter or Fibonacci sequence.
  • Data processing pipelines: Chaining generators to process data efficiently.
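The infinite-sequence case can be sketched with a Fibonacci generator; itertools.islice takes a finite slice of the endless stream:

```python
from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:        # infinite: values are produced only on demand
        yield a
        a, b = b, a + b

print(list(islice(fibonacci(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
```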

Example of reading a large file:

def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

for line in read_large_file('large_data.txt'):
    process(line)  # assuming process() is a function that handles each line

This approach ensures that only one line is in memory at a time.

Performance Benefits

Generators are memory efficient because they generate items one at a time and do not store the entire sequence in memory. This makes them ideal for working with large datasets or streams.

Compare a list comprehension vs. a generator expression:

import sys

list_comp = [x for x in range(10000)]
gen_exp = (x for x in range(10000))

print(sys.getsizeof(list_comp))  # around 85176 bytes
print(sys.getsizeof(gen_exp))    # around 112 bytes

As you can see, the generator expression uses significantly less memory.

Generator Best Practices

When working with generators, keep the following in mind:

  • Use generators when dealing with large or infinite sequences.
  • Avoid using generators if you need random access to elements; lists are better for that.
  • Remember that generators are exhausted after iteration; you cannot iterate over them again without recreating them.
  • Use yield from to delegate to sub-generators for cleaner code.

Aspect           List Comprehension     Generator Expression
Memory Usage     High                   Low
Execution Time   Faster access          Slower per item
Reusability      Reusable               Single-use
Use Case         Small to medium data   Large or streaming data
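The single-use behavior is easy to demonstrate: once a generator expression has been consumed, iterating it again yields nothing.

```python
squares = (x * x for x in range(3))
print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] -- the generator is exhausted, not reset
```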

Advanced Example: Pipeline Processing

You can chain generators to create data processing pipelines. Each generator handles a specific step, and the data flows through them efficiently.

def read_lines(file_name):
    with open(file_name) as f:
        for line in f:
            yield line.strip()

def filter_lines(lines, pattern):
    for line in lines:
        if pattern in line:
            yield line

def uppercase_lines(lines):
    for line in lines:
        yield line.upper()

lines = read_lines('data.txt')
filtered = filter_lines(lines, 'error')
uppercased = uppercase_lines(filtered)

for line in uppercased:
    print(line)

This pipeline reads lines, filters for those containing 'error', and converts them to uppercase—all without loading the entire file into memory.
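The same pipeline can also be expressed with chained generator expressions; a sketch using an in-memory list in place of the file so it runs without 'data.txt':

```python
# In-memory stand-in for the file, purely for illustration
raw = ["ok line\n", "error: disk full\n", "another error here\n"]

lines = (line.strip() for line in raw)
filtered = (line for line in lines if 'error' in line)
uppercased = (line.upper() for line in filtered)

for line in uppercased:
    print(line)
```

Nothing is read or transformed until the final for loop pulls values through the chain.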

Generator vs. Iterator

It’s important to understand the difference between generators and iterators. All generators are iterators, but not all iterators are generators. Generators are a convenient way to create iterators without having to implement a class with __iter__() and __next__() methods.

Example of a custom iterator:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

for num in Counter(1, 5):
    print(num)

The generator version is much simpler:

def counter(low, high):
    current = low
    while current <= high:
        yield current
        current += 1

for num in counter(1, 5):
    print(num)

Both output:

1
2
3
4
5

But the generator is more concise and easier to read.

Debugging Generators

Debugging generators can be tricky because of their stateful nature. Use print statements or logging to track the flow, or leverage debugging tools like pdb.

Example with debugging:

def debug_generator(n):
    for i in range(n):
        print(f"Yielding {i}")
        yield i
        print(f"Resumed after yielding {i}")

for value in debug_generator(3):
    pass  # exhausting the generator triggers every resume message

Output:

Yielding 0
Resumed after yielding 0
Yielding 1
Resumed after yielding 1
Yielding 2
Resumed after yielding 2

This helps you understand the pause and resume behavior.

Conclusion

Generators are a fundamental tool in Python for creating efficient, memory-friendly iterators. Whether you're processing large files, building data pipelines, or generating sequences, generators offer a clean and powerful solution. Remember to use them when memory efficiency is crucial, and leverage their ability to handle data lazily. Happy coding!