What Is a Generator?
A generator is a special type of function that returns an iterator. Instead of computing all values at once and storing them in memory, generators produce values one at a time, on demand.
```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

for num in count_up_to(5):
    print(num)
# 1, 2, 3, 4, 5
```
The yield keyword is what makes this a generator function. Each time yield is encountered, the function pauses and produces a value. When the next value is requested, execution resumes right where it left off.
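A detail that often surprises newcomers: calling a generator function does not run its body at all. It returns a generator object, and the body only executes once you start iterating. A small sketch, reusing count_up_to from above:

```python
def count_up_to(n):
    # Same generator as above: yields 1..n one value at a time.
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(3)       # No code in the body has run yet
print(type(gen).__name__)  # generator
print(next(gen))           # Body runs up to the first yield: prints 1
print(list(gen))           # Consumes the rest: prints [2, 3]
```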
Generators vs Lists
Consider generating the first million square numbers:
```python
# List approach - stores all values in memory
squares_list = [x**2 for x in range(1_000_000)]

# Generator approach - produces values on demand
squares_gen = (x**2 for x in range(1_000_000))
```
The list costs roughly 8 MB for its array of pointers alone, before counting the integer objects it references. The generator object occupies only a couple of hundred bytes, regardless of how many values it can produce.
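You can verify the difference yourself with sys.getsizeof. Note that it reports only the container's own size, not the integer objects a list points to, so the true gap is even larger than shown:

```python
import sys

squares_list = [x**2 for x in range(1_000_000)]
squares_gen = (x**2 for x in range(1_000_000))

print(f"list:      {sys.getsizeof(squares_list):,} bytes")  # roughly 8 MB of pointers
print(f"generator: {sys.getsizeof(squares_gen):,} bytes")   # a couple hundred bytes
```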
The yield Keyword
When Python encounters yield, it:
- Returns the yielded value to the caller
- Suspends the function’s state (local variables, instruction pointer)
- Resumes from exactly that point on the next call to next()
```python
def simple_generator():
    print("First")
    yield 1
    print("Second")
    yield 2
    print("Third")
    yield 3

gen = simple_generator()
print(next(gen))  # Prints "First", returns 1
print(next(gen))  # Prints "Second", returns 2
print(next(gen))  # Prints "Third", returns 3
```
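One more next() call after the last yield raises StopIteration, the signal that the generator is exhausted. A for loop handles this exception for you behind the scenes, which is why iterating a generator in a loop simply stops cleanly:

```python
def simple_generator():
    yield 1
    yield 2

gen = simple_generator()
next(gen)  # 1
next(gen)  # 2
try:
    next(gen)  # The function body has finished: raises StopIteration
except StopIteration:
    print("exhausted")
```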
Generator Expressions
Just like list comprehensions, Python has generator expressions:
```python
# List comprehension (eager)
evens = [x for x in range(100) if x % 2 == 0]

# Generator expression (lazy)
evens = (x for x in range(100) if x % 2 == 0)
```
Use generator expressions when you only need to iterate once, especially over large datasets. Like all generators, they are single-pass: once consumed, they yield nothing further.
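The single-pass behavior is easy to demonstrate. Summing the generator consumes it, so a second pass sees an empty stream:

```python
evens = (x for x in range(10) if x % 2 == 0)

print(sum(evens))   # 20  (0 + 2 + 4 + 6 + 8)
print(list(evens))  # []  -- the generator is now exhausted
```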
Practical Example: Reading Large Files
Generators shine when processing data that does not fit in memory:
```python
def read_large_file(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

# Process a multi-gigabyte log file line by line
for line in read_large_file('huge_log.txt'):
    if 'ERROR' in line:
        print(line)
```
This processes the file one line at a time, never loading the entire file into memory.
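To try the idea without an actual multi-gigabyte log, here is a self-contained sketch that writes a small temporary file (the sample lines are invented) and filters it with the same generator:

```python
import os
import tempfile

def read_large_file(file_path):
    # Same generator as above: yields stripped lines lazily.
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

# A tiny temporary file stands in for the real log.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO done\nERROR timeout\n")
    path = tmp.name

errors = [line for line in read_large_file(path) if 'ERROR' in line]
print(errors)  # ['ERROR disk full', 'ERROR timeout']
os.remove(path)
```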
Chaining Generators
You can compose generators to build data processing pipelines:
```python
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if 'ERROR' in line:
            yield line

def extract_timestamps(lines):
    for line in lines:
        yield line.split(' ')[0]

# Pipeline
lines = read_lines('app.log')
errors = filter_errors(lines)
timestamps = extract_timestamps(errors)

for ts in timestamps:
    print(ts)
```
Each generator processes one item at a time. The entire pipeline uses constant memory regardless of file size.
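The same pipeline can be expressed with generator expressions instead of generator functions. A runnable sketch, using a few invented log lines in place of app.log:

```python
# Stand-in for lines read from a log file (invented sample data).
raw = [
    "2024-01-01T10:00:00 INFO started",
    "2024-01-01T10:00:05 ERROR disk full",
    "2024-01-01T10:00:09 ERROR timeout",
]

# Each expression wraps the previous one; nothing runs until iteration.
lines = (line.strip() for line in raw)
errors = (line for line in lines if 'ERROR' in line)
timestamps = (line.split(' ')[0] for line in errors)

print(list(timestamps))
# ['2024-01-01T10:00:05', '2024-01-01T10:00:09']
```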
Key Takeaways
- Generators produce values lazily using yield
- They are memory-efficient for large datasets
- Generator expressions use () instead of []
- Generators can be chained into processing pipelines
- Use generators when you iterate once over large or infinite sequences
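The infinite-sequence point deserves a quick sketch: a generator can yield forever, and itertools.islice bounds how much of the stream you actually consume (naturals is an illustrative name, not a standard function):

```python
from itertools import islice

def naturals():
    # An infinite generator: yields 0, 1, 2, ... forever.
    n = 0
    while True:
        yield n
        n += 1

# islice takes a bounded slice of the infinite stream.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```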