Generators are an important concept in Python. A generator function returns an iterator that produces a sequence of values as you iterate over it.
Like lists or tuples, they can be iterated over, but with a key difference: generators don't store all of their values in memory at once. They produce each value on the fly as you iterate, which makes them a great fit for large datasets and crucial to know about as a Data Engineer.
A generator is defined by using yield instead of return in a function.
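Here is a minimal sketch of what that looks like in practice. The function name squares and the size of one million items are illustrative choices, not anything special. Note that sys.getsizeof only measures the container itself (not the integers a list points to), so treat the comparison as a rough indication rather than a precise benchmark.

```python
import sys


def squares(n):
    """Yield the squares of 0..n-1 one at a time."""
    for i in range(n):
        # Execution pauses here and hands one value back to the caller.
        yield i * i


# Calling the function does not run its body yet; it returns a generator object.
gen = squares(1_000_000)

# For comparison: a list that holds the same values all at once.
as_list = [i * i for i in range(1_000_000)]

gen_size = sys.getsizeof(gen)       # small, and does not depend on n
list_size = sys.getsizeof(as_list)  # grows with the number of items

# Values are only computed when we ask for them.
first_three = [next(gen) for _ in range(3)]
print(first_three)  # [0, 1, 4]
print(gen_size < list_size)  # True
```

The generator version never builds the full sequence, so its footprint stays roughly the same whether n is ten or ten million.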
In short, generators are:
- lazy (items are produced one by one when requested)
- stateful between calls (you can pick up where you left off using next())
- read-only (you can consume the values, but not index into, reorder, or modify the sequence)
- single-use (you can iterate through it only once)
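The properties above can be seen in a small sketch. The countdown function and the produced list are hypothetical helpers added purely for illustration; produced just records when each value is actually generated.

```python
produced = []  # records each value at the moment it is generated


def countdown(start):
    """Yield start, start-1, ..., 1."""
    while start > 0:
        produced.append(start)
        yield start
        start -= 1


gen = countdown(3)

# Lazy: nothing has been produced until we ask for a value.
before_first_next = list(produced)  # []

# Stateful: each next() resumes where the previous one stopped.
first = next(gen)   # 3
second = next(gen)  # 2

# The remaining values can still be consumed by a loop or list().
rest = list(gen)    # [1]

# Single-use: once exhausted, the generator yields nothing more.
again = list(gen)   # []

# No indexing or item assignment: a generator is not subscriptable.
try:
    gen[0]
    indexable = True
except TypeError:
    indexable = False

print(before_first_next, first, second, rest, again, indexable)
```

To iterate a second time, call countdown(3) again to get a fresh generator.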
TL;DR: When handling large, streaming, or single-use collections, consider using generators. 💡
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.