Skip to content

Custom Iterators

Page Maps

graph LR
  family["Python Programming"]
  program["Python Functional Programming"]
  section["Iterators Laziness Streaming Dataflow"]
  page["Custom Iterators"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone
flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Class-based iterators are not the default here; they are a justified escalation. Leave generators behind only when you need more explicit control over state, reuse, or cleanup than a simple generator can provide cleanly.

Start With the Generator Limit

Generators solve most streaming problems in this module. That is exactly why this lesson needs a careful opening. Learn when a custom iterator is warranted; do not treat classes as automatically more advanced and therefore better.

  • If a stream needs explicit cleanup, restartable iteration, or more structured state handling, a class may be the clearer design.
  • If the same object is both iterable and iterator by accident, reuse and consumption semantics become easy to blur.
  • If the lifecycle is hidden, reviewers cannot tell when resources are released or whether iteration can safely restart.

Keep This Question In View

Core question:
How do you design custom iterator classes that implement __iter__ and __next__ for complex stateful logic, ensuring purity, laziness, and equivalence while enabling reuse beyond simple generators?

This lesson introduces custom iterators as an explicit lifecycle pattern:

  • separate the reusable iterable from the single-pass cursor
  • keep state transitions and cleanup obligations visible in the class design
  • preserve laziness while giving you a clearer story about restartability and resource control

The running and cross-domain examples matter because custom iterators are a practical response to a real lifecycle need, not ceremony for its own sake.

Use this when you have hit the limits of generators for reusable, stateful, or resource-aware streams.

Outcome: 1. Spot generator limits like no reuse. 2. Build class iterator in < 15 lines. 3. Prove iter laws with Hypothesis.

Laws (frozen, used across this core): - E1 — Equivalence: iter(class_factory(S)) == gen_equiv(S). - P1 — Purity: No globals; explicit state. - R1 — Reusability: For any iterable X, iter(X) is not iter(X) and both iterators produce identical sequences. - I1a — Iterator parity: iter(it) is it and after exhaustion, next(it) raises immediately. - I1b — Iterable parity: iter(X) is not iter(X) and list(iter(X)) == list(iter(X)). - CL1 — Cleanup: Resources released on .close() or __exit__. - DTR — Determinism: Equal init/state → equal outputs. - FR — Freshness: Factory calls independent.

Iterator vs Iterable in Python (memorise): - Iterable: Has __iter__ returning an iterator (may be self or fresh cursor). Supports for x in obj: and iter(obj). - Iterator: Has __next__ (raise StopIteration at end) and __iter__ returning self. Single-pass; exhausted after consumption.

Factories are Iterable; cursors are Iterator.


1. Conceptual Foundation

1.1 The One-Sentence Rule

Use separate Iterable factories and Iterator cursors for stateful, reusable iterators with explicit cleanup, when generators lack control.

1.2 Custom Iter in One Precise Sentence

Iterable factories return fresh Iterators; iterators implement __next__ logic, __iter__ return self.

In this series, enables resources; preserves laziness.

1.3 Why This Matters Now

Up to this point, generators have been enough for almost every lesson. That is a feature, not a problem. This page matters because it marks the boundary where a generator stops being the clearest representation. Recognize that boundary so you can choose a custom iterator intentionally rather than out of habit or novelty.

1.4 Custom Iter in 5 Lines

The next snippet matters because it separates "object you can iterate over again" from "cursor currently walking the data."

class MyIterable:
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return MyIter(self.data)

class MyIter:
    def __init__(self, data):
        self.data = data
        self.i = 0
    def __iter__(self): return self
    def __next__(self):
        if self.i >= len(self.data): raise StopIteration
        val = self.data[self.i]; self.i += 1; return val

Reusable.

1.5 Minimal Iter Harness (Extends Core 8)

Build on Core 8; add class patterns:

from typing import Iterator, Iterable, TypeVar
T = TypeVar("T")

class BaseIterable(Iterable[T]):
    def __iter__(self) -> Iterator[T]:
        raise NotImplementedError

class BaseIter(Iterator[T]):
    def __iter__(self) -> 'BaseIter[T]':
        return self
    def __next__(self) -> T:
        raise NotImplementedError
    def close(self):
        pass

Use as base; e.g., class MyIterable(BaseIterable[T]): ...


2. Mental Model: Generator vs Class Iter

2.1 One Picture

Generators (Simple)                     Class Iters (Powerful)
+-----------------------+               +------------------------------+
| one-shot, no reuse    |               | stateful, reusable           |
|        ↓              |               |        ↓                     |
| no cleanup control    |               | .close() resources, errors   |
| lightweight           |               | testable, composable         |
+-----------------------+               +------------------------------+
   ↑ Limited / Stateless                   ↑ Flexible / Stateful

2.2 Behavioral Contract

Aspect Generators Class Iters
Reuse No (exhausted) Yes (reset state)
Cleanup Auto Explicit .close()
State Suspended Explicit attrs
Equivalence Simple Via properties

Note on Generator Choice: Simple logic; else class.

When Not to Class: No state; use gen.

Known Pitfalls: - Forgotten iter return self. - State mutation leaks.

Forbidden Patterns: - For iterators: iter not returning self. - For iterables: iter returning self (violates R1/I1b). - Enforce with type checks.

Building Blocks Sidebar: - For iterators: iter return self. - For iterables: iter return fresh cursor. - next logic/raise. - .close() cleanup.

Resource Semantics: Classes handle close in .close().

Error Model: Raise in next; cleanup always.

Purity Note: Sources (files/APIs/logs) are effectful; purity claims apply to transforms. Cleanup is explicit via .close()/context managers.


3. Cross-Domain Examples: Proving Scalability

Production-grade examples using the harness. Each stateful, clean.

3.1 Example 1: Stateful CSV Reader (Class Iter)

from __future__ import annotations
from typing import Iterator, Iterable, Dict
import csv
from io import TextIOBase

class CsvRows(Iterable[Dict[str, str]]):
    """DictReader: first row becomes header keys."""
    def __init__(self, path: str, *, dialect: str = "excel"):
        self._path = path
        self._dialect = dialect

    def __iter__(self) -> Iterator[Dict[str, str]]:
        return _CsvRowsIter(self._path, self._dialect)

class _CsvRowsIter(Iterator[Dict[str, str]]):
    def __init__(self, path: str, dialect: str):
        self._path = path
        self._dialect = dialect
        self._f: TextIOBase | None = None
        self._rdr: csv.DictReader | None = None

    def __iter__(self) -> "_CsvRowsIter":
        return self

    def __enter__(self):
        self._open()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def _open(self) -> None:
        if self._f is None:
            self._f = open(self._path, newline="")
            self._rdr = csv.DictReader(self._f, dialect=self._dialect)

    def __next__(self) -> Dict[str, str]:
        if self._rdr is None:
            self._open()
        try:
            return next(self._rdr)  # type: ignore[arg-type]
        except StopIteration:
            self.close()
            raise

    def close(self) -> None:
        if self._f is not None:
            self._f.close()
            self._f = None
            self._rdr = None

Why it's good: Cleanup on early stop/close; lazy open in next means plain iteration works.

Usage with guaranteed cleanup:

# Plain iteration (closes on natural exhaustion)
for row in CsvRows("data.csv"):
    process(row)

# Early-stop guaranteed cleanup
with iter(CsvRows("data.csv")) as rows:
    for row in rows:
        process(row)
        if done: break

3.2 Example 2: Stateful Log Follower (Class Iter)

import io, os, time
from typing import Iterator

class LogFollower(Iterable[str]):
    def __init__(self, path: str, poll: float = 0.2):
        self.path = path
        self.poll = poll

    def __iter__(self) -> Iterator[str]:
        return _LogFollowerIter(self.path, self.poll)

class _LogFollowerIter(Iterator[str]):
    def __init__(self, path: str, poll: float):
        self.path = path
        self.poll = poll
        self._f: io.TextIOBase | None = None
        self._ino: int | None = None

    def __iter__(self) -> "_LogFollowerIter":
        return self

    def __enter__(self):
        self._open()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def _open(self):
        self._f = open(self.path, "r", encoding="utf8", errors="replace")
        self._f.seek(0, io.SEEK_END)
        self._ino = os.fstat(self._f.fileno()).st_ino

    def __next__(self) -> str:
        if self._f is None:
            self._open()
        while True:
            line = self._f.readline()
            if line:
                return line.rstrip("\n")
            time.sleep(self.poll)
            try:
                if os.stat(self.path).st_ino != self._ino:
                    self._f.close()
                    self._open()
            except FileNotFoundError:
                pass

    def close(self):
        if self._f is not None:
            self._f.close()
            self._f = None

Why it's good: Stateful rotation/cleanup.

3.3 Example 3: Stateful API Pager

from typing import Iterator, Callable, Any, Optional

class ApiPager(Iterable[dict[str, Any]]):
    def __init__(self, fetch_page: Callable[[Optional[str]], dict[str, Any]]):
        self._fetch_page = fetch_page

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return _ApiPagerIter(self._fetch_page)

class _ApiPagerIter(Iterator[dict[str, Any]]):
    def __init__(self, fetch_page: Callable[[Optional[str]], dict[str, Any]]):
        self._fetch_page = fetch_page
        self._token: Optional[str] = None
        self._current_items: list[dict[str, Any]] = []
        self._idx: int = 0
        self._done: bool = False

    def __iter__(self) -> "_ApiPagerIter":
        return self

    def __next__(self) -> dict[str, Any]:
        while self._idx >= len(self._current_items):
            if self._done:
                raise StopIteration
            page = self._fetch_page(self._token)
            self._current_items = page.get("items", [])
            self._idx = 0
            self._token = page.get("next")
            if not self._token:
                self._done = True
            if not self._current_items and self._done:
                raise StopIteration

        item = self._current_items[self._idx]
        self._idx += 1
        return item

Why it's good: Stateful token + intra-page cursor; no item loss or duplicate pages.

3.4 Example 4: Stateful Telemetry Window

from collections import deque

class RollingAvgSource(Iterable[dict]):
    def __init__(self, src: Iterable[dict], w: int):
        self._src = src
        self._w = w

    def __iter__(self):
        return RollingAvgIter(self._src, self._w)

class RollingAvgIter(Iterator[dict]):
    def __init__(self, src: Iterable[dict], w: int):
        self._src = iter(src)
        self._w = w
        self._buf = deque(maxlen=w)

    def __iter__(self):
        return self

    def __next__(self) -> dict:
        if len(self._buf) < self._w:
            while len(self._buf) < self._w:
                self._buf.append(next(self._src))
        else:
            self._buf.append(next(self._src))
        avg = sum(d["value"] for d in self._buf) / self._w
        return {"avg": avg, "end_ts": self._buf[-1]["ts"]}

Why it's good: Stateful buffer; fresh on each iter(RollingAvgSource(...)).

3.5 Example 5: Stateful FS Walker

import os

class FsWalker(Iterable[str]):
    def __init__(self, root: str):
        self.root = root

    def __iter__(self):
        return _FsWalkerIter(self.root)

class _FsWalkerIter(Iterator[str]):
    def __init__(self, root: str):
        self.walk = os.walk(root)
        self.dirpath = None
        self.files = []

    def __iter__(self):
        return self

    def __next__(self) -> str:
        while not self.files:
            self.dirpath, _, self.files = next(self.walk)
        fn = self.files.pop(0)
        return os.path.join(self.dirpath, fn)

Why it's good: Stateful walk; fresh on each iter(FsWalker(...)).

3.6 Example 6: Stateful N-Gram

class NGramSource(Iterable[tuple[str, ...]]):
    def __init__(self, toks_iterables: Iterable[list[str]], n: int):
        self._toks_iterables = toks_iterables
        self._n = n

    def __iter__(self):
        return NGramIter(self._toks_iterables, self._n)

class NGramIter(Iterator[tuple[str, ...]]):
    def __init__(self, toks_iterables: Iterable[list[str]], n: int):
        self._outer = iter(toks_iterables)
        self._n = n
        self._buf: list[str] = []
        self._i = 0  # sliding index within buffer

    def __iter__(self):
        return self

    def __next__(self) -> tuple[str, ...]:
        while self._i + self._n > len(self._buf):
            self._buf.extend(next(self._outer))
        gram = tuple(self._buf[self._i:self._i + self._n])
        self._i += 1
        if self._i > 1024:
            self._buf = self._buf[self._i:]
            self._i = 0
        return gram

Why it's good: Stateful overlap; fresh on each iter(NGramSource(...)).

3.7 Running Project: Stateful RAG Chunker

Extend RAG with class chunker:

class RagChunks(Iterable[ChunkWithoutEmbedding]):
    def __init__(self, docs: Iterable[RawDoc], env: RagEnv, max_chunks: int):
        self._docs = docs
        self._env = env
        self._max = max_chunks

    def __iter__(self):
        return RagChunker(self._docs, self._env, self._max)

class RagChunker(Iterator[ChunkWithoutEmbedding]):
    def __init__(self, docs: Iterable[RawDoc], env: RagEnv, max_chunks: int):
        self._docs = iter(docs)
        self._env = env
        self._max = max_chunks
        self._emitted = 0
        self._cur: Iterator[ChunkWithoutEmbedding] | None = None

    def __iter__(self):
        return self

    def __next__(self) -> ChunkWithoutEmbedding:
        if self._emitted >= self._max:
            raise StopIteration
        while True:
            if self._cur is None:
                d = next(self._docs)                     # may raise StopIteration
                self._cur = gen_overlapping_chunks(d.doc_id, d.abstract, k=self._env.chunk_size, o=self._env.overlap, tail_policy=self._env.tail_policy)
            try:
                ch = next(self._cur)
                self._emitted += 1
                return ch
            except StopIteration:
                self._cur = None

Wins: Stateful count/cleanup; fresh on each iter(RagChunks(...)).


What comes next

The main lesson should leave you able to separate an iterable factory from its cursor and build a resource-aware iterator when you need one. The next step is to review the lifecycle rules and decide whether the class form is really better than a generator here.

Continue with Iterator Lifecycle and Cleanup before you move into Streaming Observability.