Expression-Oriented Python¶

Concept Position¶

flowchart TD
  family["Python Programming"] --> program["Python Functional Programming"]
  program --> module["Module 02: Data-First APIs and Expression Style"]
  module --> concept["Expression-Oriented Python"]
  concept --> capstone["Capstone pressure point"]

flowchart TD
  problem["Start with the design or failure question"] --> example["Study the worked example and trade-offs"]
  example --> boundary["Name the boundary this page is trying to protect"]
  boundary --> proof["Carry that question into code review or the capstone"]

Read the first diagram as a placement map: this page is one concept inside its parent module, not a detached essay, and the capstone is the pressure test for whether the idea holds. Read the second diagram as the working rhythm for the page: name the problem, study the example, identify the boundary, then carry one review question forward.

Progression Note¶

By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

Module	Focus	Key Outcomes
1: Foundational FP Concepts	Purity, contracts, refactoring	Spot impurities, write pure functions, prove equivalence with Hypothesis
2: First-Class Functions & Expressive Python	Closures, partials, composable configurators	Configure pure pipelines without globals
3: Lazy Iteration & Generators	Streaming/lazy pipelines	Efficient data processing without materializing everything

Core question:
How do you replace statement-heavy imperative code (loops + flags + breaks) with expressions, comprehensions, and data-driven conditionals—so control flow becomes explicit, composable, and easy to reason about?

This core introduces the expression-oriented mindset in Python:

Treat core logic as value-producing expressions, not sequences of mutations.
Default to comprehensions, conditional expressions, and built-ins (any, all, next) for control flow.
Eliminate mutable control flags from core logic—keep them only at trivial edges, if at all.

We continue the running project from m02-rag.md, extending the FuncPipe RAG Builder:

Baseline: a composition of the pure stages (clean → chunk → embed → dedup).
Module 2 (Core 1): make_rag_fn(...) – closure-based configurators.
This core: Replace imperative loops in the RAG core with expression-oriented code that is easier to configure, test, and prove equivalent to the baseline.

Audience: Developers who understand purity and configurators (Core 1) but still write loops like:

found = False
for x in xs:
    if pred(x):
        found = True
        break

Outcome:

Spot control flags (found, valid, done) and explain why they obscure logic.
Refactor a 10–20 line loop into comprehensions / any / next while preserving semantics.
Write a Hypothesis property that proves equivalence to the baseline and exposes a real flag-based bug.

Runnability Note (Module 01 Snapshot vs Module 02 End-State)¶

Some “before” snippets in this core are hypothetical pre-refactor examples used for contrast. They are labeled accordingly and are not meant to exactly match a real snapshot. We refactor these shapes into the real Module 02 API as the module progresses.

For a real, runnable Module 01 codebase, refresh the generated history route first:

make PROGRAM=python-programming/python-functional-programming history-refresh
Module 01 path: capstone/_history/worktrees/module-01/
Import path for Module 01: capstone/_history/worktrees/module-01/src/

1. Conceptual Foundation¶

1.1 Expression-Oriented Python in One Precise Sentence¶

Expression-oriented programming treats control flow as compositions of value-producing expressions instead of stepwise mutation—so code reads as “data -> data” rather than “state -> state”.

1.2 The One-Sentence Rule¶

In core logic, do not use mutable flags (found, valid, done) or manual break/continue for control; use comprehensions, conditional expressions, and built-ins that return values—flags and break may be acceptable inside encapsulated low-level helpers with pure signatures.

1.3 Why This Matters Now¶

Core 1 gave you pure functions and closure-based configurators:

make_rag_fn(...) -> Callable[[list[RawDoc]], tuple[list[Chunk], Observations]] is pure and deterministic.
But the implementation of the RAG core can still be imperative:
Loops with flags, early breaks, scattered if blocks.
Harder to reason about, harder to transform, and easier to subtly break when adding new behaviors.

Expression-oriented code:

Turns “do this, then maybe that” into “compute this value, then transform it”.
Makes pipelines equational: each step is an expression you can substitute and test in isolation.
Aligns perfectly with Core 1’s closure-based configurators: you configure expressions, not control-flow spaghetti.

Core 1 configures what RAG function we call (make_rag_fn); Core 2 refactors how that function is implemented internally (full_rag_api expressed as comprehensions instead of flags).
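
As a small illustration of turning “do this, then maybe that” into “compute this value, then transform it”, the sketch below contrasts a statement-style function with a conditional expression. The function names and the 100-character threshold are illustrative assumptions only.

```python
def size_label_stmt(n_chars: int) -> str:
    """Statement style: assign a default, then maybe overwrite it."""
    label = "short"
    if n_chars >= 100:
        label = "long"
    return label


def size_label_expr(n_chars: int) -> str:
    """Expression style: the result is computed in one step."""
    return "long" if n_chars >= 100 else "short"


lengths = [12, 250, 100]
labels = [size_label_expr(n) for n in lengths]
```

Because size_label_expr is a single expression, it can be substituted directly into the comprehension without changing the result.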

1.4 Expressions as Values in 5 Lines¶

We start with a simple, RAG-flavored predicate table:

from collections.abc import Callable
from funcpipe_rag import RawDoc


def has_long_abstract(d: RawDoc) -> bool:
    return len(d.abstract) >= 100


def is_cs_category(d: RawDoc) -> bool:
    return d.categories.startswith("cs.")


DocPred = Callable[[RawDoc], bool]

predicates: dict[str, DocPred] = {
    "long_abstract": has_long_abstract,
    "cs_only": is_cs_category,
}


def filter_docs(key: str, docs: list[RawDoc]) -> list[RawDoc]:
    return [d for d in docs if predicates[key](d)]

The key point:

filter_docs is a single expression ([...]) mapping docs to docs.
Control flow (“if this doc satisfies predicate P, keep it”) is encoded as data: predicates[key].

No flags, no break; everything is composable and easy to test.
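
Because the predicates are plain data, they also combine without new control flow. The sketch below is a hedged extension of the example above: RawDoc is a stand-in dataclass that models only the fields used on this page (the real type lives in funcpipe_rag), and filter_docs_all is a hypothetical helper.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:
    """Stand-in for funcpipe_rag.RawDoc; only the fields used here are modelled."""
    doc_id: str
    abstract: str
    categories: str


DocPred = Callable[[RawDoc], bool]

predicates: dict[str, DocPred] = {
    "long_abstract": lambda d: len(d.abstract) >= 100,
    "cs_only": lambda d: d.categories.startswith("cs."),
}


def filter_docs_all(keys: Iterable[str], docs: list[RawDoc]) -> list[RawDoc]:
    """Keep the docs that satisfy every predicate named in keys."""
    selected = [predicates[k] for k in keys]
    return [d for d in docs if all(p(d) for p in selected)]


docs = [
    RawDoc("a", "x" * 120, "cs.AI"),
    RawDoc("b", "short", "cs.CL"),
    RawDoc("c", "y" * 150, "math.CO"),
]
kept_ids = [d.doc_id for d in filter_docs_all(["long_abstract", "cs_only"], docs)]
```

With an empty list of keys, all(...) is vacuously true and every doc is kept, which is a reasonable identity for "no rules configured".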

2. Mental Model: Imperative Flags vs Expressions¶

2.1 One Picture¶

Imperative Flags (Mutable)              Expression-Oriented (Pure)
+-----------------------+               +------------------------------+
| found = False         |               |   found = any(pred(x)        |
| for x in xs:          |               |               for x in xs)   |
|     if pred(x):       |               |                              |
|         found = True  |               |   # Single expression        |
|         break         |               |   # No flags, no break       |
+-----------------------+               +------------------------------+
   ↑ Scattered control                         ↑ Control is data
   ↑ Subtle state coupling                     ↑ Easy to compose / test

2.2 Contract Table¶

Aspect	Imperative Flags	Expression-Oriented
Dependencies	Hidden in loop structure	Explicit in predicates and expressions
Control Flow	Flags + `break`/`continue`	Comprehensions, `any`/`all`/`next`, ternaries
Reasoning	Global: “what happens to `found`?”	Local: “what does this expression compute?”
Refactoring	Easy to introduce non-local bugs	Equational: refactor expression ↔ expression
Testing	Need to inspect loop behavior	Test expressions as pure functions

# Imperative: flag + break to get first matching doc
first_long = None
for d in docs:
    if has_long_abstract(d):
        first_long = d
        break

# Expression-oriented: next() with default
first_long = next(
    (d for d in docs if has_long_abstract(d)),
    None,  # default if no doc matches
)
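
One property worth checking when replacing break: next over a generator expression and any both stop at the first match, so they make no more predicate calls than the loop with break. The sketch below records the calls to show this; the counting wrapper is a hypothetical helper and is deliberately impure, for demonstration only.

```python
from collections.abc import Callable


def counting(pred: Callable[[int], bool]) -> tuple[Callable[[int], bool], list[int]]:
    """Wrap pred so that every argument it receives is recorded."""
    seen: list[int] = []

    def wrapped(x: int) -> bool:
        seen.append(x)
        return pred(x)

    return wrapped, seen


xs = [1, 5, 12, 7, 30]

# next() pulls from the generator only until the first match.
is_big, seen_by_next = counting(lambda x: x > 10)
first_big = next((x for x in xs if is_big(x)), None)

# any() short-circuits in the same way.
is_big_any, seen_by_any = counting(lambda x: x > 10)
has_big = any(is_big_any(x) for x in xs)

# With no match, every element is inspected and the default is returned.
is_huge, seen_by_miss = counting(lambda x: x > 100)
first_huge = next((x for x in xs if is_huge(x)), None)
```

Note that this relies on passing a generator expression; building a full list first and then taking its first element would evaluate the predicate for every item.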

While comprehensions promote expression-oriented code, prioritize readability: if a comprehension becomes deeply nested or complex (for example, three or more layers), refactor it into named helper functions or use a simple loop inside a small pure wrapper. Purity matters, but so does maintainability.
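
A hedged sketch of that refactor: the first function packs three layers into one comprehension, and the versions after it split the same logic into named helpers with one layer each. All function names and the sample data are illustrative assumptions.

```python
def long_tokens_nested(docs: list[str], min_len: int) -> list[str]:
    """Three nested layers in one comprehension: correct, but dense."""
    return [
        token.lower()
        for doc in docs
        for sentence in doc.split(".")
        for token in sentence.split()
        if len(token) >= min_len
    ]


def sentence_tokens(sentence: str, min_len: int) -> list[str]:
    """Lower-cased tokens of one sentence that meet the length threshold."""
    return [t.lower() for t in sentence.split() if len(t) >= min_len]


def doc_tokens(doc: str, min_len: int) -> list[str]:
    """Qualifying tokens of one document, sentence by sentence."""
    return [t for s in doc.split(".") for t in sentence_tokens(s, min_len)]


def long_tokens(docs: list[str], min_len: int) -> list[str]:
    """Same result as long_tokens_nested, built from named steps."""
    return [t for d in docs for t in doc_tokens(d, min_len)]


sample = [
    "Pure functions compose. Flags obscure logic.",
    "Lazy pipelines stream data.",
]
tokens = long_tokens(sample, 6)
```

Each helper is still a single expression, so the code stays expression-oriented while every layer gets a name that can be tested on its own.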

3. Running Project: FuncPipe RAG Builder¶

We continue the FuncPipe RAG Builder from m02-rag.md.

Baseline: a composition of the pure stages (clean → chunk → embed → dedup).
Module 2 Core 1: make_rag_fn(...) – closure-based configurators.
This core: We refactor the internal implementation of the RAG API from imperative loops to expression-based code while preserving equivalence to the baseline.

3.1 Types (Canonical, Used Throughout)¶

We rely on the types defined in capstone/src/funcpipe_rag/rag_types.py and capstone/src/funcpipe_rag/api/types.py:

from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv

These are pure data containers; expression orientation will sit on top of them.

4. Imperative Start: Loops and Flags¶

We begin with a hypothetical pre-refactor implementation of the extended RAG pipeline. It’s semantically correct, but filled with flags and stepwise loops, and it is not intended to be run as-is in the end-of-Module-02 checkout.

# core2_start.py (hypothetical pre-refactor; illustration only)
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc  # baseline stage
from funcpipe_rag import embed_chunk, structural_dedup_chunks


def imperative_full_rag_api(
        docs: list[RawDoc],
        env: RagEnv,
        cleaner: Callable[[RawDoc], CleanDoc],
        *,
        keep: DocRule | None = None,
        taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc

    # 1) Filter docs using per-doc flag
    kept_docs: list[RawDoc] = []
    for d in docs:
        is_kept = rule(d)  # Flag; local, but unnecessary
        if is_kept:
            kept_docs.append(d)
    if taps and taps.docs:
        taps.docs(tuple(kept_docs))

    # 2) Clean docs using explicit accumulation
    cleaned: list[CleanDoc] = []
    for d in kept_docs:
        cd = cleaner(d)
        cleaned.append(cd)
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))

    # 3) Chunk each cleaned doc using index + while loop
    chunk_we: list[ChunkWithoutEmbedding] = []
    for cd in cleaned:
        text = cd.abstract
        i = 0
        while i < len(text):
            s = text[i:i + env.chunk_size]
            if s:
                chunk_we.append(
                    ChunkWithoutEmbedding(cd.doc_id, s, i, i + len(s))
                )
            i += env.chunk_size

    # 4) Embed chunks
    embedded: list[Chunk] = []
    for c in chunk_we:
        embedded.append(embed_chunk(c))

    # 5) Deduplicate structurally (baseline stage helper)
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks:
        taps.chunks(tuple(chunks))

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Key points:

Apart from the optional taps, this function is pure and deterministic.
But control flow is encoded as:
Per-doc flags (is_kept),
Manual accumulation loops,
Explicit index management (i + while).

It works, but it doesn’t read as “data -> data” so much as “do X, then Y, then Z”.

5. Refactor to Expressions: Comprehensions & Conditionals¶

We now introduce a small helper and an expression-oriented RAG core.

5.1 Side-Effect Taps as an Expression Primitive¶

We define _tap as the only side-effect primitive allowed in this core:

from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")

def _tap(xs: list[T], h: Callable[[tuple[T, ...]], None] | None) -> list[T]:
    """
    Observational tap: if h is provided, call h(tuple(xs)) for side effects,
    then return xs unchanged.

    Contract: For all xs and h, the *return value* of _tap(xs, h) equals xs.
    All value-level behavior of the pipeline is unchanged; only side effects differ.
    """
    if h:
        h(tuple(xs))
    return xs

This preserves the value semantics of the pipeline while allowing optional metrics/logging at the edges.

5.2 Expression-Oriented RAG Core¶

We now rewrite the RAG core in an expression style. This is an illustration-only refactor; the runnable end-of-Module-02 implementation lives in capstone/src/funcpipe_rag/rag/rag_api.py (full_rag_api_docs / full_rag_api) with the frozen config and dependency wiring in capstone/src/funcpipe_rag/rag/config.py.

# core2_refactor_demo.py (illustration only; not the canonical Module-02 API)
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import embed_chunk, structural_dedup_chunks


def toy_gen_chunk_doc(cd: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    """
    Pure helper: chunk a cleaned document into fixed-size pieces.
    """
    text = cd.abstract
    return [
        ChunkWithoutEmbedding(cd.doc_id, chunk_text, start, start + len(chunk_text))
        for start in range(0, len(text), env.chunk_size)
        if (chunk_text := text[start:start + env.chunk_size])
    ]


def toy_full_rag_api(
        docs: list[RawDoc],
        env: RagEnv,
        cleaner: Callable[[RawDoc], CleanDoc],
        *,
        keep: DocRule | None = None,
        taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc  # conditional expression

    kept_docs = _tap(
        [d for d in docs if rule(d)],  # filter
        taps.docs if taps else None,
    )

    cleaned = _tap(
        [cleaner(d) for d in kept_docs],  # map
        taps.cleaned if taps else None,
    )

    chunk_we = [
        c
        for cd in cleaned
        for c in toy_gen_chunk_doc(cd, env)  # flatMap
    ]

    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = _tap(
        structural_dedup_chunks(embedded),
        taps.chunks if taps else None,
    )

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Properties:

No mutable flags (is_kept, found_chunk, done).
Control flow is now encoded as expressions:
Filtering: [d for d in docs if rule(d)]
Mapping: [cleaner(d) for d in kept_docs]
Chunk flattening: for cd in cleaned for c in toy_gen_chunk_doc(cd, env)
_tap is the only place where side effects may occur, and it preserves the values.

This is now a direct “data -> data” description of the pipeline.
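
To check that the chunking rewrite preserves behavior, the two shapes can be compared directly on a few inputs. This is a minimal sketch under stated assumptions: Piece is a stand-in for ChunkWithoutEmbedding with assumed field names, and both functions assume a positive chunk size (a size of zero would loop forever in the while version and raise ValueError in the range version).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Stand-in for ChunkWithoutEmbedding; the field names are assumptions."""
    doc_id: str
    text: str
    start: int
    end: int


def chunk_while(doc_id: str, text: str, size: int) -> list[Piece]:
    """Imperative shape: manual index and explicit accumulation."""
    out: list[Piece] = []
    i = 0
    while i < len(text):
        s = text[i:i + size]
        if s:
            out.append(Piece(doc_id, s, i, i + len(s)))
        i += size
    return out


def chunk_expr(doc_id: str, text: str, size: int) -> list[Piece]:
    """Expression shape: range() supplies the start indices."""
    return [
        Piece(doc_id, s, start, start + len(s))
        for start in range(0, len(text), size)
        if (s := text[start:start + size])
    ]


cases = [("d0", "", 4), ("d1", "abc", 4), ("d2", "abcdefgh", 4), ("d3", "abcdefghij", 3)]
agree = all(chunk_while(*case) == chunk_expr(*case) for case in cases)
last_pieces = chunk_expr("d3", "abcdefghij", 3)
```

A handful of fixed cases is only a spot check; a property-based test over generated texts and sizes would give stronger evidence of equivalence.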

5.3 Expression Partial (Core 1 Tie-In)¶

from funcpipe_rag import CleanConfig, make_rag_fn, any_doc

has_long_abstract = lambda d: len(d.abstract) >= 100
has_valid_doc = lambda d: any_doc(d) and has_long_abstract(d)  # logical `and` as an expression
# In the end-of-Module-02 codebase, `make_rag_fn` captures frozen config.
rag_fn = make_rag_fn(chunk_size=512, clean_cfg=CleanConfig())
# Expression-oriented filtering still composes cleanly
# (`docs` is assumed to be a list[RawDoc] already in scope):
filtered_docs = [d for d in docs if has_valid_doc(d)]
chunks, obs = rag_fn(filtered_docs)

Wins: data-driven filtering without flags that composes with Core 1. make_rag_fn is the canonical configurator that wraps this expression-based pipeline.

What comes next¶

The main expression lesson should leave you able to rewrite loops and flags into a clearer dataflow. The next step is to review when that rewrite is genuinely better and how to prove it preserved behavior.

Continue with Expression Review and Trade-Offs before you move into Introducing Laziness.