
Debugging Compositions

Debugging does not require abandoning the design principles from the rest of the module. The goal is to observe the pipeline honestly, not to punch ad-hoc holes in it whenever something feels mysterious.

Start With the Debugging Trap

The common trap is easy to recognize: once a composed pipeline feels opaque, it is tempting to insert prints, eager list(...) calls, and mutable debug switches. Those moves reveal something, but they also distort the code you are trying to understand.

  • If debugging changes evaluation order or forces materialization, it is changing more than visibility.
  • If log statements are mixed into core transforms, the boundary between pure logic and effects is disappearing.
  • If reviewers cannot tell which debug behavior is temporary and which is part of the design, the instrumentation is too ad hoc.
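The first bullet can be made concrete with a small standard-library sketch. The names `numbers`, `double`, and `run` are illustrative only and are not part of the module. The sketch counts how many items the source had to produce when a downstream consumer asks for just two results, with and without a "temporary" list(...) inserted for inspection.

```python
from itertools import islice


def run(debug_with_list: bool) -> tuple[list[int], int]:
    """Return the first two outputs and how many inputs the source produced."""
    pulled = 0

    def numbers():
        nonlocal pulled
        for n in range(1000):
            pulled += 1
            yield n

    def double(xs):
        for x in xs:
            yield x * 2

    source = numbers()
    if debug_with_list:
        # "Just to look at it": this materializes the whole source up front.
        source = list(source)
    first_two = list(islice(double(source), 2))
    return first_two, pulled


lazy_result, lazy_pulled = run(debug_with_list=False)
eager_result, eager_pulled = run(debug_with_list=True)
```

Both runs return the same values, but the eager run pulled every item from the source before the consumer saw anything. The visible output is unchanged while the evaluation order is not, which is exactly what makes this kind of debugging edit misleading.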

Keep This Question In View

Core question:
How do you debug pure, composed FP pipelines without scattering prints or breaking laziness—so "it works in my head but not in prod" becomes structured traces, reproducible probes, and self-documenting names that turn black-box flows into auditable glass?

This lesson introduces debugging as explicit pipeline design:

  • name stages so the pipeline is readable even before a bug appears
  • use explicit trace or probe stages when observation is needed
  • keep debugging support composable so it can be enabled, disabled, and reviewed without rewriting the core

The running project keeps the lesson grounded: debugging support should help explain chunk flow and decision points without flattening the lazy pipeline into a debugging script.

Use this when you have DSL-driven pipelines but still debug with prints, breakpoints, or forced eager evaluation. Outcome:
1. Spot debug smells (prints in core, mutable flags, eager list()) and explain their impact on purity.
2. Refactor an opaque pipeline to include named functions, tee traces, and probe assertions.
3. Write Hypothesis properties with verbose tracing to debug failures, including a shrinking example.


1. Conceptual Foundation

1.1 Debugging FP Code in One Precise Sentence

Debugging FP code treats naming, probing (tee/probe stages), and tracing (structured logs + Hypothesis) as explicit, composable operations—so pipelines remain lazy and reproducible while revealing every intermediate step.

1.2 The One-Sentence Rule

Never smuggle ad-hoc prints or breakpoints into core transforms; isolate debug effects in explicit tee/probe stages that are switched on by boundary config, so debug behavior is passed around like any other sealed data.

1.3 Why This Matters Now

Once a pipeline becomes more declarative, a new fear appears: "it is cleaner, but how do I see what is happening?" This page answers that fear directly. Do not go backward to print-driven debugging. Add observation points that respect the pipeline structure you have already built.

1.4 Debugging as Values in 5 Lines

The next snippet matters because the tracing behavior is packaged as a reusable stage instead of being threaded through every transformation.

from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar
import logging
import json

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer

Because tee returns an ordinary function, tee stages can be stored in dicts, bound via partial if needed, and composed with the configurators from M02C01. They trace lazily, which keeps the tracing explicit and configurable.

Note: Use structured JSON logs in production; effects sealed in tee. In production, keep trace_* off by default and only enable for targeted runs; tee is intentionally heavyweight and should not be permanently hot in tight loops. Adapt the logging shape (e.g., repr(x)[:100]) for your own domains since repr can be large or noisy.
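To check that tee really stays lazy, the following sketch runs it over an infinite source and captures the log output in memory. The tee definition is repeated so the snippet runs on its own; the logger name "tee_demo" and the ListHandler class are assumptions made for this illustration, not part of the module.

```python
import itertools
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
log = logging.getLogger("tee_demo")  # hypothetical logger name for this sketch


def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))
            yield x
    return tracer


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output out of the root logger

traced = tee("squares")(n * n for n in itertools.count())  # infinite source
logged_before = list(handler.messages)  # nothing consumed yet, so nothing logged

first_three = list(itertools.islice(traced, 3))
records = [json.loads(m) for m in handler.messages]
```

Only the three consumed items produce log records, and each record is a JSON object that can be filtered by stage name.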


2. Mental Model: Print Hell vs Explicit Stages

2.1 One Picture

Print Hell (Impure)                  Explicit Stages (Sealed)
+--------------------------+         +------------------------------------------+
| def rag(gen):            |         | flow(                                    |
|     for x in gen:        |         |   producer,                              |
|         print(x)         |         |   tee("input"),                          |
|         yield process(x) |         |   named("process", process),             |
+--------------------------+         |   probe("valid_chunk", assert_is_chunk), |
  ↑ Breaks Purity/Laziness           |   tee("output")                          |
                                     | )()                                      |
                                     +------------------------------------------+
                                       ↑ Lazy, Auditable

2.2 Contract Table

Aspect                       | Print Hell         | Explicit Stages
Purity                       | Leaky effects      | Sealed in tee
Laziness                     | Often eager        | Yields through
Readability                  | Scattered prints   | Named + probed stages
Reproducibility              | Manual runs        | Hypothesis verbose
Configurability              | Global flags       | Boundary config + identity
Mutable defaults in partials | Breaks determinism | Use frozen dataclasses or immutable types for configs

Note on Print Choice: Use prints only in trivial scripts; always prefer explicit stages for pipelines.
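The picture uses named("process", process) without showing a definition. One plausible reading, offered here as an assumption rather than the module's actual helper, is a function that lifts a per-item function into a lazy stage and attaches a readable name to it:

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
U = TypeVar("U")


def named(name: str, fn: Callable[[T], U]) -> Callable[[Iterable[T]], Iterator[U]]:
    """Hypothetical helper: lift a per-item function into a lazy, labelled stage."""
    def stage(xs: Iterable[T]) -> Iterator[U]:
        for x in xs:
            yield fn(x)
    # The label is what tracebacks, logs, and reviewers will see.
    stage.__name__ = name
    stage.__qualname__ = name
    return stage


strip_stage = named("strip_whitespace", str.strip)
anonymous_stage = lambda xs: xs  # what the pipeline looks like without naming

stage_names = [getattr(s, "__name__", "<anonymous>") for s in (strip_stage, anonymous_stage)]
cleaned = list(strip_stage(["  a ", "b  "]))
```

A pipeline built from named stages can list its own steps, whereas every lambda reports itself only as `<lambda>`.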


3. Running Project: FuncPipe RAG Builder

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Add debugging to combinator pipelines without breaking laziness.
- Start: Opaque version (core9_start.py).
- End: Debuggable pipeline with named stages, tees, and probes.

3.1 Types (Canonical, Used Throughout)

Extend with debug config (full, runnable with imports):

from funcpipe_rag import CleanConfig, DebugConfig, Observations, RagConfig, RagEnv, RagTaps

3.2 Flow Contract (Recall from M02C07)

Recall the flow contract from M02C07: Producer starts; transformers chain lazily. Types approximate the structure; in practice, use mypy for checks.

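from typing import Any, Callable, Iterable, Protocol, TypeVar

T = TypeVar("T")
# Protocol type parameters need explicit variance for mypy:
# outputs are covariant, inputs are contravariant.
T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)
U_co = TypeVar("U_co", covariant=True)

class Producer(Protocol[T_co]):
    def __call__(self) -> Iterable[T_co]: ...

class Stage(Protocol[T_contra, U_co]):
    def __call__(self, xs: Iterable[T_contra]) -> Iterable[U_co]: ...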

def flow(prod: Producer[T], *stages: Stage[Any, Any]) -> Callable[[], Iterable[Any]]:
    def run() -> Iterable[Any]:
        data: Iterable[Any] = prod()
        for s in stages:
            data = s(data)
        return data
    return run
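A short usage sketch may help show what "chain lazily" means in practice. It repeats flow with simplified type hints so it runs on its own; the producer `produce`, the stage `add_ten`, and the `events` list are made up for the demonstration.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Any

StageFn = Callable[[Iterable[Any]], Iterable[Any]]


def flow(prod: Callable[[], Iterable[Any]], *stages: StageFn) -> Callable[[], Iterable[Any]]:
    def run() -> Iterable[Any]:
        data: Iterable[Any] = prod()
        for s in stages:
            data = s(data)
        return data
    return run


events: list[str] = []  # records the order in which work actually happens


def produce() -> Iterator[int]:
    for n in (1, 2, 3):
        events.append(f"produce {n}")
        yield n


def add_ten(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        events.append(f"add_ten {x}")
        yield x + 10


pipeline = flow(produce, add_ten)
stream = pipeline()            # wires the generators together; no item has moved yet
events_before = list(events)
first = next(iter(stream))     # pulls exactly one item through both stages
events_after = list(events)
```

Calling the pipeline only builds the chain. Work happens one item at a time, when a consumer asks for it, which is the property the debug stages in the next section must preserve.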

3.3 Opaque Start (Anti-Pattern)

from funcpipe_rag import (
    DebugConfig,
    Observations,
    RagConfig,
    RagEnv,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)


def opaque_run_core_on_docs(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig())
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        fmap(deps.embedder),
    )
    chunks = structural_dedup_chunks(pipeline())
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))
    return chunks, obs

Smells:
- Anonymous lambdas (e.g., lambda cd: gen_chunk_doc).
- No traces or probes.
- Hard to audit decisions.


4. Refactor to Debuggable: Naming + Tee + Probe

4.1 Debug Primitives (Explicit Stages)

from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar
import logging
import json

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer

def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def identity(xs: Iterable[T]) -> Iterator[T]:
    yield from xs

Properties:
- Tee: Traces lazily, seals logs.
- Probe: Asserts lazily, raises on failure with stage.
- Identity: No-op for conditional debug.

Note: Ensure tap functions (like tee) don’t mutate or influence data—e.g., use lambda x: log.info(repr(x)) for observation only, preserving referential transparency. tee and probe are concrete implementations of the _tap idea from earlier cores: observe without altering.
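As a usage sketch for probe, the snippet below repeats its definition and runs it over a small list with one bad value. The check function `check_positive` and the sample data are assumptions for illustration.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker


def check_positive(x: int) -> None:
    assert x > 0, f"expected a positive value, got {x}"


checked = probe("positive", check_positive)(iter([3, 1, -2, 5]))

seen: list[int] = []
error_message = ""
try:
    for x in checked:
        seen.append(x)
except AssertionError as e:
    error_message = str(e)
```

The items before the failure flow through untouched, and the error message carries the stage name. Note that check functions written with the assert statement are skipped entirely when Python runs with the -O flag, so probes that must always run should raise explicitly instead.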

4.2 Instrumentation Wrapper (Higher-Level)

from funcpipe_rag import StageInstrumentation, instrument_stage

# Wrap an iterable stage with optional tracing/probing.
wrapped = instrument_stage(
    stage,
    stage_name="stage_name",
    instrumentation=StageInstrumentation(trace=True, probe_fn=check_fn),
)

Properties:
- Safe: Uses getattr for name, no mutation.
- Composes: Adds trace/probe without rewriting.
- Lazy: Yields through.
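The implementation of instrument_stage is not shown on this page. The following is a minimal stand-in sketch under stated assumptions, not the funcpipe_rag code: `Instrumentation` and `instrument` are hypothetical names, and the logger name is invented. It illustrates how the three listed properties can hold together.

```python
import logging
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any, Optional

log = logging.getLogger("instrument_demo")  # hypothetical logger name

StageFn = Callable[[Iterable[Any]], Iterable[Any]]


@dataclass(frozen=True)
class Instrumentation:
    """Hypothetical stand-in for StageInstrumentation."""
    trace: bool = False
    probe_fn: Optional[Callable[[Any], None]] = None


def instrument(stage: StageFn, instrumentation: Instrumentation,
               stage_name: Optional[str] = None) -> StageFn:
    # Safe: read the name with getattr instead of assuming it exists.
    name = stage_name or getattr(stage, "__name__", "stage")
    if not instrumentation.trace and instrumentation.probe_fn is None:
        return stage  # nothing to observe: hand back the original stage

    def wrapped(xs: Iterable[Any]) -> Iterator[Any]:
        # Lazy: each output is probed and traced as it is pulled through.
        for out in stage(xs):
            if instrumentation.probe_fn is not None:
                instrumentation.probe_fn(out)
            if instrumentation.trace:
                log.info("%s -> %r", name, out)
            yield out

    wrapped.__name__ = name
    return wrapped


def doubled(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        yield x * 2


plain = instrument(doubled, Instrumentation())
observed = instrument(doubled, Instrumentation(trace=True))
observed_out = list(observed([1, 2]))
```

With instrumentation switched off the original stage is returned unchanged, so the disabled path adds no wrapper overhead in this sketch.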

4.2.1 Observability Without Breaking Laziness

To observe intermediates without materializing, use tee or probe in the flow. Here's a table of combinators for observability:

Combinator       | Use Case                              | Example
tee              | Lazy tracing (logs)                   | tee("docs")
probe            | Lazy assertions                       | probe("chunks", check_chunk)
identity         | Conditional no-op                     | identity if not debug else tee("docs")
instrument_stage | Wrap existing stages with trace/probe | instrument_stage(ffilter(...), instrumentation=StageInstrumentation(trace=True))

Property test for tee transparency:

from unittest.mock import patch

from hypothesis import given
import hypothesis.strategies as st

@given(xs=st.lists(st.integers()))
def test_tee_transparent(xs):
    # Patch log.info only inside the with-block; it is restored even if the test fails.
    with patch.object(log, "info") as mock_info:
        out = list(tee("test")(iter(xs)))
    assert mock_info.call_count == len(xs)  # Called once per element (zero times for an empty list)
    assert out == xs  # Tee doesn't alter output

4.3 Refactored Core (Debuggable Pipeline)

from funcpipe_rag import DebugConfig, RagConfig, RagEnv, get_deps, iter_rag_core, structural_dedup_chunks

config = RagConfig(
    env=RagEnv(512),
    debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
)
deps = get_deps(config)

# `iter_rag_core` already wires `instrument_stage(...)` based on `config.debug`.
chunks = structural_dedup_chunks(iter_rag_core(docs, config, deps))

Properties:
- Conditional: Via config.debug (granular).
- Lazy: All stages yield through.
- Auditable: Structured logs with stage names; probes raise with context. instrument_stage traces and probes the outputs of each stage; the initial docs are traced by inserting a tee("docs") immediately after the producer.

4.4 Public API (Unchanged from M02C05–M02C08)

from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps

chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
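# boundary_deps is a RagBoundaryDeps(core=get_deps(config), reader=...) built at the boundary (see 6.4)
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)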

Properties:
- The API still returns a Result, and the boundary signatures are unchanged.

4.5 Configurator Tie-In (M02C01)

from funcpipe_rag import DebugConfig, make_rag_fn

debug_rag_fn = make_rag_fn(
    chunk_size=512,
    debug=DebugConfig(trace_docs=True, probe_chunks=True),
)

Wins: Debug config flows like data; composes with partial.
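To illustrate "debug config flows like data", here is a small sketch that chooses between a tracing stage and identity from a frozen config value. `DebugFlags`, `make_tee`, and `build` are hypothetical and much smaller than the module's DebugConfig and make_rag_fn; a plain list stands in for the logger so the trace is easy to inspect.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DebugFlags:
    """Hypothetical, reduced version of a debug config."""
    trace_docs: bool = False


def identity(xs: Iterable[str]) -> Iterator[str]:
    yield from xs


def make_tee(stage: str, sink: list[str]) -> Callable[[Iterable[str]], Iterator[str]]:
    def tracer(xs: Iterable[str]) -> Iterator[str]:
        for x in xs:
            sink.append(f"{stage}: {x!r}")  # stand-in for log.info
            yield x
    return tracer


def build(flags: DebugFlags, sink: list[str]) -> Callable[[Iterable[str]], Iterator[str]]:
    # The decision is made once, at build time, from an immutable value.
    trace = make_tee("docs", sink) if flags.trace_docs else identity

    def run(docs: Iterable[str]) -> Iterator[str]:
        return (d.upper() for d in trace(docs))
    return run


base = DebugFlags()
quiet_sink: list[str] = []
loud_sink: list[str] = []
quiet = build(base, quiet_sink)
loud = build(replace(base, trace_docs=True), loud_sink)

quiet_out = list(quiet(["a", "b"]))
loud_out = list(loud(["a", "b"]))

try:
    base.trace_docs = True  # frozen dataclasses reject in-place mutation
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the flags cannot be mutated after the pipeline is built, a given pipeline's tracing behavior cannot change underneath it, which is the determinism concern raised in the contract table.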


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Substitute in tee/probe.
1. Inline tee("docs") if ... else identity → fixed stage.
2. Substitute into flow → parametric trace.
3. Result: Pipeline fixed for fixed config (immutable); traces sealed.
Bug Hunt: In opaque version, no substitution reveals intermediates.

Example:
- Opaque: ffilter(partial(eval_pred)) → black box.
- Debug: instrument_stage with trace/probe → auditable, substitutable.
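The substitution argument can be checked mechanically on a toy pipeline. In this sketch `pipe`, `keep_even`, and `square` are illustrative names; identity plays the role of a debug slot whose flag is off.

```python
from collections.abc import Callable, Iterable, Iterator
from functools import reduce

StageFn = Callable[[Iterable[int]], Iterable[int]]


def identity(xs: Iterable[int]) -> Iterator[int]:
    yield from xs


def pipe(*stages: StageFn) -> StageFn:
    """Compose stages left to right."""
    def run(xs: Iterable[int]) -> Iterable[int]:
        return reduce(lambda data, stage: stage(data), stages, xs)
    return run


def keep_even(xs: Iterable[int]) -> Iterator[int]:
    return (x for x in xs if x % 2 == 0)


def square(xs: Iterable[int]) -> Iterator[int]:
    return (x * x for x in xs)


# With debug off, every `tee(...) if debug else identity` slot reduces to identity,
# so the instrumented pipeline can be rewritten as the bare one.
with_slots = pipe(identity, keep_even, identity, square)
without_slots = pipe(keep_even, square)

data = [1, 2, 3, 4]
out_with = list(with_slots(data))
out_without = list(without_slots(data))
```

Equal outputs for the two pipelines are what licenses replacing one with the other when reasoning by hand.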


6. Property-Based Testing: Proving Debug Behaviour

Use Hypothesis verbose to trace failures.

6.1 Custom Strategy

From capstone/tests/conftest.py.

6.2 Debug Equivalence Property (No Debug)

# capstone/tests/test_rag_api.py (debug flags don't affect values)
from dataclasses import replace

from hypothesis import given

from funcpipe_rag import DebugConfig, RagConfig, get_deps, iter_rag_core
from tests.conftest import doc_list_strategy, env_strategy


@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_flags_do_not_change_values(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    out1 = list(iter_rag_core(docs, config, deps))

    debug_cfg = replace(
        config,
        debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
    )
    out2 = list(iter_rag_core(docs, debug_cfg, get_deps(debug_cfg)))
    assert out1 == out2

Note: The property compares a debug-off run with a debug-on run and requires equal outputs. Use a separate verbose run when you want to read the traces themselves.

6.3 Probe Property (Invariants)

from hypothesis import settings, Verbosity

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy())
def test_probe_invariants(docs):
    config = RagConfig(env=RagEnv(512), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    list(iter_rag_core(docs, config, deps))  # Probes raise on failure

Note: Verbose mode prints the examples Hypothesis tries, which helps trace a failure. A concrete invariant is assert isinstance(x, ChunkWithoutEmbedding). The test passes iff no probe assertion is raised.

6.4 Idempotence with Trace

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_idempotence(docs, env):
    from funcpipe_rag import Ok, RagBoundaryDeps, full_rag_api_path

    class FakeReader:
        def __init__(self, docs):
            self._docs = docs

        def read_docs(self, path):
            _ = path
            return Ok(self._docs)

    config = RagConfig(env=env, debug=DebugConfig(trace_chunks=True))
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader(docs))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

Note: Getting equal results from two identical calls with tracing enabled confirms that the trace stages carry no hidden state.

6.5 Shrinking Demo: Catching a Bug

Bad probe with state (violates referential transparency):

from collections.abc import Callable, Iterable, Iterator
from typing import Any, TypeVar

from funcpipe_rag import ChunkWithoutEmbedding

T = TypeVar("T")

def bad_probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    counter = 0
    def checker(xs: Iterable[T]) -> Iterator[T]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:
                try:
                    check_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def check_chunk_without_embedding(x: Any) -> None:
    assert isinstance(x, ChunkWithoutEmbedding), "Invalid chunk type"
    assert x.start == 0, "Expected first chunk only (demo invariant)"

Property (intentionally failing example):

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_bad_debug(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        bad_probe("chunks", check_chunk_without_embedding),
        fmap(deps.embedder),
    )
    list(pipeline())  # Consume to trigger probe

Failure Trace (Example):

Falsifying example: test_bad_debug(
    docs=[RawDoc(...), RawDoc(...)],  # Minimal pair triggering even/odd bug
    chunk_size=128,
)
AssertionError: chunks: Expected first chunk only (demo invariant)

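Analysis: This fails whenever a chunk at an even position in the stream violates the invariant, for example the second chunk of a long document, whose start is not 0. Chunks at odd positions are never checked at all. The point is not the invariant itself; it is that stateful probes make failures depend on enumeration order rather than on the inputs alone.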
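The order dependence can be reproduced without the RAG pipeline. The sketch below uses a simplified stateful probe next to a stateless one; `check_small` and the integer inputs are stand-ins chosen for the demonstration.

```python
from collections.abc import Callable, Iterable, Iterator


def bad_probe(check_fn: Callable[[int], None]) -> Callable[[Iterable[int]], Iterator[int]]:
    counter = 0  # hidden state shared across every run of this stage

    def checker(xs: Iterable[int]) -> Iterator[int]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:  # only every second item is checked
                check_fn(x)
            yield x
    return checker


def good_probe(check_fn: Callable[[int], None]) -> Callable[[Iterable[int]], Iterator[int]]:
    def checker(xs: Iterable[int]) -> Iterator[int]:
        for x in xs:
            check_fn(x)
            yield x
    return checker


def check_small(x: int) -> None:
    assert x < 10, f"too large: {x}"


def passes(stage: Callable[[Iterable[int]], Iterator[int]], xs: list[int]) -> bool:
    try:
        list(stage(xs))
        return True
    except AssertionError:
        return False


# Same elements, different order: the stateful probe disagrees with itself.
bad_a = passes(bad_probe(check_small), [99, 1])
bad_b = passes(bad_probe(check_small), [1, 99])
good_a = passes(good_probe(check_small), [99, 1])
good_b = passes(good_probe(check_small), [1, 99])

# Same stage, same input, run twice: the counter leaks between runs.
reused = bad_probe(check_small)
first_run = passes(reused, [99])
second_run = passes(reused, [99])
```

The stateless probe rejects both orderings, while the stateful one accepts or rejects the same data depending on position and on how many items it has already seen.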


7. When Debugging Stages Aren't Worth It

Use prints only in:
- Trivial one-step scripts.
- Legacy wrappers around stages.
Guardrails: Isolate to <5 lines; prefer stages for pipelines.

Example:

# Trivial
print(512)  # OK for one-off

8. Pre-Core Quiz

  1. print in mapper? → Purity violation.
  2. list(gen) inspect? → tee(stage).
  3. Anonymous lambda? → named("name", fn).
  4. Global debug flag? → Config + identity.
  5. Shrink failures? → Hypothesis verbose.

9. Post-Core Reflection & Exercise

Reflect: Find an opaque pipeline. Refactor with named, tee, probe; add verbose Hypothesis.
Project Exercise: Apply to RAG (e.g., trace decisions); run verbose properties.
- Did traces clarify bugs?
- Did probes catch invariants?
- Did verbose shrink failures?

Continue with: Imperative to FP Refactor

Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.