Debugging Compositions¶
Debugging does not require abandoning the design principles from the rest of the module. The goal is to observe the pipeline honestly, not to punch ad-hoc holes in it whenever something feels mysterious.
Start With the Debugging Trap¶
The common trap is easy to recognize: once a composed pipeline feels opaque, it is tempting to insert prints, eager list(...) calls, and mutable debug switches. Those moves reveal something, but they also distort the code you are trying to understand.
- If debugging changes evaluation order or forces materialization, it is changing more than visibility.
- If log statements are mixed into core transforms, the boundary this module draws between pure logic and effects is disappearing.
- If reviewers cannot tell which debug behavior is temporary and which is part of the design, the instrumentation is too ad hoc.
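The first bullet can be made concrete with a small sketch. The `numbers` source and the `pulled` list are hypothetical names introduced only for this demo; they record how many items the pipeline actually demands when it is materialized versus left lazy.

```python
from collections.abc import Iterable, Iterator
from itertools import islice

pulled: list[int] = []  # records every item the source actually produced


def numbers(limit: int) -> Iterator[int]:
    """Hypothetical lazy source; appends to `pulled` so demand is visible."""
    for n in range(limit):
        pulled.append(n)
        yield n


def double(xs: Iterable[int]) -> Iterator[int]:
    return (x * 2 for x in xs)


# "Debugging" by materializing: the whole source is drained just to look at it.
pulled.clear()
peeked = list(double(numbers(1_000)))
eager_pulls = len(pulled)

# Leaving the pipeline lazy: only the two requested items are produced.
pulled.clear()
first_two = list(islice(double(numbers(1_000)), 2))
lazy_pulls = len(pulled)

print(eager_pulls, lazy_pulls, first_two)
```

The values are the same in both cases; what changes is how much work the source did, which is exactly the kind of distortion an eager `list(...)` probe introduces.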
Keep This Question In View¶
Core question:
How do you debug pure, composed FP pipelines without scattering prints or breaking laziness—so "it works in my head but not in prod" becomes structured traces, reproducible probes, and self-documenting names that turn black-box flows into auditable glass?
This lesson introduces debugging as explicit pipeline design:
- name stages so the pipeline is readable even before a bug appears
- use explicit trace or probe stages when observation is needed
- keep debugging support composable so it can be enabled, disabled, and reviewed without rewriting the core
The running project keeps the lesson grounded: debugging support should help explain chunk flow and decision points without flattening the lazy pipeline into a debugging script.
Use this when you have DSL-driven pipelines but still debug with prints, breakpoints, or forced eager evaluation.
Outcome:
1. Spot debug smells (prints in core, mutable flags, eager list()) and explain their impact on purity.
2. Refactor an opaque pipeline to include named functions, tee traces, and probe assertions.
3. Write Hypothesis properties with verbose tracing to debug failures, including a shrinking example.
1. Conceptual Foundation¶
1.1 Debugging FP Code in One Precise Sentence¶
Debugging FP code treats naming, probing (tee/probe stages), and tracing (structured logs + Hypothesis) as explicit, composable operations—so pipelines remain lazy and reproducible while revealing every intermediate step.
1.2 The One-Sentence Rule¶
Never smuggle ad-hoc prints or breakpoints into core transforms; isolate debug effects in explicit tee/probe stages that are switched on by configuration passed in at the boundary, so debug behavior stays as fixed and reviewable as the data it observes.
1.3 Why This Matters Now¶
Once a pipeline becomes more declarative, a new fear appears: "it is cleaner, but how do I see what is happening?" This page answers that fear directly. Do not go backward to print-driven debugging. Add observation points that respect the pipeline structure you have already built.
1.4 Debugging as Values in 5 Lines¶
The next snippet matters because the tracing behavior is packaged as a reusable stage instead of being threaded through every transformation.
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar
import logging
import json

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer
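Because tee(stage) returns an ordinary function, tee stages can be stored in dicts, pre-bound with partial when needed, composed with the configurator pattern from M02C01, and traced lazily. The tracing stays explicit and configurable instead of being threaded through every transform.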
Note: Use structured JSON logs in production; effects sealed in tee. In production, keep trace_* off by default and only enable for targeted runs; tee is intentionally heavyweight and should not be permanently hot in tight loops. Adapt the logging shape (e.g., repr(x)[:100]) for your own domains since repr can be large or noisy.
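A usage sketch may help show that tee logs only what the consumer pulls. The in-memory `ListHandler` and the logger name `tee_demo` are assumptions made for this demo so the records can be inspected; a real deployment would use whatever logging handlers are already configured.

```python
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")
log = logging.getLogger("tee_demo")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo's records out of the root logger


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps messages in memory for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))
            yield x
    return tracer


handler = ListHandler()
log.addHandler(handler)

traced = tee("input")(iter(range(1_000)))  # nothing logged yet: tracer is a generator
before = len(handler.messages)
taken = list(islice(traced, 3))            # pulling three items logs three records
records = [json.loads(m) for m in handler.messages]
```

Only three records exist after the `islice`, even though the source holds a thousand items, so the trace follows demand rather than forcing it.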
2. Mental Model: Print Hell vs Explicit Stages¶
2.1 One Picture¶
Print Hell (Impure)            Explicit Stages (Sealed)
+-----------------------+      +-------------------------------------------+
| def rag(gen):         |      | flow(                                     |
|   for x in gen:       |      |   producer,                               |
|     print(x)          |      |   tee("input"),                           |
|     yield process(x)  |      |   named("process", process),              |
+-----------------------+      |   probe("valid_chunk", assert_is_chunk),  |
  ↑ Breaks Purity/Laziness     |   tee("output")                           |
                               | )()                                       |
                               +-------------------------------------------+
                                 ↑ Lazy, Auditable
2.2 Contract Table¶
| Aspect | Print Hell | Explicit Stages |
|---|---|---|
| Purity | Leaky effects | Sealed in tee |
| Laziness | Often eager | Yields through |
| Readability | Scattered prints | Named + probed stages |
| Reproducibility | Manual runs | Hypothesis verbose |
| Configurability | Global flags | Boundary config + identity |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |
Note on Print Choice: Use prints only in trivial scripts; always prefer explicit stages for pipelines.
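The picture uses `named("process", process)` without showing its definition. One possible reading, sketched here as an assumption rather than the project's actual helper, is a function that lifts an element-wise function into a lazy stage and attaches a readable name to it.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
U = TypeVar("U")


def named(name: str, fn: Callable[[T], U]) -> Callable[[Iterable[T]], Iterator[U]]:
    """Assumed helper: lift an element-wise function into a lazy, labelled stage."""
    def stage(xs: Iterable[T]) -> Iterator[U]:
        return (fn(x) for x in xs)
    stage.__name__ = name      # visible in repr() and to tools that read __name__
    stage.__qualname__ = name
    return stage


normalize = named("normalize", lambda s: s.strip().lower())
result = list(normalize(["  Alpha", "BETA  "]))
```

With a name attached, a wrapper that logs or probes a stage can report "normalize" instead of "<lambda>", which is most of what makes a trace readable.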
3. Running Project: FuncPipe RAG Builder¶
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Add debugging to combinator pipelines without breaking laziness.
- Start: Opaque version (core9_start.py).
- End: Debuggable pipeline with named stages, tees, and probes.
3.1 Types (Canonical, Used Throughout)¶
Extend with debug config (full, runnable with imports):
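The capstone's type definitions are not reproduced on this page. As a minimal sketch, assuming the flag names that appear in later snippets (`trace_docs`, `trace_chunks`, `probe_chunks`) and omitting other fields such as `keep`, the debug config could be a frozen dataclass nested in the main config:

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class DebugConfig:
    """Assumed shape: one boolean per observation point, all off by default."""
    trace_docs: bool = False
    trace_chunks: bool = False
    probe_chunks: bool = False


@dataclass(frozen=True)
class RagEnv:
    chunk_size: int


@dataclass(frozen=True)
class RagConfig:
    """Simplified for illustration; the real config carries more fields."""
    env: RagEnv
    debug: DebugConfig = field(default_factory=DebugConfig)


base = RagConfig(env=RagEnv(512))
traced = replace(base, debug=DebugConfig(trace_docs=True))  # new value, base untouched

try:
    base.debug.trace_docs = True  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Freezing the config means a debug run is a different value derived with `replace`, not a flag flipped in place, which is what lets debug settings flow through the pipeline like any other data.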
3.2 Flow Contract (Recall from M02C07)¶
Recall the flow contract from M02C07: Producer starts; transformers chain lazily. Types approximate the structure; in practice, use mypy for checks.
from typing import Any, Callable, Iterable, Protocol, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Producer(Protocol[T]):
    def __call__(self) -> Iterable[T]: ...

class Stage(Protocol[T, U]):
    def __call__(self, xs: Iterable[T]) -> Iterable[U]: ...

def flow(prod: Producer[T], *stages: Stage[Any, Any]) -> Callable[[], Iterable[Any]]:
    def run() -> Iterable[Any]:
        data: Iterable[Any] = prod()
        for s in stages:
            data = s(data)
        return data
    return run
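A short usage sketch of the contract, with the Protocol types replaced by plain callables to keep it self-contained. The producer `naturals` and the stages `evens` and `squares` are hypothetical; the infinite producer is there to show that nothing runs until a consumer pulls.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import Any


def flow(
    prod: Callable[[], Iterable[Any]],
    *stages: Callable[[Iterable[Any]], Iterable[Any]],
) -> Callable[[], Iterable[Any]]:
    def run() -> Iterable[Any]:
        data: Iterable[Any] = prod()
        for s in stages:
            data = s(data)
        return data
    return run


def naturals() -> Iterator[int]:
    """Hypothetical infinite producer; only safe because every stage is lazy."""
    n = 0
    while True:
        yield n
        n += 1


def evens(xs: Iterable[int]) -> Iterator[int]:
    return (x for x in xs if x % 2 == 0)


def squares(xs: Iterable[int]) -> Iterator[int]:
    return (x * x for x in xs)


pipeline = flow(naturals, evens, squares)  # builds the pipeline, runs nothing
first_three = list(islice(pipeline(), 3))
again = list(islice(pipeline(), 3))        # each call starts from a fresh producer
```

Because the producer is a zero-argument callable, every `pipeline()` call restarts from the beginning, which is what makes a debug run reproducible.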
3.3 Opaque Start (Anti-Pattern)¶
from funcpipe_rag import (
    DebugConfig,
    Observations,
    RagConfig,
    RagEnv,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)

def opaque_run_core_on_docs(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig())
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        fmap(deps.embedder),
    )
    chunks = structural_dedup_chunks(pipeline())
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))
    return chunks, obs
Smells:
- Anonymous lambdas (e.g., lambda cd: gen_chunk_doc(cd, config.env)) that show up in traces and tracebacks only as "<lambda>".
- No traces or probes.
- Hard to audit decisions.
4. Refactor to Debuggable: Naming + Tee + Probe¶
4.1 Debug Primitives (Explicit Stages)¶
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar
import logging
import json

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer

def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def identity(xs: Iterable[T]) -> Iterator[T]:
    yield from xs
Properties:
- Tee: Traces lazily, seals logs.
- Probe: Asserts lazily, raises on failure with stage.
- Identity: No-op for conditional debug.
Note: Ensure tap functions (like tee) don’t mutate or influence data—e.g., use lambda x: log.info(repr(x)) for observation only, preserving referential transparency. tee and probe are concrete implementations of the _tap idea from earlier cores: observe without altering.
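To illustrate the probe property, here is a small sketch with a hypothetical invariant, `check_non_negative`, applied to a made-up stream. It shows that items before the violation still flow through and that the error message carries the stage name.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker


def check_non_negative(x: int) -> None:
    """Hypothetical invariant used only for this demo."""
    assert x >= 0, f"negative value {x}"


checked = probe("amounts", check_non_negative)(iter([3, 1, -2, 5]))

seen: list[int] = []
message = ""
try:
    for value in checked:
        seen.append(value)
except AssertionError as exc:
    message = str(exc)
```

The probe raises only when the offending element is reached, so a consumer that stops early never pays for checks on items it did not request.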
4.2 Instrumentation Wrapper (Higher-Level)¶
from funcpipe_rag import StageInstrumentation, instrument_stage

# Wrap an iterable stage with optional tracing/probing.
wrapped = instrument_stage(
    stage,
    stage_name="stage_name",
    instrumentation=StageInstrumentation(trace=True, probe_fn=check_fn),
)
Properties:
- Safe: Uses getattr for name, no mutation.
- Composes: Adds trace/probe without rewriting.
- Lazy: Yields through.
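The wrapper's internals are not shown on this page. The following is a hypothetical sketch of how such a wrapper could behave, not the `funcpipe_rag` implementation: the field names are inferred from the call above, and the `emit` callback is an assumption added so the demo can capture traces without a logger.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any, Optional

Stage = Callable[[Iterable[Any]], Iterable[Any]]


@dataclass(frozen=True)
class StageInstrumentation:
    """Assumed fields, inferred from the call shown above."""
    trace: bool = False
    probe_fn: Optional[Callable[[Any], None]] = None


def instrument_stage_sketch(
    stage: Stage,
    *,
    instrumentation: StageInstrumentation,
    stage_name: Optional[str] = None,
    emit: Callable[[str, Any], None] = lambda name, value: None,
) -> Stage:
    # Fall back to the function's own name when no explicit label is given.
    name = stage_name or getattr(stage, "__name__", "stage")

    def wrapped(xs: Iterable[Any]) -> Iterator[Any]:
        for out in stage(xs):  # observe outputs one at a time, never materialize
            if instrumentation.probe_fn is not None:
                try:
                    instrumentation.probe_fn(out)
                except AssertionError as e:
                    raise AssertionError(f"{name}: {e}") from e
            if instrumentation.trace:
                emit(name, out)
            yield out

    return wrapped


def strip_all(xs: Iterable[str]) -> Iterator[str]:
    return (x.strip() for x in xs)


events: list[tuple[str, str]] = []
traced_stage = instrument_stage_sketch(
    strip_all,
    instrumentation=StageInstrumentation(trace=True),
    emit=lambda name, value: events.append((name, value)),
)
out = list(traced_stage([" a ", "b "]))
```

The original stage is called unchanged and the wrapper is a new function, so instrumentation can be added or dropped without editing the stage itself.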
4.2.1 Observability Without Breaking Laziness¶
To observe intermediates without materializing, use tee or probe in the flow. Here's a table of combinators for observability:
| Combinator | Use Case | Example |
|---|---|---|
| tee | Lazy tracing (logs) | tee("docs") |
| probe | Lazy assertions | probe("chunks", check_chunk) |
| identity | Conditional no-op | identity if not debug else tee |
| instrument_stage | Wrap existing stages with trace/probe | instrument_stage(ffilter(...), instrumentation=StageInstrumentation(trace=True)) |
Property test for tee transparency:
from hypothesis import given
import hypothesis.strategies as st
from unittest.mock import patch

@given(xs=st.lists(st.integers()))
def test_tee_transparent(xs):
    # Patch log.info only for this example; the original is restored on exit,
    # even if an assertion fails.
    with patch.object(log, "info") as mock_info:
        out = list(tee("test")(iter(xs)))
    assert mock_info.call_count == len(xs)  # Called once per element (zero for [])
    assert out == xs  # Tee doesn't alter output
4.3 Refactored Core (Debuggable Pipeline)¶
from funcpipe_rag import DebugConfig, RagConfig, RagEnv, get_deps, iter_rag_core, structural_dedup_chunks

config = RagConfig(
    env=RagEnv(512),
    debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
)
deps = get_deps(config)

# `iter_rag_core` already wires `instrument_stage(...)` based on `config.debug`.
chunks = structural_dedup_chunks(iter_rag_core(docs, config, deps))
Properties:
- Conditional: Via config.debug (granular).
- Lazy: All stages yield through.
- Auditable: Structured logs with stage names; probes raise with context. instrument_stage traces and probes the outputs of each stage; the initial docs are traced by inserting a tee("docs") immediately after the producer.
4.4 Public API (Unchanged from M02C05–M02C08)¶
from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps

chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
# boundary_deps is a RagBoundaryDeps (core deps plus a reader) assembled at the boundary.
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)
Properties:
- Keeps Result; boundaries unchanged.
4.5 Configurator Tie-In (M02C01)¶
from funcpipe_rag import DebugConfig, make_rag_fn

debug_rag_fn = make_rag_fn(
    chunk_size=512,
    debug=DebugConfig(trace_docs=True, probe_chunks=True),
)
Wins: Debug config flows like data; composes with partial.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Substitute in tee/probe.
1. Inline tee("docs") if ... else identity → fixed stage.
2. Substitute into flow → parametric trace.
3. Result: Pipeline fixed for fixed config (immutable); traces sealed.
Bug Hunt: In opaque version, no substitution reveals intermediates.
Example:
- Opaque: ffilter(partial(eval_pred)) → black box.
- Debug: instrument_stage with trace/probe → auditable, substitutable.
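The substitution steps can be sketched in a few lines. This is a simplified stand-in under stated assumptions: `tee_to` is a demo variant of tee that records into a list instead of logging, and the one-flag `DebugConfig` and `build` function are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any

Stage = Callable[[Iterable[Any]], Iterator[Any]]


def identity(xs: Iterable[Any]) -> Iterator[Any]:
    yield from xs


def tee_to(sink: list, stage: str) -> Stage:
    """Demo variant of tee that records into a list instead of logging."""
    def tracer(xs: Iterable[Any]) -> Iterator[Any]:
        for x in xs:
            sink.append((stage, x))
            yield x
    return tracer


@dataclass(frozen=True)
class DebugConfig:
    trace_docs: bool = False


def build(debug: DebugConfig, sink: list) -> Callable[[Iterable[str]], Iterator[str]]:
    # Step 1: for a fixed config, this conditional reduces to one fixed stage.
    trace = tee_to(sink, "docs") if debug.trace_docs else identity

    # Step 2: substitute that stage into the pipeline.
    def run(docs: Iterable[str]) -> Iterator[str]:
        return (d.upper() for d in trace(docs))
    return run


docs = ["a", "b"]
quiet_sink: list = []
loud_sink: list = []
quiet = list(build(DebugConfig(), quiet_sink)(docs))
loud = list(build(DebugConfig(trace_docs=True), loud_sink)(docs))
```

Step 3 is the comparison: both runs return the same values, and the only difference is what ended up in the sink.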
6. Property-Based Testing: Proving Debug Behaviour¶
Use Hypothesis's verbose setting (Verbosity.verbose) to trace failures.
6.1 Custom Strategy¶
From capstone/tests/conftest.py.
6.2 Debug Equivalence Property (No Debug)¶
# capstone/tests/test_rag_api.py (debug flags don't affect values)
from dataclasses import replace
from hypothesis import given
from funcpipe_rag import DebugConfig, RagConfig, get_deps, iter_rag_core
from tests.conftest import doc_list_strategy, env_strategy

@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_flags_do_not_change_values(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    out1 = list(iter_rag_core(docs, config, deps))
    debug_cfg = replace(
        config,
        debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
    )
    out2 = list(iter_rag_core(docs, debug_cfg, get_deps(debug_cfg)))
    assert out1 == out2
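Note: The first run has every debug flag off and serves as the pure baseline; the second run enables all of them. Equal outputs show that instrumentation changes visibility only. Use a separate test with verbose settings when you want the traces themselves.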
6.3 Probe Property (Invariants)¶
from hypothesis import settings, Verbosity

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy())
def test_probe_invariants(docs):
    config = RagConfig(env=RagEnv(512), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    list(iter_rag_core(docs, config, deps))  # Probes raise on failure
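Note: With Verbosity.verbose, Hypothesis prints each example it tries, so a failure arrives with its trace. The concrete invariant checked by the probe is assert isinstance(x, ChunkWithoutEmbedding). The test passes iff no probe assertion is raised.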
6.4 Idempotence with Trace¶
@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_idempotence(docs, env):
    from funcpipe_rag import Ok, RagBoundaryDeps, full_rag_api_path

    class FakeReader:
        def __init__(self, docs):
            self._docs = docs

        def read_docs(self, path):
            _ = path
            return Ok(self._docs)

    config = RagConfig(env=env, debug=DebugConfig(trace_chunks=True))
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader(docs))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
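Note: Identical results from two runs with the same config and deps confirm that tracing carries no state from one run to the next.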
6.5 Shrinking Demo: Catching a Bug¶
Bad probe with state (violates referential transparency):
from collections.abc import Callable, Iterable, Iterator
from typing import Any, TypeVar
from funcpipe_rag import ChunkWithoutEmbedding

T = TypeVar("T")

def bad_probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    counter = 0
    def checker(xs: Iterable[T]) -> Iterator[T]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:
                try:
                    check_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def check_chunk_without_embedding(x: Any) -> None:
    assert isinstance(x, ChunkWithoutEmbedding), "Invalid chunk type"
    assert x.start == 0, "Expected first chunk only (demo invariant)"
Property (intentionally failing example):
@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_bad_debug(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        bad_probe("chunks", check_chunk_without_embedding),
        fmap(deps.embedder),
    )
    list(pipeline())  # Consume to trigger probe
Failure Trace (Example):
Falsifying example: test_bad_debug(
docs=[RawDoc(...), RawDoc(...)], # Minimal pair triggering even/odd bug
chunk_size=128,
)
AssertionError: chunks: Expected first chunk only (demo invariant)
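Analysis: The probe checks only every second chunk, so the test fails when an even-numbered chunk in the stream is not the first chunk of its document, while the same violation in an odd position slips through unchecked. The point is not the invariant itself; it is that stateful probes make failures depend on enumeration order rather than just inputs.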
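The order dependence becomes easier to see when one stateful stage object is reused. In this sketch, `check_even`, `outcome`, and the integer data are hypothetical stand-ins for the chunk types; the counter lives in the closure, so it survives from one run to the next.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def bad_probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    counter = 0  # shared by every run of the returned stage

    def checker(xs: Iterable[T]) -> Iterator[T]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:  # only every second element is checked
                try:
                    check_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker


def check_even(x: int) -> None:
    """Hypothetical invariant used only for this demo."""
    assert x % 2 == 0, f"odd value {x}"


def outcome(stage: Callable[[Iterable[int]], Iterator[int]], xs: list[int]) -> str:
    try:
        list(stage(iter(xs)))
        return "ok"
    except AssertionError as exc:
        return str(exc)


stage = bad_probe("nums", check_even)
data = [1, 2, 3]
first = outcome(stage, data)   # counter 1..3: only the value 2 is checked
second = outcome(stage, data)  # counter resumes at 4: the value 1 is checked
```

The same stage and the same input give two different outcomes, which is the referential-transparency violation in miniature.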
7. When Debugging Stages Aren't Worth It¶
Use prints only in:
- Trivial one-step scripts.
- Legacy wrappers around stages.
Guardrails: Isolate to <5 lines; prefer stages for pipelines.
Example:
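A minimal sketch of the kind of script where a bare print is acceptable, assuming a hypothetical one-step helper `word_count`: there is a single step, nothing is lazy, and the print sits at the script boundary rather than inside the function.

```python
def word_count(text: str) -> int:
    """Single pure step; no pipeline, nothing lazy to disturb."""
    return len(text.split())


if __name__ == "__main__":
    sample = "debug like sealed data"
    print(word_count(sample))  # the only effect lives at the script boundary
```

Once a second or third stage appears, the same output is better produced by a tee stage so the effect stays outside the transforms.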
8. Pre-Core Quiz¶
- print in mapper? → Purity violation.
- list(gen) to inspect? → tee(stage).
- Anonymous lambda? → named("name", fn).
- Global debug flag? → Config + identity.
- Shrink failures? → Hypothesis verbose.
9. Post-Core Reflection & Exercise¶
Reflect: Find an opaque pipeline. Refactor with named, tee, probe; add verbose Hypothesis.
Project Exercise: Apply to RAG (e.g., trace decisions); run verbose properties.
- Did traces clarify bugs?
- Did probes catch invariants?
- Did verbose shrink failures?
Continue with: Imperative to FP Refactor
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.