
Effect Boundaries

Page Maps

Where this page sits: Python Programming > Python Functional Programming > Data First Apis Expression Style > Effect Boundaries, applied in the capstone evidence.

How to use it: orient on the page map, read the main claim and examples, inspect the related code, proof, or capstone surface, run or review the verification path, then apply the idea back to the module and capstone.

This lesson is where the module stops being purely about elegant code and starts being about trustworthy system structure. "Pure core, effectful edge" is not a slogan. It is a concrete answer to the question of where file access, network access, logging, and failures belong.

Start With the Boundary Mistake

The most common mistake here is not that code uses effects. It is that effects sneak into the middle of code you still want to test and rewrite as if it were pure.

  • If a core helper opens files or catches broad exceptions, the boundary is in the wrong place.
  • If tests must patch built-ins or global services, dependencies are still hidden.
  • If reviewers cannot swap a fake implementation without changing the algorithm, the design is not yet sealed.
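The second and third checks can be made concrete with a small contrast. This is a minimal sketch, not code from the running project: `count_lines_hidden`, `count_lines`, and `fake_read_text` are hypothetical names chosen for illustration.

```python
from collections.abc import Callable


def count_lines_hidden(path: str) -> int:
    """Leaky: the filesystem is a hidden dependency, so tests must patch open()."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_lines(path: str, read_text: Callable[[str], str]) -> int:
    """Sealed: the effect is an explicit parameter the caller can replace."""
    return len(read_text(path).splitlines())


def fake_read_text(path: str) -> str:
    """Hypothetical fake: returns fixed text and never touches the disk."""
    _ = path
    return "first\nsecond\nthird\n"


# The algorithm is unchanged; only the injected implementation differs.
line_count = count_lines("any/path.txt", fake_read_text)
```

In this sketch the fake is swapped in without patching anything and without editing `count_lines`, which is the property the third bullet asks reviewers to check.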

Keep This Question In View

Core question:
How do you isolate all side effects (I/O, mutation, exceptions) to thin, explicit boundaries—so the core stays parametric over effects, composable, and equational while handling real-world I/O?

This lesson introduces boundary design as a practical architecture habit:

  • keep side effects inside thin implementations that can be named, replaced, and tested
  • pass those implementations through dependencies so the core can stay focused on values
  • prefer explicit result values over invisible exception paths in the core layer

The running project makes the lesson concrete: the same RAG pipeline should work with fake boundaries in tests and real boundaries in production without changing the core logic.

Use this when you have small-arity APIs but still let file reads, exceptions, or mutable state leak into the core.

Outcomes:
1. Identify effect leaks (I/O, raises) in code and explain how they undermine reasoning.
2. Refactor a leaky function into a parametric core plus a thin boundary with injected deps.
3. Write Hypothesis properties that check parametricity (equivalence, idempotence), with a shrinking example.

Note: This core anticipates Module 7's Ports & Adapters—start isolating I/O now by wrapping any file/network calls in thin functions.

Result Preview: In this core we only care about where I/O happens, not about advanced error algebra. We define a minimal Result[T] = Ok[T] | Err with Err always carrying a str. In Module 4 we generalize this to a fully-typed Result[T, E] with laws and a richer API. For now, treat it as a way to handle errors without exceptions: check isinstance(res, Ok) to get the value or isinstance(res, Err) to get the error.
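A minimal sketch of what such a Result type could look like. The real definitions live in funcpipe_rag and may differ; the field name `value` is an assumption (the tests later in this lesson read `res.error`), and `parse_int` and `describe` are hypothetical helpers used only for illustration.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str  # Err always carries a str in this lesson


# Result[T] = Ok[T] | Err, written with Union so the alias can be subscripted.
Result = Union[Ok[T], Err]


def parse_int(text: str) -> Result[int]:
    """Hypothetical helper: turn the exception path into a returned value."""
    try:
        return Ok(int(text))
    except ValueError as exc:
        return Err(f"Parse failed: {exc}")


def describe(res: Result[int]) -> str:
    # Callers branch with isinstance instead of try/except.
    if isinstance(res, Ok):
        return f"value={res.value}"
    return f"error={res.error}"
```

Calling `describe(parse_int("42"))` takes the Ok branch, while `describe(parse_int("x"))` takes the Err branch without any exception reaching the caller.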


1. Conceptual Foundation

1.1 Boundary Design in One Precise Sentence

Boundary design isolates side effects to thin implementations injected via protocols in deps—ensuring the core remains parametric over pure or effectful services, composable via M02C01–M02C04, while effects are testable and replaceable.

1.2 The One-Sentence Rule

Confine side effects to boundary implementations (e.g., FSReader); inject them as deps so the core stays parametric and testable—never hardcode effects, raises, or mutation in core functions.

1.3 Why This Matters Now

Small, explicit APIs are only half the story. If the code behind them still performs I/O directly, the boundary remains muddy and reasoning becomes conditional on the runtime environment. This lesson finishes the separation: APIs name the dependencies, and boundaries decide how the outside world gets involved.

1.4 Boundaries as Values in 5 Lines

The next example matters because it shows a boundary implementation being treated as a replaceable value, not as hardwired behavior.

from dataclasses import dataclass
from collections.abc import Callable

from funcpipe_rag import FSReader, Ok, RawDoc, Result


@dataclass(frozen=True)
class FakeReader:
    docs: list[RawDoc]

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self.docs)


ReaderFn = Callable[[str], Result[list[RawDoc]]]
readers: dict[str, ReaderFn] = {
    "fake": FakeReader([RawDoc("test", "title", "abstract", "cat")]).read_docs,
    "real": FSReader().read_docs,
}

Thin boundaries (protocols), explicit injection via deps, and a parametric core together allow you to swap implementations (pure fakes or effectful reals) without changing core logic. In practice, you may store boundary implementations in a registry such as readers, then inject the chosen implementation into RagBoundaryDeps.

Note: The core is parametric over its deps: it is pure if the deps are pure (e.g., a fake embedder) and effectful if they perform I/O. Iterators defer computation, so with effectful deps the effects run when the iterator is consumed, not when it is built.
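The deferral described in the note can be observed directly. The sketch below uses a hypothetical `recording_embedder` whose only effect is appending to a list, and a hypothetical `iter_core` generator standing in for the real streaming core.

```python
from collections.abc import Callable, Iterable, Iterator

calls: list[str] = []  # records when the effect actually runs


def recording_embedder(text: str) -> tuple[str, int]:
    """Hypothetical effectful dep: appending to `calls` is the visible effect."""
    calls.append(text)
    return (text, len(text))


def iter_core(
    texts: Iterable[str],
    embedder: Callable[[str], tuple[str, int]],
) -> Iterator[tuple[str, int]]:
    # Generator body: nothing here runs until a consumer asks for an item.
    for text in texts:
        yield embedder(text)


stream = iter_core(["a", "bb", "ccc"], recording_embedder)
calls_before = list(calls)     # building the stream performed no effect
first = next(stream)           # consuming one item runs the embedder once
calls_after_one = list(calls)
rest = list(stream)            # draining the stream runs the remaining effects
```

The practical consequence is that whoever consumes the iterator decides when, and how many, effects happen, so the consumer belongs at the edge.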


2. Mental Model: Leaky Effects vs Sealed Boundaries

2.1 One Picture

Leaky Effects (Chaotic)                     Sealed Boundaries (Parametric)
+---------------------------+               +------------------------------+
| def rag(docs_path):       |               | def iter_rag_core(docs,      |
|     docs = open(...)      |               | config, deps)                |
|     # I/O in core!        |               | -> Iterator[Chunk]           |
|     return process(docs)  |               | # Parametric over deps       |
+---------------------------+               +------------------------------+
   ↑ Flaky Reasoning                          ↑ Effects via Injected Deps

2.2 Contract Table

Aspect         | Leaky Effects              | Sealed Boundaries
---------------+----------------------------+----------------------------
Parametricity  | Hardcoded effects          | Core parametric over deps
Dependencies   | Hidden I/O globals         | Explicit protocols in deps
Composability  | Flaky (side effects)       | Easy (pure flow/partial)
Testing        | Mock globals, integration  | Unit pure, fake deps
Boundaries     | Scattered                  | Thin implementations
Reasoning      | Opaque (hidden effects)    | Equational (substitutable)

Note on Leaky Choice: Use leaks only in trivial scripts; always seal for reuse.


3. Running Project: FuncPipe RAG Builder

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Isolate I/O (file loading) to boundaries, keeping core parametric.
- Start: Leaky version with I/O in core (core5_start.py).
- End: Injected boundaries, preserving equivalence.

3.1 Types (Canonical, Used Throughout)

Extend M02C04 with effect protocols in deps:

from funcpipe_rag import Err, Ok, Reader, Result
from funcpipe_rag import RagBoundaryDeps, RagConfig, RagCoreDeps

Note: M02C05 extends deps with reader for boundaries; core functions ignore reader.
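To make the shape of these imports easier to picture, here is a rough sketch of how a Reader protocol and the two deps containers could be declared. The field names core, reader, cleaner, embedder, and taps follow the calls shown in this lesson; the type annotations, the `value` field on Ok, and the `EmptyReader` class are assumptions, and the real funcpipe_rag definitions may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol, Union


@dataclass(frozen=True)
class Ok:
    value: Any  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]


class Reader(Protocol):
    """Anything with a matching read_docs method satisfies this protocol."""

    def read_docs(self, path: str) -> Result: ...


@dataclass(frozen=True)
class RagCoreDeps:
    cleaner: Callable[[Any], Any]
    embedder: Callable[[Any], Any]
    taps: Optional[Any] = None


@dataclass(frozen=True)
class RagBoundaryDeps:
    core: RagCoreDeps  # passed on to core functions unchanged
    reader: Reader     # used only at the edge


class EmptyReader:
    """Hypothetical fake: no inheritance needed, the method shape is enough."""

    def read_docs(self, path: str) -> Result:
        _ = path
        return Ok([])


deps = RagBoundaryDeps(
    core=RagCoreDeps(cleaner=str.strip, embedder=len),
    reader=EmptyReader(),
)
```

Keeping reader outside RagCoreDeps is what lets core functions ignore it: they only ever receive `deps.core`.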

3.2 Leaky Start (Anti-Pattern)

# core5_start.py: Leaky RAG with I/O in core (anti-pattern; illustration only)
from funcpipe_rag import Observations, RagConfig, RagCoreDeps, RagEnv
from funcpipe_rag import Chunk, RawDoc, clean_doc, embed_chunk, iter_rag_core, structural_dedup_chunks
import csv


def leaky_full_rag_api(
        path: str,
        config: RagConfig,
        deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
    try:
        with open(path) as f:  # Leaky I/O in "core"!
            reader = csv.DictReader(f)
            docs = [RawDoc(**row) for row in reader]
    except Exception as e:
        raise ValueError(f"Load failed: {e}")  # Leaky exception
    chunks_iter = iter_rag_core(docs, config, deps)  # From M02C04
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))  # Simplified
    return chunks, obs

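Smells:
- I/O (open) happens inside the API function, not in a boundary implementation.
- Exceptions are used for control flow: a broad except Exception is re-raised as ValueError.
- Parametric streaming logic is mixed with effects in one function.
Problem: This breaks parametricity and makes the function hard to test without real files.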


4. Refactor to Boundaries: Parametric Core + Injected Implementations

4.1 Streaming Core (Parametric over Deps)

Canonical M02C04 core (repeated for reference):

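from funcpipe_rag import RagConfig, get_deps, iter_rag_core

# Assumes `config` (a RagConfig) and `docs` (an iterable of RawDoc) are already in scope.
deps = get_deps(config)
chunks_iter = iter_rag_core(docs, config, deps)  # lazy: nothing runs until consumed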

Properties:
- Arity 3: Parametric; pure if deps pure.
- Lazy: Builds on M02C03.
- Deps may be effectful (e.g., real embedder performs I/O).

4.2 Post-Clean Streaming Sub-Core

Internal sub-core:

from funcpipe_rag import iter_chunks_from_cleaned

# Assumes `cleaned` is the stream of already-cleaned docs and `deps` comes from get_deps(config).
chunks_iter = iter_chunks_from_cleaned(cleaned, config, deps.embedder)

Properties:
- Arity 3: Parametric, reusable.

4.3 I/O Boundary Implementations (Thin, Injected)

Explicit reader implementations:

from funcpipe_rag import FSReader, Ok, RawDoc, Result


class FakeReader:
    def __init__(self, docs: list[RawDoc]):
        self._docs = docs

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self._docs)

Properties:
- Thin: Single responsibility.
- Result: Explicit errors.
- Injected via deps.reader.
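For contrast with the fake, a real filesystem reader can stay just as thin. The sketch below is not the actual FSReader, whose implementation is not shown in this lesson; `CsvFileReader` is a hypothetical stand-in that returns rows as plain dicts and reuses the "Load failed" message that appears elsewhere in the lesson.

```python
import csv
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: list[dict[str, str]]  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


class CsvFileReader:
    """Hypothetical thin boundary: one effect, translated into a Result."""

    def read_docs(self, path: str) -> Union[Ok, Err]:
        try:
            with open(path, newline="", encoding="utf-8") as f:
                return Ok(list(csv.DictReader(f)))
        except (OSError, csv.Error, UnicodeDecodeError) as exc:
            # Only failures expected from file loading become Err values;
            # anything else still surfaces as a programming error.
            return Err(f"Load failed: {exc}")
```

Catching a narrow set of exceptions here, rather than a bare Exception, keeps the boundary from swallowing unrelated bugs while still giving the core a value to branch on.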

4.4 Public API (Edge, Composes Boundaries)

Orchestrates implementation + core:

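from funcpipe_rag import FSReader, RagBoundaryDeps, full_rag_api_docs, full_rag_api_path

# Docs API: docs are already in memory, so no reader is involved.
chunks, obs = full_rag_api_docs(docs, config, deps)

# Path API: the reader boundary loads the docs; the call returns a Result.
boundary_deps = RagBoundaryDeps(core=deps, reader=FSReader())
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)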

Properties:
- Arity 3: Effects in implementations (e.g., FSReader).
- Uses simple isinstance for Result handling.
- Matches the baseline stage composition on Ok.

Layers:
- Core (library, parametric, streaming): iter_rag_core.
- Sub-core (internal helper): iter_chunks_from_cleaned.
- Boundary/Edge (CLI/API, effectful): full_rag_api_path (path in, Result out).
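The layering can be sketched end to end with stand-in types. This is an illustration of the pattern only, assuming a simplified core over strings: `iter_core`, `rag_from_path`, `FakeReader`, and `FailingReader` are hypothetical names, and the real full_rag_api_path may be structured differently.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any, Protocol, Union


@dataclass(frozen=True)
class Ok:
    value: Any  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]


class Reader(Protocol):
    def read_docs(self, path: str) -> Result: ...


def iter_core(docs: Iterable[str], embed: Callable[[str], int]) -> Iterator[int]:
    """Stand-in for the parametric streaming core: values in, values out."""
    for doc in docs:
        yield embed(doc.strip())


def rag_from_path(path: str, reader: Reader, embed: Callable[[str], int]) -> Result:
    """Edge: the only place the reader is called; path in, Result out."""
    loaded = reader.read_docs(path)
    if isinstance(loaded, Err):
        return loaded  # pass the failure along as a value
    return Ok(list(iter_core(loaded.value, embed)))


class FakeReader:
    def __init__(self, docs: list[str]) -> None:
        self._docs = docs

    def read_docs(self, path: str) -> Result:
        _ = path
        return Ok(self._docs)


class FailingReader:
    def read_docs(self, path: str) -> Result:
        return Err(f"Load failed: {path}")


ok_result = rag_from_path("ignored.csv", FakeReader([" ab ", "c"]), len)
err_result = rag_from_path("missing.csv", FailingReader(), len)
```

Note that `iter_core` never sees a path or a reader; only the edge function knows that loading can fail.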

4.5 Configurator Tie-In (M02C01)

from functools import partial
from funcpipe_rag import Chunk, ChunkWithoutEmbedding, DebugConfig, RagBoundaryDeps, RagConfig, RagCoreDeps, RagEnv
from funcpipe_rag import FSReader, Ok, RulesConfig, StartsWith, full_rag_api_path, get_deps, make_rag_fn


def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Fake embedding


# Docs API (preferred): configure a docs -> (chunks, obs) callable
rag_docs_fn = make_rag_fn(chunk_size=512)

# Boundary API: configure boundary deps and call `full_rag_api_path`
config = RagConfig(env=RagEnv(512), debug=DebugConfig())
boundary_deps = RagBoundaryDeps(core=get_deps(config), reader=FSReader())
rag_path_fn = partial(full_rag_api_path, config=config, deps=boundary_deps)

# Fake boundary: swap reader/embedder for tests (FakeReader is the class defined in section 4.3)
keep_all_cs = RulesConfig(keep_pred=StartsWith("categories", "cs."))
test_config = RagConfig(env=RagEnv(512), keep=keep_all_cs)
fake_boundary_deps = RagBoundaryDeps(
    core=RagCoreDeps(cleaner=get_deps(test_config).cleaner, embedder=fake_embedder, taps=None),
    reader=FakeReader([]),
)
test_rag_path_fn = partial(full_rag_api_path, config=test_config, deps=fake_boundary_deps)

Wins: Implementations injectable; fakes make core pure. Composes with M02C01.


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Substitute in iter_rag_core.
1. Inline embedder = deps.embedder → fixed function.
2. Substitute into generator → parametric stream.
3. Result: Output fixed for fixed inputs/deps (parametric).
Bug Hunt: In leaky version, open breaks substitution (effects change behavior).

Example:
- Leaky: with open(...) → depends on FS, not substitutable.
- Sealed: deps.reader.read_docs(path) → injectable, substitutable with fake implementation.
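The hand exercise can also be checked mechanically on a toy core. In this sketch `iter_lengths` stands in for a core that is parametric over its embedder, and `iter_lengths_inlined` is the result of substituting a fixed function (`len`) for the parameter; both names are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator


def iter_lengths(texts: Iterable[str], embed: Callable[[str], int]) -> Iterator[int]:
    """Parametric version: behavior is fully determined by texts and embed."""
    for text in texts:
        yield embed(text)


def iter_lengths_inlined(texts: Iterable[str]) -> Iterator[int]:
    """Step 1 and 2 of the exercise: embed replaced by its definition, len."""
    for text in texts:
        yield len(text)


texts = ["alpha", "be", "gamma"]
via_deps = list(iter_lengths(texts, len))
inlined = list(iter_lengths_inlined(texts))
again = list(iter_lengths(texts, len))  # step 3: same inputs and deps, same output
```

The substitution is valid because nothing in `iter_lengths` depends on anything other than its arguments. A version that called open() inside the loop would not survive the same rewrite.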


6. Property-Based Testing: Proving Parametricity (Advanced, Optional)

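Use Hypothesis to check that the refactor preserves behavior with parametric deps. Property tests give strong evidence across many generated inputs rather than a formal proof.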

6.1 Custom Strategy

The strategies doc_list_strategy and env_strategy used in the tests below come from capstone/tests/conftest.py.

6.2 Core Equivalence Property

# capstone/tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    RagConfig,
    RagEnv,
    RagCoreDeps,
    RagBoundaryDeps,
    Err,
    Ok,
    FSReader,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
    iter_rag_core,
    full_rag_api_path,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_core_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(core_iter) == baseline_full_rag(docs, env)

Note: Tests parametric core equivalence to the baseline (no boundaries).

6.3 Prefix Equivalence (Streaming Core)

@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(islice(core_iter, k)) == baseline_full_rag(docs, env)[:k]

Note: Verifies parametric core streaming matches the baseline.

6.4 Boundary Error Handling

def test_boundary_failure():
    config = RagConfig(env=RagEnv(512))
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), FSReader())
    res = full_rag_api_path("nonexistent.csv", config, deps)
    assert isinstance(res, Err)
    assert "Load failed" in res.error

Note: Tests that the boundary implementation returns Err on an I/O error.

6.5 Idempotence Property (Boundary with Fake Implementation)

@given(env=env_strategy())
def test_rag_idempotence(env):
    from funcpipe_rag import Chunk, ChunkWithoutEmbedding, Ok, RawDoc, Result

    class FakeReader:
        def read_docs(self, path: str) -> Result[list[RawDoc]]:
            _ = path
            return Ok([])

    def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
        return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)

    config = RagConfig(env=env)
    deps = RagBoundaryDeps(
        RagCoreDeps(cleaner=clean_doc, embedder=fake_embedder, taps=None),
        FakeReader(),
    )
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

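Note: Ensures no hidden state with faked implementations (pure deps). Strictly speaking, the property checks repeatability (the same inputs give the same Result on every call) rather than idempotence in the f(f(x)) = f(x) sense.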

6.6 Shrinking Demo: Catching a Leaky Bug

Bad reader with leaky state:

from funcpipe_rag import Ok, RawDoc, Result


class BadReader:
    counter = 0

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        BadReader.counter += 1  # Leaky mutation
        if BadReader.counter % 2 == 0:
            return Ok([])
        return Ok([RawDoc("cs-123", "Title", "Abstract", "cs.AI")])

Property:

@given(env=env_strategy())
def test_bad_rag_idempotence(env):
    config = RagConfig(env=env)
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), BadReader())
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

Failure Trace (Example):

Falsifying example: test_bad_rag_idempotence(
    env=RagEnv(chunk_size=128),
)
AssertionError

Analysis: Hypothesis shrinks env to a minimal example. The property fails because the class-level counter changes the reader's output between the two calls.


7. When Boundaries Aren't Worth It

Use leaks only in:
- Trivial one-off scripts (no reuse).
- Legacy wrappers around sealed cores.
Guardrails: Isolate leaks to <10 lines; always prefer boundaries for tests and reuse.

Example:

import json
# Trivial script
print(json.loads(open("data.json").read()))  # OK for one-off

8. Pre-Core Quiz

  1. open() in core? → Violates parametricity.
  2. raise ValueError? → Use Result.
  3. How to test I/O? → Fake implementation.
  4. Effects in generator? → Inject implementation.
  5. Prove parametricity? → Hypothesis idempotence.

9. Post-Core Reflection & Exercise

Reflect: Find a function with I/O or raises. Refactor to parametric core + implementation; inject fake. Add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., load_docs as boundary); run properties.
- Did parametricity enable easier tests?
- Did fakes catch leaks?
- Did boundaries clarify effects?

Continue with: Configuration as Data

Verify all patterns with Hypothesis. The examples above show how to detect impurities such as global state or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.