Isolating Side Effects

Page Maps

graph LR
  family["Python Programming"]
  program["Python Functional Programming"]
  section["Purity Substitution Local Reasoning"]
  page["Isolating Side Effects"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

This lesson is about drawing one boundary clearly: the pure core decides what should happen, and the thin shell performs the effect.

Start With the Operational Smell

You usually need this lesson when code says it is "just transforming data" but also:

reads environment variables or global config
logs, prints, or writes to disk in the middle of the transform
calls time, random, database, or network helpers directly

At that point the function is doing two jobs: deciding and performing. This page is about splitting those jobs cleanly.
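Here is a minimal sketch of that split. It assumes a hypothetical `GREETING_STYLE` environment variable and made-up `greet_*` functions, none of which are part of the project. The mixed version decides and performs in one place. The refactor moves the decision into a pure function and leaves the environment read and the print in a thin shell.

```python
import os


# Before: reads ambient config and prints while "just transforming data".
def greet_mixed(name: str) -> str:
    loud = os.environ.get("GREETING_STYLE") == "loud"  # hidden dependency
    text = f"Hello, {name}!"
    if loud:
        text = text.upper()
    print(text)  # hidden effect
    return text


# After: the pure core decides; every input is an explicit parameter.
def greeting(name: str, loud: bool) -> str:
    text = f"Hello, {name}!"
    return text.upper() if loud else text


# The thin shell performs the effects and delegates the decision.
def greet_shell(name: str) -> None:
    loud = os.environ.get("GREETING_STYLE") == "loud"
    print(greeting(name, loud))
```

`greeting("Ada", loud=True)` can now be checked without touching the environment or capturing stdout.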

Keep This Question In View

How do you eliminate hidden side effects by passing all dependencies explicitly so that pure logic stays testable and composable?

By the end of this lesson, you should be able to point at a function and say:

what belongs in the pure core
what belongs in the shell
what dependency should be explicit input instead of ambient state

1. Conceptual Foundation

1.1 The One-Sentence Rule

Inside the pure core, never touch globals, environment variables, the clock, or the RNG directly; pass every such dependency explicitly, bundled in frozen context objects (one per layer).

1.2 Explicit Dependencies in One Precise Sentence

Explicit dependencies mean that every effectful capability (logging, the clock, randomness, I/O) reaches the code through frozen context objects rather than ambient globals. The pure core therefore stays deterministic and testable, while thin shells handle I/O, logging, and state.

1.3 Why This Matters Now

Explicit dependencies matter because they keep meaning and execution separate. Once the effectful capabilities are passed in explicitly, the core logic can be reviewed and tested as a deterministic transform again.

1.4 How This Relates to DI / Ports & Adapters / Clean Architecture

This approach lines up with familiar architecture patterns:

Dependency Injection (DI): Passing Env bundles is manual DI, simple and free of extra dependencies.
Ports & Adapters: Pure core is the domain; shells are adapters for effects (I/O, time).
Clean Architecture: Core is entities/use-cases (pure); shells are interfaces/infra (effects).

We keep it lightweight: explicit parameters or small frozen contexts instead of heavy framework machinery.
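To make the Ports & Adapters mapping concrete, here is a small sketch of a "port" expressed as a `typing.Protocol`, with a production adapter and a fixed adapter for tests. The names `Clock`, `SystemClock`, `FixedClock`, `is_overdue`, and `check_overdue` are hypothetical and only illustrate the shape. The project itself uses frozen dataclasses of callables, as shown in section 4.2.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    """Port: the only way the shell learns the current time."""

    def now(self) -> datetime: ...


@dataclass(frozen=True)
class SystemClock:
    """Production adapter: reads the real wall clock (an effect)."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


@dataclass(frozen=True)
class FixedClock:
    """Test adapter: always returns the same instant."""

    instant: datetime

    def now(self) -> datetime:
        return self.instant


def is_overdue(due: datetime, now: datetime) -> bool:
    # Pure core: the instant arrives as a plain value.
    return now > due


def check_overdue(due: datetime, clock: Clock) -> bool:
    # Shell: asks the port for the time, then delegates the decision.
    return is_overdue(due, clock.now())
```

Swapping `SystemClock()` for `FixedClock(...)` changes the adapter without touching `is_overdue`.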

1.5 Purity Spectrum Table

Level	Description	Example
Fully Pure	Explicit inputs/outputs only	`def add(x: int, y: int) -> int: return x + y`
Semi-Pure	Observational taps (e.g., logging)	`def add_with_log(x: int, y: int) -> int: log(f"Adding {x}+{y}"); return x + y`
Impure	Globals/I/O/mutation	`def read_file(path: str) -> str: ...`

In this lesson, even logging is treated as an effect that should be explicit when it matters to reviewability.
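The Semi-Pure row above can be made explicit in two ways, sketched here with hypothetical `add_*` functions. You can inject the logger as a parameter, or you can return the messages as data and let the caller emit them. The second form is the one the RAG core in section 4.1 uses.

```python
from typing import Callable


# Option 1: the log capability is an explicit parameter.
def add_with_log(log: Callable[[str], None], x: int, y: int) -> int:
    log(f"Adding {x}+{y}")
    return x + y


# Option 2: messages are returned as data; the core performs no effect.
def add_with_messages(x: int, y: int) -> tuple[int, list[str]]:
    return x + y, [f"Adding {x}+{y}"]


# The caller (the shell) decides what to do with the messages.
def add_shell(x: int, y: int) -> int:
    total, messages = add_with_messages(x, y)
    for message in messages:
        print(message)
    return total
```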

2. Mental Model: Hidden Effects vs Explicit Context

2.1 One Picture

Hidden effects                             Explicit boundary
+---------------------------+             +-----------------------------+
| globals / env / time      |             | core(data, cfg) -> value    |
| random / print / I/O      |             | shell(env, input) performs  |
| mixed into core logic     |             | effect and calls the core   |
| review surface is muddy   |             | review surface is clear     |
+---------------------------+             +-----------------------------+

2.2 Contract Table

Clause	Violation Example	Detected By
Explicit dependencies	`os.getenv`, `datetime.now()`	Tests with frozen context
No hidden prints	`print` inside pure logic	Code review + linter
Determinism when fixed	Same inputs+deps → same outputs	Tests with frozen context
Mockable effects	Direct DB calls	Unit tests with fake Env
Edge isolation	Effects in pipeline middle	Code review + linter

Note on Contracts: the phrase "thin shell" only helps if the shell really is thin. If half the business logic still lives next to the effect, the boundary is still muddy.
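The "Code review + linter" entries in the table can be partly automated. Below is a rough sketch, not a real linter. It parses source text with the standard `ast` module and reports calls whose dotted name appears on a hypothetical deny-list (`FORBIDDEN_CALLS`). It matches names literally, so aliases such as `from os import getenv as ge`, and non-call reads such as `os.environ["X"]`, slip through.

```python
import ast
from typing import Optional

# Hypothetical deny-list of ambient-effect calls for modules meant to stay pure.
FORBIDDEN_CALLS = {
    "print",
    "os.getenv",
    "datetime.now",
    "datetime.datetime.now",
    "random.random",
    "random.randint",
    "random.shuffle",
    "time.time",
}


def dotted_name(node: ast.expr) -> Optional[str]:
    """Rebuild 'a.b.c' from Name/Attribute nodes; None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_hidden_effects(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, call name) pairs for every deny-listed call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FORBIDDEN_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = "import os\ndef f(x):\n    print(x)\n    return os.getenv('A')\n"
print(find_hidden_effects(sample))  # [(3, 'print'), (4, 'os.getenv')]
```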

3. Running Project: Extracting Effects in RAG

Our running project (from module-01/funcpipe-rag-01/README.md) isolates effects in Core 7's typed pipelines.
- Goal: Push I/O, logging, time/RNG to edges.
- Start: Core 1-7's typed pure functions.
- End (this core): A pure core that receives time, seed, and configuration as explicit values, with all effects in the shell. Semantics stay aligned with Cores 1-7.

3.1 Types (Canonical)

These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.

3.2 Effectful Variants (Anti-Patterns in RAG)

Full code:

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
import hashlib
from datetime import datetime
import random
import logging

# Before refactors: implicit logging, time, RNG inside the pipeline
LOG = logging.getLogger("rag")


def effectful_clean_doc(doc: RawDoc) -> CleanDoc:
    abstract = " ".join(doc.abstract.strip().lower().split())
    LOG.info("Cleaned doc %s", doc.doc_id)
    return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)


def effectful_chunk_doc(doc: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    random.shuffle(chunks)
    return chunks


def effectful_embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
    if datetime.now() > datetime(2025, 1, 1):
        raise ValueError("Expired")
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)

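Smells: The module-level LOG hides a logging effect inside effectful_clean_doc. The global random.shuffle makes effectful_chunk_doc nondeterministic. datetime.now() makes effectful_embed_chunk time-dependent: it starts raising "Expired" as soon as the wall clock passes 2025-01-01, with no change to its inputs.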
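The RNG smell is easy to see in isolation. This sketch uses plain lists as stand-ins for chunks and hypothetical `shuffled_*` helpers. It contrasts the module-level `random.shuffle` used in `effectful_chunk_doc` with a local `random.Random(seed)` instance like the one `chunk_doc_pure` builds in section 4.1.

```python
import random


def shuffled_global(items: list[int]) -> list[int]:
    # Hidden dependency: the shared module-level RNG state.
    out = list(items)
    random.shuffle(out)
    return out


def shuffled_seeded(seed: int, items: list[int]) -> list[int]:
    # Explicit dependency: a private generator built from the given seed.
    out = list(items)
    random.Random(seed).shuffle(out)
    return out


items = list(range(10))
# Same seed, same order, regardless of what else has used the global RNG.
assert shuffled_seeded(7, items) == shuffled_seeded(7, items)
# Neither helper mutates its input.
assert items == list(range(10))
```

By contrast, `shuffled_global(items) == shuffled_global(items)` almost always fails. That is the flakiness the smell list describes.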

4. Refactor to Explicit: Pure Core + Shell in RAG

4.1 Pure Core

The core is pure. Each stage returns its value plus any artifacts (such as log messages) as data, and nothing in the core performs an effect.

Full code:

# module-01/funcpipe-rag-01/src/funcpipe_rag/pipeline_stages.py (pure helpers)
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from datetime import datetime
import random
import hashlib
from funcpipe_rag import structural_dedup_chunks


def clean_doc_pure(doc: RawDoc) -> tuple[CleanDoc, list[str]]:
    abstract = " ".join(doc.abstract.strip().lower().split())
    cleaned = CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)
    return cleaned, [f"Cleaned doc {doc.doc_id}"]


def chunk_doc_pure(seed: int, doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
    # The shuffle is kept only to demonstrate seeded randomness: the seed is an input, so the order is reproducible.
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    rng = random.Random(seed)
    rng.shuffle(chunks)
    return tuple(chunks)


def embed_chunk_pure(current_time: datetime, chunk: ChunkWithoutEmbedding) -> Chunk:
    if current_time > datetime(2025, 1, 1):
        # We still raise here; in later modules we'll model this as Result[Chunk, ExpiredError] instead.
        raise ValueError("Expired")
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)


def full_rag_pure(seed: int, current_time: datetime, docs: list[RawDoc], env: RagEnv) -> tuple[
    tuple[Chunk, ...], list[str]]:
    cleaned_with_logs = [clean_doc_pure(doc) for doc in docs]
    cleaned = [doc for doc, _ in cleaned_with_logs]
    logs = [msg for _, messages in cleaned_with_logs for msg in messages]
    chunked = [chunk_doc_pure(seed, doc, env) for doc in cleaned]
    flattened = [chunk for doc_chunks in chunked for chunk in doc_chunks]
    embedded = [embed_chunk_pure(current_time, chunk) for chunk in flattened]
    # structural_dedup_chunks: pure helper that removes duplicate chunks; defined in Core 6
    deduped = structural_dedup_chunks(embedded)
    return tuple(deduped), logs
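The first three lines of the body of full_rag_pure separate values from their messages. If several stages return (value, messages) pairs, that unzip step can be factored into a small generic helper. `split_logged` and `clean_text` are hypothetical names, not part of funcpipe_rag. The demo uses strings instead of the project's RawDoc/CleanDoc types.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def split_logged(
    stage: Callable[[A], tuple[B, list[str]]], items: Iterable[A]
) -> tuple[list[B], list[str]]:
    """Run a logging stage over items; return (values, messages) in input order."""
    values: list[B] = []
    messages: list[str] = []
    for item in items:
        value, logs = stage(item)
        values.append(value)
        messages.extend(logs)
    return values, messages


# Stand-in for clean_doc_pure: normalise whitespace and case, report what happened.
def clean_text(text: str) -> tuple[str, list[str]]:
    cleaned = " ".join(text.strip().lower().split())
    return cleaned, [f"Cleaned {text!r}"]
```

For example, `split_logged(clean_text, ["  A  b", "C"])` returns `(["a b", "c"], ["Cleaned '  A  b'", "Cleaned 'C'"])`.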

4.2 Impure Shell (Edge Only)

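The shell reads the clock and seed from its Env, calls the pure core, and emits the returned log messages. It makes no decisions of its own.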

Full code:

# module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py (context bundle)
from dataclasses import dataclass
from typing import Callable
from funcpipe_rag import full_rag_pure
from funcpipe_rag import RawDoc, Chunk, RagEnv
from datetime import datetime


@dataclass(frozen=True)
class LogEnv:
    log: Callable[[str], None]


@dataclass(frozen=True)
class TimeEnv:
    now: Callable[[], datetime]


@dataclass(frozen=True)
class RandEnv:
    seed: int


@dataclass(frozen=True)
class RagCoreEnv:
    log_env: LogEnv
    time_env: TimeEnv
    rand_env: RandEnv


def full_rag_shell(env: RagCoreEnv, docs: list[RawDoc], rag_env: RagEnv) -> tuple[Chunk, ...]:
    chunks, logs = full_rag_pure(env.rand_env.seed, env.time_env.now(), docs, rag_env)
    for message in logs:
        env.log_env.log(message)
    return chunks

module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py remains the only effectful entry point, reading CSV input and writing JSONL output while calling full_rag_shell (which delegates into full_rag_pure).

Wins: The core contains no effects, it is deterministic once the seed and the time are fixed, and its semantics stay aligned with Cores 1-7.

4.3 Real-World Integration

Frameworks such as Flask and Django expose ambient globals (Flask's request, Django's timezone.now()). Adapt by reading them once at the edge and building the Env from that framework context:

Full code:

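# Flask example: build the Env from framework context at the edge
from dataclasses import asdict
from datetime import datetime, timezone

from flask import Response, current_app, jsonify, request

from funcpipe_rag import full_rag_shell, RagCoreEnv, LogEnv, TimeEnv, RandEnv
from funcpipe_rag import RawDoc, RagEnv, Chunk
from funcpipe_rag import with_context


def rag_entry(env: RagCoreEnv, docs: list[RawDoc], rag_env: RagEnv) -> tuple[Chunk, ...]:
    return full_rag_shell(env, docs, rag_env)


def flask_handler() -> Response:
    # Framework globals are read here, at the edge, and nowhere deeper.
    env = RagCoreEnv(
        log_env=LogEnv(log=current_app.logger.info),
        # embed_chunk_pure compares against a naive datetime, so pass naive UTC;
        # comparing aware and naive datetimes raises TypeError.
        time_env=TimeEnv(now=lambda: datetime.now(timezone.utc).replace(tzinfo=None)),
        rand_env=RandEnv(seed=42),
    )
    body = request.get_json()
    docs = [RawDoc(**d) for d in body["docs"]]

    # Bind env once so downstream call sites don't have to thread it through manually.
    full_rag = with_context(env, rag_entry)
    chunks = full_rag(docs, RagEnv(chunk_size=512))
    # A tuple of Chunks is not a valid Flask response, so serialize it.
    # Assumes Chunk is a dataclass; adapt if the project defines it differently.
    return jsonify([asdict(c) for c in chunks])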

Wins: Framework globals → explicit Env; pure core stays isolated.
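The handler relies on the project's with_context helper, whose implementation this lesson does not show. One plausible way to write such a helper is partial application of the first argument with `functools.partial`. That approach is assumed here purely for illustration, and the project's actual helper may differ. `with_context_sketch`, `DemoEnv`, and `label` are hypothetical.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, TypeVar

E = TypeVar("E")
R = TypeVar("R")


def with_context_sketch(env: E, fn: Callable[..., R]) -> Callable[..., R]:
    """Hypothetical stand-in for with_context: bind env as fn's first argument."""
    return partial(fn, env)


@dataclass(frozen=True)
class DemoEnv:  # hypothetical context bundle for the demo
    prefix: str


def label(env: DemoEnv, name: str) -> str:
    return f"{env.prefix}{name}"


bound = with_context_sketch(DemoEnv(prefix="doc-"), label)
```

`bound("42")` returns `"doc-42"`, and call sites downstream of the binding never mention `DemoEnv`.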

5. Equational Reasoning: Substitution Exercise

Hand Exercise: Replace expressions in full_rag_pure.
1. Inline clean_doc_pure(doc) → (CleanDoc, logs).
2. Substitute into chunk_doc_pure → tuple of chunks (seeded).
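Bug Hunt: In effectful_clean_doc, substitution fails because of the hidden log call. In effectful_chunk_doc and effectful_embed_chunk, it fails because of the hidden RNG and clock.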
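For a pure function, an expression and its value are interchangeable, so `f(x) + f(x)` may be rewritten as `2 * f(x)`. The toy functions below are hypothetical, not from the project. They show the rewrite holding for a pure function and failing for one that reads hidden state, the same kind of ambient input a global RNG or clock supplies. A hidden logger breaks the rewrite more quietly: the value survives, but the number of log lines changes.

```python
import itertools

_calls = itertools.count()  # hidden module-level state


def pure_double(x: int) -> int:
    return 2 * x


def impure_double(x: int) -> int:
    # Reads and advances hidden state, like a global RNG or call counter.
    return 2 * x + next(_calls)


# Substitution holds: the expression can be replaced by its value.
assert pure_double(3) + pure_double(3) == 2 * pure_double(3)

# Substitution breaks: each call observes different hidden state.
assert impure_double(3) + impure_double(3) != 2 * impure_double(3)
```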

6. Property-Based Testing: Proving Equivalence (Advanced, Optional)

Use Hypothesis to check behavior across many generated inputs. Passing properties are strong evidence, not a proof.

You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.

For side-effect extraction, a couple of simple tests with a fake Env are usually enough; Hypothesis is nice-to-have, not mandatory.
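Here is what "a couple of simple tests with a fake Env" can look like. The sketch uses hypothetical stand-ins (`ReportEnv`, `summarize`, `report_shell`) rather than the project's RagCoreEnv. The fake log is `list.append` and the fake clock is a lambda returning a fixed instant, so the test needs no patching and no Hypothesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class ReportEnv:
    log: Callable[[str], None]
    now: Callable[[], datetime]


def summarize(items: list[int], at: datetime) -> tuple[int, list[str]]:
    # Pure core: the timestamp arrives as a value.
    total = sum(items)
    return total, [f"{at.isoformat()} summed {len(items)} items"]


def report_shell(env: ReportEnv, items: list[int]) -> int:
    # Thin shell: read the clock, delegate, emit messages.
    total, messages = summarize(items, env.now())
    for message in messages:
        env.log(message)
    return total


def test_report_shell_with_fake_env() -> None:
    messages: list[str] = []
    fixed = datetime(2024, 1, 1)
    env = ReportEnv(log=messages.append, now=lambda: fixed)
    assert report_shell(env, [1, 2, 3]) == 6
    assert messages == ["2024-01-01T00:00:00 summed 3 items"]
```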

To bridge theory and practice, here's a simple Hypothesis example illustrating impurity detection:

import random
from hypothesis import given
import hypothesis.strategies as st

def impure_random_add(x: int) -> int:
    return x + random.randint(1, 10)  # Non-deterministic

@given(st.integers())
def test_detect_impurity(x):
    assert impure_random_add(x) == impure_random_add(x)  # Falsifies due to randomness

# Hypothesis will quickly find differing outputs for the same x

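This property test detects the impurity by showing that outputs vary for identical inputs; run it to see Hypothesis in action. Because the failure is itself random, Hypothesis may sometimes report the test as flaky rather than printing a clean falsifying example.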

6.1 Custom Strategy (RAG Domain)

The strategies raw_doc_strategy, env_strategy, and doc_list_strategy are defined in module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).

6.2 Equivalence Property

Properties for stages (using the helpers in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py):

Full code:

# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import clean_doc_pure, chunk_doc_pure, embed_chunk_pure, full_rag_pure
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import RagCoreEnv, LogEnv, TimeEnv, RandEnv, full_rag_shell
from .conftest import raw_doc_strategy, env_strategy, doc_list_strategy
from datetime import datetime

fixed_seed = 42
fixed_time = datetime(2024, 1, 1)


@given(raw_doc_strategy())
def test_clean_doc_pure_deterministic(doc: RawDoc) -> None:
    res1, logs1 = clean_doc_pure(doc)
    res2, logs2 = clean_doc_pure(doc)
    assert res1 == res2 and logs1 == logs2


@given(st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(), categories=st.text()),
       env_strategy())
def test_chunk_doc_pure_deterministic(doc: CleanDoc, env: RagEnv) -> None:
    assert chunk_doc_pure(fixed_seed, doc, env) == chunk_doc_pure(fixed_seed, doc, env)


@given(st.builds(ChunkWithoutEmbedding, doc_id=st.text(min_size=1), text=st.text(min_size=1),
                 start=st.integers(min_value=0), end=st.integers(min_value=1)))
def test_embed_chunk_pure_deterministic(chunk: ChunkWithoutEmbedding) -> None:
    assert embed_chunk_pure(fixed_time, chunk) == embed_chunk_pure(fixed_time, chunk)


@given(doc_list_strategy(), env_strategy())
def test_full_rag_shell_matches_pure(docs: list[RawDoc], env: RagEnv) -> None:
    messages: list[str] = []
    env_bundle = RagCoreEnv(
        log_env=LogEnv(log=messages.append),
        time_env=TimeEnv(now=lambda: fixed_time),
        rand_env=RandEnv(seed=fixed_seed),
    )
    shell_chunks = full_rag_shell(env_bundle, docs, env)
    pure_chunks, logs = full_rag_pure(fixed_seed, fixed_time, docs, env)
    assert shell_chunks == pure_chunks
    assert messages == logs

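Note: These properties check that each pure stage is deterministic. They also check that the shell, run with a fake Env (a list-backed logger, a fixed clock, and a fixed seed), produces exactly the same chunks in the same order, and the same log messages, as the pure core.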
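None of the properties above states an invariant about the chunks themselves. One invariant worth adding for chunking is that shuffling never loses text: sorting the chunks by start offset and concatenating them rebuilds the original. The sketch below uses plain (start, text) tuples as a stand-in for ChunkWithoutEmbedding and a hand-picked list of cases instead of a Hypothesis strategy. `chunk_text` and `reassemble` are hypothetical.

```python
import random


def chunk_text(seed: int, text: str, size: int) -> tuple[tuple[int, str], ...]:
    """Split text into (start, piece) chunks, then shuffle with an explicit seed."""
    chunks = [(i, text[i:i + size]) for i in range(0, len(text), size)]
    random.Random(seed).shuffle(chunks)
    return tuple(chunks)


def reassemble(chunks: tuple[tuple[int, str], ...]) -> str:
    # Start offsets are unique, so sorting the tuples orders them by start.
    return "".join(piece for _, piece in sorted(chunks))


cases = [("", 3), ("a", 1), ("hello world", 4), ("abcdefgh", 3)]
for seed in range(5):
    for text, size in cases:
        chunks = chunk_text(seed, text, size)
        # Invariant: chunking plus shuffling neither loses nor duplicates text.
        assert reassemble(chunks) == text
        # Determinism: the same explicit inputs give the same chunks.
        assert chunks == chunk_text(seed, text, size)
```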

6.3 Shrinking Demo: Catching a Bug

Bad refactor (hidden RNG in chunk):

from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
import random


def bad_chunk_doc(doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    random.shuffle(chunks)  # Hidden
    return tuple(chunks)

Property:

from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import CleanDoc, RagEnv
from .conftest import env_strategy

# bad_chunk_doc is the function from the previous snippet; define or import it in this module.


@given(st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(min_size=1),
                 categories=st.text()), env_strategy())
def test_bad_chunk_doc_deterministic(doc: CleanDoc, env: RagEnv) -> None:
    assert bad_chunk_doc(doc, env) == bad_chunk_doc(doc, env)  # Falsifies due to randomness

An illustrative Hypothesis failure trace (run it yourself; the exact example you get may differ):

Falsifying example: test_bad_chunk_doc_deterministic(
    doc=CleanDoc(doc_id='a', title='', abstract='ab', categories=''), 
    env=RagEnv(chunk_size=1),
)
AssertionError

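Hypothesis shrinks the failure to a small document that still splits into at least two chunks, because a one-chunk tuple is identical under every shuffle. The assertion then fails whenever the two calls happen to shuffle differently, and shrinking reduces the bug to a minimal, readable example.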

7. When Explicit Dependencies Aren't Worth It

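Occasionally, in trivial scripts or performance-critical hot paths, reading globals directly is acceptable. In that case, keep the reads at the top level and lean on property tests to catch regressions.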
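A common middle ground when threading a full Env feels like overkill is to keep the dependency as a parameter but give it a production default. Everyday callers stay short, while tests can still pass a fixed value. This is a general Python idiom, not something the project prescribes. `utc_now` and `stamp` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def stamp(message: str, clock: Callable[[], datetime] = utc_now) -> str:
    # The default reads the real clock; tests override it explicitly.
    return f"[{clock().strftime('%Y-%m-%d')}] {message}"
```

`stamp("ok", clock=lambda: datetime(2024, 1, 1, tzinfo=timezone.utc))` returns `"[2024-01-01] ok"`. The trade-off is that a bare `stamp("ok")` still reads the clock, so the function is only as pure as its caller makes it.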

8. Pre-Core Quiz

datetime.now() inside pure func → violates? → Explicit dependencies
Global logger → violates? → No hidden prints
Same inputs+fixed env → same output? → Determinism
Direct DB call → fix with? → env.db
Tool to check fixed-env determinism? → Hypothesis

9. Post-Core Reflection & Exercise

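Reflect: Find one function in your own code that touches globals, environment variables, time, randomness, or print. Bundle those dependencies into a frozen Env, extract the pure core, write a thin shell around it, inject the Env with with_context, and add a Hypothesis property.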
Project Exercise: Isolate effects in RAG; run properties on sample data.

Claims such as the referential transparency of the pure stages can be checked, though not proved, by running the provided Hypothesis examples.

Further Reading: For more on purity pitfalls, see 'Fluent Python' Chapter on Functions as Objects. If the Python basics still feel shaky, check free resources like Python.org's FP section or Codecademy's Advanced Python course.

Continue with: Equational Reasoning