Pure Functions & Contracts¶
This lesson turns "keep functions pure" into something reviewable.
Start With the Failure Mode¶
A function usually violates its contract in one of four ways:
- it reads hidden input such as environment, time, or module state
- it returns different results for the same visible inputs
- it mutates caller-owned data or shared state
- it performs effects in the middle of logic that should stay substitutable
The goal of this page is to make those failures easy to name and easy to detect.
Keep This Question In View¶
How do you constrain functions so that purity violations are detectable early by signatures, properties, and optional runtime checks?
By the end of this lesson, you should be able to point at one function and say:
- which purity clause matters here
- how to check it
- what refactor would make the clause hold again
1. Conceptual Foundation¶
1.1 The One-Sentence Rule¶
Make every input explicit, every output deterministic, and never mutate shared state—or mark the function impure and isolate it.
1.2 Pure Function Contract in One Precise Sentence¶
A pure function contract requires that all inputs are explicit parameters, outputs are deterministic, no shared state is mutated, and there are no side effects (other than raising exceptions that depend only on the inputs); violations are detectable with high confidence by property tests and, to a lesser extent, by type hints and runtime checks.
1.3 Why This Matters Now¶
Purity is only useful when another engineer can verify it. A contract gives that engineer a checklist instead of a vibe:
- Are all inputs visible in the signature?
- Can the same inputs produce a different result?
- Does this call mutate anything outside its own local scope?
- Is there an effect here that belongs in a thin shell instead?
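As a minimal sketch of running that checklist, consider a hypothetical greeting function. The names `greet_impure`, `greet`, `greet_shell`, and the `APP_GREETING` variable are illustrative assumptions and not part of the FuncPipe code. The first version fails the first two questions; the second makes every input visible and moves the environment and clock reads into a thin shell.

```python
import os
from datetime import datetime, timezone


def greet_impure(name: str) -> str:
    # Hidden inputs: environment and wall-clock time are not in the signature.
    prefix = os.getenv("APP_GREETING", "Hello")
    hour = datetime.now(timezone.utc).hour
    period = "morning" if hour < 12 else "afternoon"
    return f"{prefix}, {name}. Good {period}."


def greet(name: str, prefix: str, hour: int) -> str:
    # Every input is explicit, so equal arguments always give equal results.
    period = "morning" if hour < 12 else "afternoon"
    return f"{prefix}, {name}. Good {period}."


def greet_shell(name: str) -> str:
    # Thin shell: the only place that touches the environment and the clock.
    prefix = os.getenv("APP_GREETING", "Hello")
    hour = datetime.now(timezone.utc).hour
    return greet(name, prefix, hour)
```

Only `greet` needs property tests; `greet_shell` stays small enough to review by eye.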
1.4 Contracts: Three Layers¶
Contracts go beyond purity to include preconditions, postconditions, and invariants. In this module, use three layers:
| Layer | Description | Examples | When to Use |
|---|---|---|---|
| Static (typing) | Shape and domain of data | Type hints like list[RawDoc] | For structure/shape; encourages explicitness |
| Dynamic (asserts) | Runtime checks that raise errors | assert amount >= 0 | For simple, enforceable conditions |
| Behavioral (Hypothesis) | Relational properties over inputs/outputs | f(g(x)) == g(f(x)) | For subtle behaviors like determinism, no mutation, invariants |
Use the lightest tool that can honestly express the rule:
- Use types for data shape and obvious domain boundaries.
- Use runtime checks for simple conditions that should fail loudly.
- Use property tests when the rule is relational, behavioral, or easy to accidentally break.
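The three layers can sit on a single function. The sketch below uses a hypothetical `apply_fee` helper (not part of the running project) and a plain loop as a standard-library stand-in for a Hypothesis property; in a real test suite the loop body would normally live inside an `@given` test.

```python
def apply_fee(amount_cents: int, fee_bps: int) -> int:
    """Return the amount plus a fee given in basis points (1 bps = 0.01%)."""
    # Static layer: the type hints above describe the shape of the data.
    # Dynamic layer: simple preconditions that should fail loudly.
    assert amount_cents >= 0, "amount must be non-negative"
    assert 0 <= fee_bps <= 10_000, "fee must be between 0 and 10000 bps"
    return amount_cents + (amount_cents * fee_bps) // 10_000


def check_fee_properties(samples: list[tuple[int, int]]) -> None:
    # Behavioral layer: relational rules checked across many inputs.
    for amount, bps in samples:
        result = apply_fee(amount, bps)
        assert result == apply_fee(amount, bps)      # determinism
        assert amount <= result <= 2 * amount        # fee is between 0% and 100%
        assert apply_fee(amount + 1, bps) >= result  # monotonic in the amount


samples = [(a, b) for a in (0, 1, 99, 10_000) for b in (0, 1, 250, 10_000)]
check_fee_properties(samples)
```

The asserts cover what a single call can check; the loop covers rules that only make sense when comparing calls.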
2. Mental Model: Contract Violations vs Detection¶
2.1 One Picture¶
Leaky Contract                    Reviewed Contract
+---------------------------+     +-----------------------------+
| hidden globals / time     |     | explicit parameters         |
| hidden RNG / OS env vars  |     | deterministic output        |
| mutates caller data       |     | returns new values          |
| prints / logs / I/O       |     | effect boundary named       |
+---------------------------+     +-----------------------------+
Types and checks narrow the space. Properties catch the subtle leaks that still get through.
2.2 Contract Table¶
| Clause | Violation Example | Detected By |
|---|---|---|
| Explicit inputs | Globals, os.getenv, datetime.now() | Code review + discipline; types encourage explicitness |
| Deterministic outputs | random, time, external state | Hypothesis determinism property |
| No shared mutation | list.sort(), dict.update() | Hypothesis mutation check + deepcopy |
| No side effects | print, logging, I/O | Manual review |
Note on shared mutation: this includes nested structures and aliases; use deepcopy in properties when you need to show that caller-owned values were not changed.
Note on determinism: unseeded RNG breaks determinism, and seeded RNG is acceptable only when the seed is an explicit input.
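Both points can be sketched with the standard library alone. The function names below (`sample_ids`, `tag_first_impure`, `mutates_argument`) are hypothetical and exist only for this illustration.

```python
import copy
import random


def sample_ids(ids: list[str], k: int, seed: int) -> list[str]:
    # The seed is an explicit input, and a private Random instance avoids the
    # module-level generator, so equal arguments give equal results.
    rng = random.Random(seed)
    return rng.sample(ids, k)


def tag_first_impure(rows: list[dict]) -> list[dict]:
    # The outer list is copied, but the inner dicts are still the caller's objects.
    result = list(rows)
    result[0]["tagged"] = True
    return result


def mutates_argument(func, arg) -> bool:
    # Compare the argument against a deep snapshot taken before the call.
    snapshot = copy.deepcopy(arg)
    func(arg)
    return arg != snapshot
```

A shallow copy of `rows` would not have caught `tag_first_impure`, because the nested dicts are shared between the copy and the original; that is the reason the snapshot uses `deepcopy`.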
In the rest of this core we turn those columns into concrete contracts on the RAG stages and one non-RAG function.
3. Running Project: Contracts on RAG Stages¶
Our running project (from module-01/funcpipe-rag-01/README.md) adds contracts to Core 1's pure stages.
- Goal: Ensure each stage is pure and detectable.
- Start: Core 1's pure functions.
- End (this core): Functions with Hypothesis properties for determinism, no mutation, and invariants. Semantics aligned with Core 1.
3.1 Types (Canonical)¶
These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.
3.2 Impure Variants (Anti-Patterns in RAG)¶
Full code:
# Impure clean (hidden global)
from funcpipe_rag import RawDoc, CleanDoc

DEBUG = True


def impure_clean_doc(doc: RawDoc) -> CleanDoc:
    abstract = " ".join(doc.abstract.strip().lower().split())
    if DEBUG:
        abstract += " (debug)"
    return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)


# Impure chunk (nondeterministic)
import random
from typing import List
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv


def impure_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:
    text = doc.abstract
    # Hidden RNG; violates determinism
    offset = random.randint(0, min(10, max(0, len(text) - 1)))
    # Arbitrary offset for demo; not production chunking logic
    return [
        ChunkWithoutEmbedding(
            doc.doc_id,
            text[i + offset:i + offset + env.chunk_size],
            i + offset,
            i + offset + len(text[i + offset:i + offset + env.chunk_size]),
        )
        for i in range(0, len(text), env.chunk_size)
    ]


# Impure embed (mutates input)
import hashlib
from dataclasses import dataclass
from funcpipe_rag import Chunk


@dataclass  # Mutable for demo
class MutableChunkWithoutEmbedding:
    doc_id: str
    text: str
    start: int
    end: int


def impure_embed_chunk(chunk: MutableChunkWithoutEmbedding) -> Chunk:
    # Mutates text (shared mutation)
    chunk.text = chunk.text.upper()
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)
Smells: a hidden global (DEBUG), a nondeterministic RNG (the offset varies between calls), and shared mutation (chunk.text). Note: the mutable class exists only for the mutation demo; the canonical types remain frozen.
4. Correct Pattern: Detectable Contracts in RAG¶
4.1 Pure Core¶
The canonical implementations of clean_doc, chunk_doc, embed_chunk, and structural_dedup_chunks live in module-01/funcpipe-rag-01/src/funcpipe_rag/pipeline_stages.py (identical to Core 1). Each function accepts only explicit parameters, returns brand-new dataclasses, and never mutates inputs—exactly the contract we want to enforce in this core.
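For readers without the repository open, here is a minimal sketch of what such a stage can look like. The `RawDocSketch` and `CleanDocSketch` classes are stand-ins defined locally for illustration; their fields follow the constructor calls and the whitespace invariant used elsewhere on this page, and the canonical definitions in rag_types.py and pipeline_stages.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDocSketch:  # stand-in, not the canonical RawDoc
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class CleanDocSketch:  # stand-in, not the canonical CleanDoc
    doc_id: str
    title: str
    abstract: str
    categories: str


def clean_doc_sketch(doc: RawDocSketch) -> CleanDocSketch:
    # Normalize whitespace and case; build a new value instead of editing `doc`.
    abstract = " ".join(doc.abstract.strip().lower().split())
    return CleanDocSketch(doc.doc_id, doc.title, abstract, doc.categories)
```

Freezing the dataclasses means an accidental `doc.abstract = ...` inside a stage raises `dataclasses.FrozenInstanceError` instead of silently changing the caller's value.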
4.2 Non-RAG Example¶
Use your own domain code (e.g., accounting transfers) to practice adding explicit inputs, determinism, and invariants—the same discipline applied to the FuncPipe stages. Keep side effects isolated just as we do with rag_shell.
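One possible starting point for the accounting case is sketched below. The `Account` type and `transfer` function are hypothetical examples, not code from the project. The function takes every input explicitly, raises only on the basis of its inputs, and returns new accounts instead of editing the ones it was given.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance_cents: int


def transfer(src: Account, dst: Account, amount_cents: int) -> tuple[Account, Account]:
    # Preconditions raise based on inputs only, which the contract allows.
    if amount_cents < 0:
        raise ValueError("amount must be non-negative")
    if amount_cents > src.balance_cents:
        raise ValueError("insufficient funds")
    # Return new accounts; the caller's objects are left untouched.
    return (
        replace(src, balance_cents=src.balance_cents - amount_cents),
        replace(dst, balance_cents=dst.balance_cents + amount_cents),
    )
```

The natural invariant to test is conservation: for every valid transfer, the sum of the two returned balances equals the sum of the two original balances.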
4.3 Impure Shell (Edge Only)¶
The shell from Core 1 remains; contracts focus on pure core.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Replace expressions in clean_doc.
1. Inline abstract = " ".join(...) → normalized string.
2. Substitute into CleanDoc → fixed value.
Bug Hunt: In impure_clean_doc, the value of CleanDoc(..., abstract, ...) depends on the hidden global DEBUG, so you can’t safely replace the call with its value without also knowing that global state.
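The same contrast can be run directly. In this sketch, `normalize` mirrors the whitespace normalization used on this page, while `SETTINGS` and `normalize_with_flag` are hypothetical stand-ins for the hidden-global variant.

```python
def normalize(text: str) -> str:
    return " ".join(text.strip().lower().split())


SETTINGS = {"debug": False}  # module state that the signature below does not mention


def normalize_with_flag(text: str) -> str:
    # Hidden input: the result depends on SETTINGS, not only on `text`.
    suffix = " (debug)" if SETTINGS["debug"] else ""
    return normalize(text) + suffix


# Pure: the call can be replaced by its value anywhere in the program.
value = normalize("  Pure   Functions ")
assert value == "pure functions"
assert normalize("  Pure   Functions ") + "!" == value + "!"

# Impure: a value recorded earlier stops being a valid substitute
# once the hidden state changes.
recorded = normalize_with_flag("  Pure   Functions ")
SETTINGS["debug"] = True
assert normalize_with_flag("  Pure   Functions ") != recorded
```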
6. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶
Use Hypothesis to check contract compliance: it cannot prove a property, but it searches many generated inputs for a counterexample.
You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.
We saw the simplest determinism property in Core 1; now we’ll apply the same idea to the RAG stages.
Here is a short demo of falsifying an impure function with Hypothesis:
import random

from hypothesis import given
import hypothesis.strategies as st


def impure_random_add(x: int) -> int:
    return x + random.randint(1, 10)  # Non-deterministic


@given(st.integers())
def test_detect_impurity(x):
    # Falsifies due to randomness: Hypothesis will quickly find
    # differing outputs for the same x
    assert impure_random_add(x) == impure_random_add(x)
This property test detects the impurity by showing outputs vary for identical inputs—run it to see Hypothesis in action.
6.1 Custom Strategy (RAG Domain)¶
From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).
6.2 Contract Properties for RAG Stages¶
Full code:
# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
import copy

from hypothesis import given
import hypothesis.strategies as st

from funcpipe_rag import clean_doc, chunk_doc, embed_chunk
from funcpipe_rag import ChunkWithoutEmbedding
from .conftest import raw_doc_strategy, env_strategy


@given(doc=raw_doc_strategy())
def test_clean_doc_deterministic(doc):
    assert clean_doc(doc) == clean_doc(doc)


@given(doc=raw_doc_strategy())
def test_clean_doc_invariants(doc):
    cleaned = clean_doc(doc)
    # no double spaces (the literal below contains two spaces)
    assert "  " not in cleaned.abstract
    # normalized whitespace and case
    assert cleaned.abstract == " ".join(doc.abstract.strip().lower().split())


@given(doc=raw_doc_strategy(), env=env_strategy())
def test_chunk_doc_deterministic(doc, env):
    cleaned = clean_doc(doc)
    assert chunk_doc(cleaned, env) == chunk_doc(cleaned, env)


@given(doc=raw_doc_strategy(), env=env_strategy())
def test_chunk_doc_covers_cleaned(doc, env):
    cleaned = clean_doc(doc)
    assert "".join(c.text for c in chunk_doc(cleaned, env)) == cleaned.abstract


# Build a chunk, then recompute `end` so it is consistent with `start` and `text`.
chunk_we_strategy = st.builds(
    ChunkWithoutEmbedding,
    doc_id=st.text(),
    text=st.text(min_size=1),
    start=st.integers(min_value=0, max_value=1000),
).map(
    lambda c: ChunkWithoutEmbedding(
        c.doc_id, c.text, c.start, c.start + len(c.text)
    )
)


@given(chunk_we=chunk_we_strategy)
def test_embed_deterministic(chunk_we):
    assert embed_chunk(chunk_we) == embed_chunk(chunk_we)


@given(chunk_we=chunk_we_strategy)
def test_embed_range_and_dimension(chunk_we):
    emb = embed_chunk(chunk_we).embedding
    assert len(emb) == 16
    assert all(0.0 <= x <= 1.0 for x in emb)


@given(chunk_we=chunk_we_strategy)
def test_embed_does_not_mutate_input(chunk_we):
    original = copy.deepcopy(chunk_we)
    _ = embed_chunk(chunk_we)
    assert chunk_we == original
These real-world properties encode determinism, invariants, no mutation, and pure behavior directly in the repository.
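The determinism and no-mutation checks follow the same shape for every stage, so they can be folded into one helper. The `check_call_contract` function below is a hypothetical utility, not something the repository is known to provide, and it assumes that `==` is a meaningful comparison for the arguments and the result.

```python
import copy
from typing import Any, Callable


def check_call_contract(func: Callable[..., Any], *args: Any) -> None:
    """Assert that one call is repeatable and leaves its arguments unchanged."""
    snapshot = copy.deepcopy(args)
    first = func(*args)
    # No shared mutation: the arguments must still equal the deep snapshot.
    assert args == snapshot, "argument was mutated by the call"
    # Determinism: a second call on fresh copies must give an equal result.
    second = func(*copy.deepcopy(snapshot))
    assert first == second, "same inputs produced different outputs"


check_call_contract(sorted, [3, 1, 2])  # passes: sorted() returns a new list
```

Calling `check_call_contract(list.sort, [3, 1, 2])` raises `AssertionError`, because `list.sort` reorders the caller's list in place. Inside a Hypothesis test, the helper would be called once per generated input.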
6.3 Shrinking Demo: Catching a Bug¶
Bad refactor (off-by-one in the end index computed by chunk_doc, so each chunk's end falls one short of its text):
Full code:
from typing import List
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv


def bad_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:
    text = doc.abstract
    return [
        ChunkWithoutEmbedding(
            doc.doc_id,
            text[i:i + env.chunk_size],
            i,
            i + len(text[i:i + env.chunk_size]) - 1,  # Off-by-one
        )
        for i in range(0, len(text), env.chunk_size)
    ]
Property:
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import CleanDoc, RagEnv
from .conftest import env_strategy


@given(
    doc=st.builds(
        CleanDoc,
        doc_id=st.text(min_size=1),
        title=st.text(),
        abstract=st.text(min_size=1),
        categories=st.text(),
    ),
    env=env_strategy(),
)
def test_bad_chunk_doc_index_invariant(doc: CleanDoc, env: RagEnv) -> None:
    chunks = bad_chunk_doc(doc, env)
    for c in chunks:
        assert c.end - c.start == len(c.text)  # Fails on off-by-one


@given(
    doc=st.builds(
        CleanDoc,
        doc_id=st.text(min_size=1),
        title=st.text(),
        abstract=st.text(min_size=1),
        categories=st.text(),
    ),
    env=env_strategy(),
)
def test_bad_chunk_doc_covers_abstract(doc: CleanDoc, env: RagEnv) -> None:
    chunks = bad_chunk_doc(doc, env)
    reconstructed = "".join(c.text for c in chunks)
    # As written, bad_chunk_doc keeps the text slices intact (only `end` is wrong),
    # so this check is expected to pass; it fails when a refactor drops characters.
    assert reconstructed == doc.abstract
These two properties encode the “index” and “coverage” invariants we care about for chunks.
Hypothesis failure trace (run to verify; example for index_invariant):
Falsifying example: test_bad_chunk_doc_index_invariant(
    doc=CleanDoc(doc_id='a', title='', abstract='a', categories=''),
    env=RagEnv(chunk_size=128),
)
AssertionError
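Hypothesis shrinks the failure to a minimal doc with abstract='a', where the off-by-one gives end - start == 0 while len(text) == 1. The coverage property does not fail for this particular bug, because the text slices are unchanged and only end is miscomputed; it would fail for a refactor that drops characters from the slices themselves. Shrinking is what turns a subtle bug into a one-character counterexample.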
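The index and coverage invariants catch different mistakes, which a small standard-library sketch can make concrete. `TextChunk` is a local stand-in for ChunkWithoutEmbedding, and `chunk_text` and `chunk_text_truncating` are hypothetical functions written for this comparison; none of them is the canonical chunk_doc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextChunk:  # stand-in for ChunkWithoutEmbedding
    doc_id: str
    text: str
    start: int
    end: int


def chunk_text(doc_id: str, text: str, size: int) -> list[TextChunk]:
    # Correct indices: end is start plus the actual slice length.
    return [
        TextChunk(doc_id, text[i:i + size], i, i + len(text[i:i + size]))
        for i in range(0, len(text), size)
    ]


def chunk_text_truncating(doc_id: str, text: str, size: int) -> list[TextChunk]:
    # A different bug: each slice is cut one character short of the chunk size.
    return [
        TextChunk(doc_id, text[i:i + size - 1], i, i + len(text[i:i + size - 1]))
        for i in range(0, len(text), size)
    ]


def index_invariant_holds(chunks: list[TextChunk]) -> bool:
    return all(c.end - c.start == len(c.text) for c in chunks)


def coverage_holds(chunks: list[TextChunk], text: str) -> bool:
    return "".join(c.text for c in chunks) == text
```

For the text "abcdef" with a chunk size of 4, the truncating variant still satisfies the index invariant but breaks coverage, while an error in `end` alone does the reverse. Keeping both properties is what covers both kinds of refactoring slip.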
7. When Contracts Aren't Worth It¶
For ultra-hot paths, skip runtime checks; rely on properties in tests.
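One way to keep a check out of production hot paths without deleting it is to write it as an `assert`, since Python skips assert statements when run with the `-O` flag. The `split_evenly` function below is a hypothetical example of that split: the cheap precondition stays as a real exception, and the postcondition is an assert.

```python
def split_evenly(total_cents: int, parts: int) -> list[int]:
    # Cheap precondition, kept as a real check because bad input is a caller error.
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total_cents, parts)
    shares = [base + 1 if i < extra else base for i in range(parts)]
    # Postcondition as an assert: documents the invariant in normal runs
    # and is skipped entirely under `python -O`.
    assert sum(shares) == total_cents
    return shares
```

The property tests still exercise the postcondition on every generated input, so skipping the assert in optimized runs does not remove the coverage.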
8. Pre-Core Quiz¶
- Hidden global in a function → violates which clause? → Explicit inputs
- random.random() inside a function → violates? → Deterministic outputs
- data.sort() on a param → violates? → No shared mutation
- print() inside a function → violates? → No side effects
- Tool that searches for counterexamples to determinism? → Hypothesis
9. Post-Core Reflection & Exercise¶
Reflect: In your code, find one function that violates a contract. Apply the recipe; add properties.
Project Exercise: Implement contracts on RAG stages; run properties on sample data.
All claims (e.g., referential transparency) can be checked with the provided Hypothesis examples; run them to confirm. A passing property is strong evidence, not a proof.
Further Reading: For more on purity pitfalls, see the 'Functions as Objects' part of 'Fluent Python'. For basics, check free resources such as the Functional Programming HOWTO on Python.org or Codecademy's Advanced Python course.
Continue with: Immutability & Value Semantics