FP-Friendly APIs¶
Page Maps¶
Page location: Python Programming → Python Functional Programming → Data First APIs Expression Style → FP-Friendly APIs (applied in: Capstone evidence)
Reading flow: Orient on the page map → Read the main claim and examples → Inspect the related code, proof, or capstone surface → Run or review the verification path → Apply the idea back to the module and capstone
This lesson is about making a function boundary easy to inspect. Good API shape is not decoration around the real code. It determines whether later composition, testing, and refactoring stay manageable.
Start With the Review Pressure¶
Large signatures and hidden dependencies usually appear one "small" requirement at a time. This page shows how to stop that drift early.
- If a function needs many unrelated arguments, the model of the operation is still blurry.
- If callers must know about globals, environment variables, or service singletons, the API is lying about what it needs.
- If reviewers cannot tell which inputs are data, which are policy, and which are external services, composition will stay fragile.
Keep This Question In View¶
Core question:
How do you design APIs with ≤3 parameters, explicit config and dependencies, and no hidden globals—so pipelines from M02C01–M02C03 are composable, testable, and predictable?
This lesson introduces FP-friendly API design as a set of reviewable design choices:
- keep the public function boundary small enough to understand in one read
- separate input data from configuration and from injected services
- preserve the earlier module gains so configured, expression-oriented, and lazy stages still compose cleanly
The running project grounds the rule in a realistic case: a RAG pipeline is not simple, but its boundary still has to stay inspectable.
Use this when you have lazy pipelines but face high-arity functions or hidden globals that make testing and composition awkward.
Outcome:
1. Identify high-arity or hidden dependencies in code and explain their impact on composability.
2. Refactor a high-arity function into a small-arity API with grouped config and dependencies.
3. Write Hypothesis properties that check equivalence and idempotence, with a shrinking example.
1. Conceptual Foundation¶
1.1 FP-Friendly API Design in One Precise Sentence¶
FP-friendly APIs limit core public functions to ≤3 parameters, group domain settings into immutable config and services into explicit dependencies, and avoid hidden globals—ensuring composability, testability, and equational reasoning.
1.2 The One-Sentence Rule¶
Core public APIs must have ≤3 parameters (inputs, config, deps), with all dependencies explicit and globals forbidden; bind config/deps at edges using M02C01 partials or factories.
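The rule can be sketched in a few lines. This is a minimal illustration, not the funcpipe_rag implementation: `ChunkConfig`, `Deps`, and `chunk_texts` are hypothetical names introduced only to show the (inputs, config, deps) shape and the binding step at the edge.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class ChunkConfig:
    """Hypothetical domain settings (immutable)."""
    chunk_size: int


@dataclass(frozen=True)
class Deps:
    """Hypothetical injected services."""
    normalize: Callable[[str], str]


def chunk_texts(texts: Iterable[str], config: ChunkConfig, deps: Deps) -> Iterator[str]:
    """Core: everything it needs arrives through its three parameters."""
    for text in texts:
        clean = deps.normalize(text)
        for start in range(0, len(clean), config.chunk_size):
            yield clean[start:start + config.chunk_size]


# Bind config and deps once, at the edge; callers then pass only data.
chunk_small = partial(
    chunk_texts,
    config=ChunkConfig(chunk_size=4),
    deps=Deps(normalize=str.strip),
)

result = list(chunk_small(["  abcdefgh  ", "xy"]))
```

After binding, `chunk_small` is a one-argument callable, which is the shape that slots into a pipeline stage.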
1.3 Why This Matters Now¶
The earlier lessons made it possible to write configured, expression-oriented, lazy code. This lesson makes it possible to keep that code usable as a public surface. Without a disciplined API boundary, the implementation improvements remain trapped behind a function signature that keeps leaking internal details.
1.4 FP-Friendly APIs as Values in 5 Lines¶
The example below matters because it shows how configured callables become easy to store and reuse once the boundary shape is stable.
from functools import partial
from funcpipe_rag import RagConfig, RagEnv, RulesConfig, StartsWith, get_deps, iter_rag_core
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
filtered_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
filtered_deps = get_deps(filtered_config)
rags: dict[str, object] = {
    "standard": partial(iter_rag_core, config=standard_config, deps=standard_deps),
    "filtered": partial(iter_rag_core, config=filtered_config, deps=filtered_deps),
}
Small-arity functions (inputs, config, deps), explicit config/deps, and no globals allow storage in dicts, composition with M02C01 partials, and testing as first-class values. For example, swapping keep in config creates variants without globals or high arity.
Note: In real systems, embed may involve I/O (e.g., API calls). Injecting it through deps keeps that effect visible in the signature, so the core stays referentially transparent relative to the dependencies it is given.
2. Mental Model: High-Arity Globals vs Small Explicit APIs¶
2.1 One Picture¶
High-Arity Globals (Messy)        Small Explicit APIs (Composable)
+-----------------------+         +------------------------------+
| def rag(docs, env,    |         | def iter_rag_core(docs,      |
| cleaner, keep, taps,  |         | config, deps)                |
| chunk_size, more...)  |         | -> Iterator[Chunk]           |
| # Uses GLOBAL_CFG     |         | # Config: env, keep          |
|                       |         | # Deps: cleaner, embed, taps |
+-----------------------+         +------------------------------+
  ↑ Hard to Test/Compose            ↑ Snaps into Partial/Flow
2.2 Contract Table¶
| Aspect | High-Arity Globals | Small Explicit APIs |
|---|---|---|
| Arity | >3 params | ≤3 (inputs, config, deps) |
| Dependencies | Hidden globals/env vars | Explicit config/deps structs |
| Composability | Hard (many args, globals) | Easy (partial, flow) |
| Testing | Mock globals, flaky | Inject fakes, deterministic |
| Boundaries | Mixed pure/effects | Pure core, effectful edges |
| Reasoning | Opaque (hidden state) | Equational (substitutable) |
| Config mutability | Mutable defaults in partials break determinism | Frozen dataclasses or other immutable config types |
Note on High-Arity Choice: Use higher arity only for legacy adapters, wrapping small-arity cores.
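The last table row is easy to demonstrate. The sketch below uses hypothetical names (`keep_long`, `FilterConfig`) and a plain dict as the mutable config; it is meant only to show why a bound partial is only as stable as the config object it closes over.

```python
from dataclasses import FrozenInstanceError, dataclass
from functools import partial


def keep_long(words: list[str], config: dict) -> list[str]:
    """Filter driven by a mutable dict config (the risky variant)."""
    return [w for w in words if len(w) >= config["min_len"]]


shared = {"min_len": 3}
risky = partial(keep_long, config=shared)
before = risky(["a", "abc", "abcd"])
shared["min_len"] = 4                # someone mutates the dict after binding
after = risky(["a", "abc", "abcd"])  # same call, different result


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical frozen config: attribute assignment raises."""
    min_len: int


def keep_long_frozen(words: list[str], config: FilterConfig) -> list[str]:
    return [w for w in words if len(w) >= config.min_len]


frozen = FilterConfig(min_len=3)
safe = partial(keep_long_frozen, config=frozen)

try:
    frozen.min_len = 4  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

With the frozen dataclass, the only way to change behavior is to build a new config and a new partial, which keeps each stored variant stable.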
2.3 Common API Shapes Table¶
To lock in the arity rule, here are typical shapes:
| Shape | Meaning | Example |
|---|---|---|
| f(data) | Pure utility, no config/deps | hash(data) |
| f(data, config) | Domain-level core | chunk(data, ChunkConfig) |
| f(data, config, deps) | Cross-cutting deps present | iter_rag_core(docs, config, deps) |
Any other shape (e.g., f(docs, env, keep, cleaner, taps, ...)) must be considered an anti-pattern and refactored.
3. Running Project: FuncPipe RAG Builder¶
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Design small-arity, explicit APIs that compose lazily and match the baseline outputs.
- Start: High-arity, global-dependent version (core4_start.py).
- End: FP-friendly API with streaming core and edge materialization.
3.1 Types (Canonical, Used Throughout)¶
From capstone/src/funcpipe_rag/rag_types.py, capstone/src/funcpipe_rag/api/types.py, plus new config/deps:
3.2 High-Arity Start (Anti-Pattern)¶
# core4_start.py: High-arity, global-dependent RAG
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks, gen_chunk_doc
from collections.abc import Callable, Sequence
GLOBAL_ENV = RagEnv(512) # Hidden global
def high_arity_rag(
    docs: list[RawDoc],
    cleaner: Callable[[RawDoc], CleanDoc],
    keep: DocRule | None,
    taps: RagTaps | None,
    chunk_size: int = GLOBAL_ENV.chunk_size,
    debug: bool = False
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    kept_docs = [d for d in docs if rule(d)]
    if taps and taps.docs and debug:
        taps.docs(tuple(kept_docs))
    cleaned = [cleaner(d) for d in kept_docs]
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))
    chunk_we = [c for cd in cleaned for c in gen_chunk_doc(cd, RagEnv(chunk_size))]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks and debug:
        taps.chunks(tuple(chunks))
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs
Smells:
- High arity (6 params: docs, cleaner, keep, taps, chunk_size, debug).
- Hidden global (GLOBAL_ENV).
- Mixed effects (taps with debug flag).
Problem: Hard to partialize, test, or reason about due to excessive params and global dependency.
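One detail of the signature above is worth isolating: a default such as `chunk_size: int = GLOBAL_ENV.chunk_size` is evaluated once, when the `def` statement runs, so the value a caller gets depends on import-time state rather than on anything visible at the call site. The sketch below uses a hypothetical `Env` stand-in for `RagEnv` and two toy chunkers to show the difference between a definition-time default and a call-time lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Hypothetical stand-in for RagEnv."""
    chunk_size: int


GLOBAL_ENV = Env(512)


def chunk_default(text: str, chunk_size: int = GLOBAL_ENV.chunk_size) -> list[str]:
    """The default is read from the global once, when `def` executes."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_lookup(text: str) -> list[str]:
    """This variant reads the global on every call instead."""
    size = GLOBAL_ENV.chunk_size
    return [text[i:i + size] for i in range(0, len(text), size)]


text = "x" * 1000
GLOBAL_ENV = Env(100)  # rebinding the global later

n_default = len(chunk_default(text))  # still chunks with 512
n_lookup = len(chunk_lookup(text))    # now chunks with 100
```

Both variants hide the real chunk size from the caller; they just hide it in different ways. Passing it through an explicit config removes the ambiguity.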
4. Refactor to FP-Friendly: Small Arity, Explicit Dependencies¶
A concrete before/after example of redesigning an unfriendly API:
import os
import pandas as pd
from dataclasses import dataclass
# Before: Unfriendly API with implicit context
def foo(df: pd.DataFrame) -> pd.DataFrame:
    threshold = float(os.environ.get('THRESHOLD', '0.5'))  # Hidden env dep
    return df[df['value'] > threshold]  # Result changes if the env var changes
@dataclass(frozen=True)
class FooConfig:
    threshold: float
# After: FP-friendly with explicit config (replaces the definition above)
def foo(data: pd.DataFrame, *, config: FooConfig) -> pd.DataFrame:
    return data[data['value'] > config.threshold]  # Pure: Depends only on inputs
This makes the function testable (pass any FooConfig value directly) and composable, with no surprises from environment variables.
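The testing benefit can be sketched without pandas. The names below (`ThresholdConfig`, `above_threshold`) are hypothetical and mirror the shape of the refactored function on plain lists; the point is that two behaviors can be exercised side by side with no environment setup or teardown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdConfig:
    """Hypothetical config mirroring FooConfig."""
    threshold: float


def above_threshold(values: list[float], *, config: ThresholdConfig) -> list[float]:
    """Pure filter: the result depends only on `values` and `config`."""
    return [v for v in values if v > config.threshold]


values = [0.1, 0.5, 0.9]
low = above_threshold(values, config=ThresholdConfig(0.0))
high = above_threshold(values, config=ThresholdConfig(0.5))
```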
4.1 Streaming Core (Pure, Lazy)¶
A pure, lazy core with small arity, building on M02C03:
from funcpipe_rag import RagConfig, RagCoreDeps, iter_rag_core
chunks_iter = iter_rag_core(docs, config, deps)
Properties:
- Arity 3: docs, config, deps.
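- Pure, fully lazy (generator-based; no intermediate lists, so memory does not grow with the number of input docs except for any state a stage such as dedup must retain).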
- No taps (effects deferred to edge).
- Explicit config/deps, no globals.
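What "fully lazy" means in practice can be sketched with a toy pipeline. The functions below are hypothetical and unrelated to `iter_rag_core`; the `pulled` list is instrumentation added only to record how much of the input a consumer actually forces.

```python
from collections.abc import Iterable, Iterator
from itertools import islice

pulled: list[int] = []  # records which inputs the pipeline actually consumed


def source(n: int) -> Iterator[int]:
    """Instrumented input: notes each item as it is handed out."""
    for i in range(n):
        pulled.append(i)
        yield i


def lazy_core(items: Iterable[int], factor: int) -> Iterator[int]:
    """Generator stages: nothing runs until a consumer asks for a value."""
    scaled = (x * factor for x in items)
    return (x for x in scaled if x % 3 == 0)


stream = lazy_core(source(1_000_000), factor=2)
pulled_before = len(pulled)            # building the pipeline consumed nothing
first_three = list(islice(stream, 3))  # forces only as much input as needed
```

Taking three results touches only the first seven inputs; the remaining items of the million-element source are never produced.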
4.2 Post-Clean Streaming Sub-Core¶
To reuse core logic at the edge without duplicating the full pipeline:
from collections.abc import Iterator, Iterable, Callable
from funcpipe_rag import CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig
from funcpipe_rag import gen_chunk_doc
def iter_chunks_from_cleaned(
    cleaned: Iterable[CleanDoc],
    config: RagConfig,
    embed: Callable[[ChunkWithoutEmbedding], Chunk]
) -> Iterator[Chunk]:
    """Sub-core: lazy chunk and embed from cleaned docs (reuses M02C03 patterns)."""
    for cd in cleaned:
        for chunk in gen_chunk_doc(cd, config.env):
            yield embed(chunk)
Properties:
- Arity 3: cleaned, config, embed (sub-core, internal; embed injected for consistency). Here config is domain config and embed is a dependency; we still respect the “data, config, deps” ≤3-arity pattern even in internal sub-cores.
- Enables reuse in full_rag_api_docs (and full_rag_api) for lazy post-clean processing.
4.3 Public API (Edge, Materializes)¶
Wraps the core components, handles materialization and taps:
from funcpipe_rag import RagConfig, RagCoreDeps, full_rag_api_docs
# Canonical end-of-Module-02 API
# See `capstone/src/funcpipe_rag/rag/rag_api.py` for the public API shape and
# `capstone/src/funcpipe_rag/rag/config.py` for the frozen config and dependency wiring.
chunks, obs = full_rag_api_docs(docs, config, deps)
Properties:
- Arity 3, explicit config/deps.
- Builds on M02C03 laziness internally (lazy post-clean via sub-core); materializes filter/clean at edge for taps/obs.
- Reuses core expressions via a private _tap helper and the iter_chunks_from_cleaned sub-core; taps are observational side effects isolated to the edge.
- Matches the baseline stage composition when config.keep = DEFAULT_RULES, deps.taps = None.
Note: iter_rag_core is the fully streaming core. full_rag_api_docs intentionally materializes intermediates for observations/taps; laziness applies post-clean. Dedup runs post-tap as it requires a global view. _tap is an internal helper in capstone/src/funcpipe_rag/rag/rag_api.py, not a public API.
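The split between a lazy core and a materializing edge can be sketched as follows. This is an assumption-laden illustration: the real `_tap` helper is internal to the capstone and its signature is not shown here, so `tap`, `core`, and `edge` below are hypothetical names with a guessed shape.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Optional, TypeVar

T = TypeVar("T")


def tap(items: tuple[T, ...], hook: Optional[Callable[[tuple[T, ...]], None]]) -> tuple[T, ...]:
    """Hypothetical helper: call the hook if present, return items unchanged."""
    if hook is not None:
        hook(items)
    return items


def core(words: Iterable[str]) -> Iterator[str]:
    """Pure, lazy core: no hooks, no materialization."""
    return (w.upper() for w in words if w)


def edge(
    words: Iterable[str],
    on_kept: Optional[Callable[[tuple[str, ...]], None]] = None,
) -> list[str]:
    """Edge: materializes once so the tap and the caller see the same data."""
    kept = tap(tuple(core(words)), on_kept)
    return list(kept)


seen: list[tuple[str, ...]] = []
out = edge(["a", "", "b"], on_kept=seen.append)
out_silent = edge(["a", "", "b"])
```

The hook observes a snapshot but cannot change the result, so the output is the same with or without it.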
4.4 Configurator Tie-In (M02C01) and Swapping Examples¶
from functools import partial
from funcpipe_rag import (
    Chunk,
    ChunkWithoutEmbedding,
    RagConfig,
    RagCoreDeps,
    RagEnv,
    RulesConfig,
    StartsWith,
    full_rag_api_docs,
    get_deps,
)
# Standard variant
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
rag_fn = partial(full_rag_api_docs, config=standard_config, deps=standard_deps)
# Swapping config: Filter to CS docs
cs_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
cs_deps = get_deps(cs_config)
cs_rag_fn = partial(full_rag_api_docs, config=cs_config, deps=cs_deps)
# Swapping deps: Fake embedder for tests (no I/O)
def fake_embed(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Mock embedding
test_deps = RagCoreDeps(cleaner=standard_deps.cleaner, embedder=fake_embed, taps=None)
test_rag_fn = partial(full_rag_api_docs, config=standard_config, deps=test_deps)
Wins: Small arity enables easy partialization; config/deps allow clean swapping (e.g., rules via config, fakes via deps). Composes with M02C01 make_rag_fn.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Substitute expressions in iter_rag_core.
1. Inline rule = config.keep → fixed predicate.
2. Substitute into generator expression → filtered stream.
3. Result: Output stream is fixed for fixed docs, config, deps.
Bug Hunt: In high_arity_rag, GLOBAL_ENV breaks substitution (replacing reference changes behavior).
Example:
- High-arity: chunk_size = GLOBAL_ENV.chunk_size → depends on mutable global, substitution fails.
- Friendly: config.env.chunk_size → immutable, substitutable, behavior preserved.
6. Property-Based Testing: Proving Equivalence and Idempotence¶
Use Hypothesis to check that the refactor preserves baseline behavior and avoids global bugs. Property tests give evidence over many generated inputs rather than a formal proof.
6.1 Custom Strategy¶
From capstone/tests/conftest.py.
6.2 Equivalence Property (Core vs Baseline)¶
# capstone/tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    clean_doc,
    embed_chunk,
    get_deps,
    iter_chunk_doc,
    RagConfig,
    structural_dedup_chunks,
    full_rag_api_docs,
    iter_rag_core,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice, tee
def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))  # Consistent iterables
    baseline = baseline_full_rag(list(docs1), env)
    chunks, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks == baseline
Note: Tests chunk equivalence to a baseline built from the pure stages.
6.3 Prefix Equivalence (Streaming Core)¶
@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    baseline = baseline_full_rag(list(docs1), env)
    core_iter = iter_rag_core(docs2, config, deps)
    assert list(islice(core_iter, k)) == baseline[:k]
Note: Verifies streaming core matches the baseline on finite prefixes (M02C03 tie-in).
6.4 Idempotence Property¶
@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_idempotence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = full_rag_api_docs(docs1, config, deps)
    chunks2, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks1 == chunks2
Note: Verifies same inputs yield same outputs, catching global mutation bugs.
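The property above checks that repeating a call gives the same result. In the stricter mathematical sense, idempotence means that applying a function to its own output changes nothing, i.e. f(f(x)) == f(x). The sketch below separates the two checks on a hypothetical order-preserving `dedup` (not `structural_dedup_chunks`), using the standard library's `random` in place of Hypothesis so it runs without extra dependencies.

```python
import random


def dedup(items: list[int]) -> list[int]:
    """Hypothetical order-preserving dedup (not structural_dedup_chunks)."""
    seen: set[int] = set()
    out: list[int] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


rng = random.Random(0)  # fixed seed so the check itself is repeatable
cases = [[rng.randrange(5) for _ in range(rng.randrange(10))] for _ in range(200)]

# Repeatability: same input, same output (the shape of test_rag_idempotence).
repeatable = all(dedup(xs) == dedup(xs) for xs in cases)

# Idempotence in the strict sense: a second application is a no-op.
idempotent = all(dedup(dedup(xs)) == dedup(xs) for xs in cases)
```

Both are useful for a pipeline: repeatability catches hidden state, while strict idempotence is the natural property for stages such as dedup whose output is a valid input.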
6.5 Shrinking Demo: Catching a Global Bug¶
Bad refactor with global mutation:
from funcpipe_rag import RawDoc, CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig, RagEnv
from funcpipe_rag import Observations, RagCoreDeps, eval_pred, full_rag_api_docs
from funcpipe_rag import gen_chunk_doc, structural_dedup_chunks
from collections.abc import Iterator, Iterable, Callable
def bad_full_rag_api(
    docs: Iterable[RawDoc],
    config: RagConfig,
    deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
    # Reuse the same GLOBAL_ENV from the high_arity_rag anti-pattern
    global GLOBAL_ENV
    # Bug: the new value derives from the previous global, so every call shifts the chunk size
    GLOBAL_ENV = RagEnv(GLOBAL_ENV.chunk_size + 1)  # Mutates global
    docs_list = list(docs)
    kept_docs = [d for d in docs_list if eval_pred(d, config.keep.keep_pred)]
    cleaned = [deps.cleaner(d) for d in kept_docs]
    chunks_iter = (deps.embedder(c) for cd in cleaned for c in gen_chunk_doc(cd, GLOBAL_ENV))
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs_list), total_chunks=len(chunks), kept_docs=len(kept_docs), cleaned_docs=len(cleaned))
    return chunks, obs
Property testing the bad version:
@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag_idempotence(docs, env):
    global GLOBAL_ENV
    GLOBAL_ENV = env
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = bad_full_rag_api(docs1, config, deps)
    chunks2, _ = bad_full_rag_api(docs2, config, deps)
    assert chunks1 == chunks2
Failure Trace (Example):
Falsifying example: test_bad_rag_idempotence(
docs=[RawDoc(doc_id='a', title='t', abstract='abc', categories='c')],
env=RagEnv(chunk_size=128),
)
AssertionError
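Analysis: Hypothesis shrinks toward a minimal doc where the GLOBAL_ENV mutation changes chunk sizes between calls, breaking idempotence. The two calls can only differ when the abstract is long enough for the changed chunk size to move a chunk boundary, so the exact shrunk example depends on the bounds of doc_list_strategy and env_strategy.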
7. When FP-Friendly APIs Aren't Worth It¶
Use higher arity or globals only in:
- Legacy adapters (e.g., framework callbacks requiring fixed signatures).
- One-off scripts with no reuse.
Guardrails: Wrap such functions in thin adapters calling small-arity cores to isolate complexity.
Example:
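# Legacy adapter: keeps the old 5-argument signature, delegates to the small-arity API
from funcpipe_rag import RagConfig, RagCoreDeps, RagEnv, embed_chunk, full_rag_api_docs
def legacy_rag(docs, chunk_size, cleaner, keep, debug):
    # `debug` is accepted only for signature compatibility and is ignored here
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagCoreDeps(cleaner=cleaner, embedder=embed_chunk)
    return full_rag_api_docs(docs, config, deps)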
8. Pre-Core Quiz¶
- Why does def f(a, b, c, d, e) violate FP-friendly design?
  Answer: Arity >3, hard to partialize or compose.
- How to fix a function using GLOBAL_DB?
  Answer: Inject as dependency in deps.
- What’s wrong with def rag(docs, cleaner, env, keep, taps)?
  Answer: High arity (5); group env, keep into config and cleaner, taps into deps.
- Why use RagConfig and RagCoreDeps structs?
  Answer: Encapsulate domain settings and services, reduce arity, clarify intent.
- Tool to check refactor correctness?
  Answer: Hypothesis (equivalence, idempotence).
9. Post-Core Reflection & Exercise¶
Reflect: Find a function in your codebase with >3 params or hidden globals. Refactor it to use inputs, config, deps with arity ≤3. Add Hypothesis tests for equivalence and idempotence.
Project Exercise: Apply to RAG pipeline; run properties on arxiv_cs_abstracts_10k.csv.
- Did composability improve (easier partials)?
- Did tests catch global bugs?
- Did config/deps clarify domain logic?
Continue with: Effect Boundaries
Verify all patterns with Hypothesis; the examples above show how to detect impurities such as globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.