
FP-Friendly APIs

This lesson is about making a function boundary easy to inspect. Good API shape is not decoration around the real code. It determines whether later composition, testing, and refactoring stay manageable.

Start With the Review Pressure

Large signatures and hidden dependencies usually appear one "small" requirement at a time. This page shows how to stop that drift early.

  • If a function needs many unrelated arguments, the model of the operation is still blurry.
  • If callers must know about globals, environment variables, or service singletons, the API is lying about what it needs.
  • If reviewers cannot tell which inputs are data, which are policy, and which are external services, composition will stay fragile.

Keep This Question In View

Core question:
How do you design APIs with ≤3 parameters, explicit config and dependencies, and no hidden globals—so pipelines from M02C01–M02C03 are composable, testable, and predictable?

This lesson introduces FP-friendly API design as a set of reviewable design choices:

  • keep the public function boundary small enough to understand in one read
  • separate input data from configuration and from injected services
  • preserve the earlier module gains so configured, expression-oriented, and lazy stages still compose cleanly

The running project grounds the rule in a realistic case: a RAG pipeline is not simple, but its boundary still has to stay inspectable.

Use this when you have lazy pipelines but face high-arity functions or hidden globals that make testing and composition awkward.

Outcome:
1. Identify high-arity or hidden dependencies in code and explain their impact on composability.
2. Refactor a high-arity function into a small-arity API with grouped config and dependencies.
3. Write Hypothesis properties that check equivalence and idempotence, with a shrinking example.


1. Conceptual Foundation

1.1 FP-Friendly API Design in One Precise Sentence

FP-friendly APIs limit core public functions to ≤3 parameters, group domain settings into immutable config and services into explicit dependencies, and avoid hidden globals—ensuring composability, testability, and equational reasoning.

1.2 The One-Sentence Rule

Core public APIs must have ≤3 parameters (inputs, config, deps), with all dependencies explicit and globals forbidden; bind config/deps at edges using M02C01 partials or factories.
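
As a rough illustration of the rule, the sketch below uses hypothetical stand-in names (`ChunkConfig`, `ChunkDeps`, `chunk_texts`) rather than the capstone's real types. It assumes a frozen dataclass for domain settings, a second frozen dataclass for injected services, and `functools.partial` to bind both at the edge.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class ChunkConfig:
    """Domain settings (hypothetical stand-in for a config struct)."""
    chunk_size: int


@dataclass(frozen=True)
class ChunkDeps:
    """Injected services (hypothetical stand-in for a deps struct)."""
    normalize: Callable[[str], str]


def chunk_texts(
    texts: Iterable[str], config: ChunkConfig, deps: ChunkDeps
) -> Iterator[str]:
    """Core with exactly three parameters: data, config, deps."""
    for text in texts:
        cleaned = deps.normalize(text)
        for start in range(0, len(cleaned), config.chunk_size):
            yield cleaned[start:start + config.chunk_size]


# Bind config and deps once at the edge; callers then pass only data.
chunk_small = partial(
    chunk_texts,
    config=ChunkConfig(chunk_size=4),
    deps=ChunkDeps(normalize=str.strip),
)

result = list(chunk_small(["  abcdefgh  ", "xy"]))
```

Once bound, `chunk_small` is an ordinary one-argument callable that can be stored, passed around, or composed like any other value.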

1.3 Why This Matters Now

The earlier lessons made it possible to write configured, expression-oriented, lazy code. This lesson makes it possible to keep that code usable as a public surface. Without a disciplined API boundary, the implementation improvements remain trapped behind a function signature that keeps leaking internal details.

1.4 FP-Friendly APIs as Values in 5 Lines

The example below matters because it shows how configured callables become easy to store and reuse once the boundary shape is stable.

from functools import partial
from funcpipe_rag import RagConfig, RagEnv, RulesConfig, StartsWith, get_deps, iter_rag_core

standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)

filtered_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
filtered_deps = get_deps(filtered_config)

rags: dict[str, object] = {
    "standard": partial(iter_rag_core, config=standard_config, deps=standard_deps),
    "filtered": partial(iter_rag_core, config=filtered_config, deps=filtered_deps),
}

Small-arity functions (inputs, config, deps), explicit config/deps, and no globals allow storage in dicts, composition with M02C01 partials, and testing as first-class values. For example, swapping keep in config creates variants without globals or high arity.

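Note: In real systems, embed may involve I/O (e.g., API calls). Injecting it through deps does not make it pure, but it keeps the core free of hidden effects: given deterministic deps, the core behaves as a referentially transparent function, and tests can substitute a fake embedder.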


2. Mental Model: High-Arity Globals vs Small Explicit APIs

2.1 One Picture

High-Arity Globals (Messy)              Small Explicit APIs (Composable)
+-----------------------+               +------------------------------+
| def rag(docs, env,    |               | def iter_rag_core(docs,      |
| cleaner, keep, taps,  |               | config, deps)                |
| chunk_size, more...)  |               | -> Iterator[Chunk]           |
| # Uses GLOBAL_CFG     |               | # Config: env, keep          |
|                       |               | # Deps: cleaner, embed, taps |
+-----------------------+               +------------------------------+
   ↑ Hard to Test/Compose                ↑ Snaps into Partial/Flow

2.2 Contract Table

| Aspect                       | High-Arity Globals        | Small Explicit APIs                                   |
|------------------------------|---------------------------|-------------------------------------------------------|
| Arity                        | >3 params                 | ≤3 (inputs, config, deps)                             |
| Dependencies                 | Hidden globals/env vars   | Explicit config/deps structs                          |
| Composability                | Hard (many args, globals) | Easy (partial, flow)                                  |
| Testing                      | Mock globals, flaky       | Inject fakes, deterministic                           |
| Boundaries                   | Mixed pure/effects        | Pure core, effectful edges                            |
| Reasoning                    | Opaque (hidden state)     | Equational (substitutable)                            |
| Mutable defaults in partials | Break determinism         | Use frozen dataclasses or immutable types for configs |

Note on High-Arity Choice: Use higher arity only for legacy adapters, wrapping small-arity cores.
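
The last table row can be made concrete with a small sketch. The names here (`scale`, `ScaleConfig`) are hypothetical and not part of the capstone; the point is only that a mutable object bound into a `partial` can change under the callable, while a frozen dataclass rejects the mutation.

```python
from dataclasses import FrozenInstanceError, dataclass
from functools import partial


def scale(values: list[int], config: dict) -> list[int]:
    """Reads its setting from a mutable dict."""
    return [v * config["factor"] for v in values]


mutable_cfg = {"factor": 2}
scale_mutable = partial(scale, config=mutable_cfg)
before = scale_mutable([1, 2])
mutable_cfg["factor"] = 10  # Mutation elsewhere silently changes the bound callable
after = scale_mutable([1, 2])


@dataclass(frozen=True)
class ScaleConfig:
    factor: int


def scale_frozen(values: list[int], config: ScaleConfig) -> list[int]:
    """Reads its setting from an immutable config."""
    return [v * config.factor for v in values]


frozen_cfg = ScaleConfig(factor=2)
scale_fixed = partial(scale_frozen, config=frozen_cfg)

try:
    frozen_cfg.factor = 10  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True  # The bound config cannot drift after binding
```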

2.3 Common API Shapes Table

To lock in the arity rule, here are typical shapes:

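| Shape                 | Meaning                      | Example                           |
|-----------------------|------------------------------|-----------------------------------|
| f(data)               | Pure utility, no config/deps | hash(data)                        |
| f(data, config)       | Domain-level core            | chunk(data, ChunkConfig)          |
| f(data, config, deps) | Cross-cutting deps present   | iter_rag_core(docs, config, deps) |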

Any other shape (e.g., f(docs, env, keep, cleaner, taps, ...)) should be treated as an anti-pattern and refactored; the legacy-adapter case noted above is the only exception.
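
One way to make the arity rule reviewable is a small check built on `inspect.signature`. This is a sketch, not part of the capstone: `arity`, `exceeds_arity_limit`, and the two sample functions are hypothetical, and the check deliberately ignores `*args` and `**kwargs`.

```python
import inspect
from collections.abc import Callable

MAX_ARITY = 3  # The lesson's limit for core public functions


def arity(fn: Callable[..., object]) -> int:
    """Count declared parameters, ignoring *args and **kwargs."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    params = inspect.signature(fn).parameters.values()
    return sum(1 for p in params if p.kind not in variadic)


def exceeds_arity_limit(fn: Callable[..., object], limit: int = MAX_ARITY) -> bool:
    """True when the function declares more parameters than the limit."""
    return arity(fn) > limit


def friendly(docs, config, deps):
    """Sample function with the (data, config, deps) shape."""


def unfriendly(docs, env, keep, cleaner, taps):
    """Sample function with five positional parameters."""


flagged = [f.__name__ for f in (friendly, unfriendly) if exceeds_arity_limit(f)]
```

A check like this could run in a test over a module's public functions, with legacy adapters listed as explicit exceptions.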


3. Running Project: FuncPipe RAG Builder

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Design small-arity, explicit APIs that compose lazily and match the baseline outputs.
- Start: High-arity, global-dependent version (core4_start.py).
- End: FP-friendly API with streaming core and edge materialization.

3.1 Types (Canonical, Used Throughout)

From capstone/src/funcpipe_rag/rag_types.py, capstone/src/funcpipe_rag/api/types.py, plus new config/deps:

from funcpipe_rag import Observations, RagConfig, RagCoreDeps, RagEnv, RagTaps

3.2 High-Arity Start (Anti-Pattern)

# core4_start.py: High-arity, global-dependent RAG
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks, gen_chunk_doc
from collections.abc import Callable, Sequence

GLOBAL_ENV = RagEnv(512)  # Hidden global


def high_arity_rag(
        docs: list[RawDoc],
        cleaner: Callable[[RawDoc], CleanDoc],
        keep: DocRule | None,
        taps: RagTaps | None,
        chunk_size: int = GLOBAL_ENV.chunk_size,
        debug: bool = False
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    kept_docs = [d for d in docs if rule(d)]
    if taps and taps.docs and debug:
        taps.docs(tuple(kept_docs))
    cleaned = [cleaner(d) for d in kept_docs]
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))
    chunk_we = [c for cd in cleaned for c in gen_chunk_doc(cd, RagEnv(chunk_size))]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks and debug:
        taps.chunks(tuple(chunks))
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Smells:
- High arity (6 params: docs, cleaner, keep, taps, chunk_size, debug).
- Hidden global (GLOBAL_ENV).
- Mixed effects (taps with debug flag).
Problem: Hard to partialize, test, or reason about due to excessive params and global dependency.
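
The hidden-global smell has a second trap worth seeing in isolation: a default such as `chunk_size=GLOBAL_ENV.chunk_size` is evaluated once, when the `def` statement runs. The sketch below uses a hypothetical `Env` stand-in and two toy functions to contrast a definition-time default with a per-call global lookup; neither form shows the dependency at the call site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Hypothetical stand-in for an environment/config object."""
    chunk_size: int


GLOBAL_ENV = Env(512)


def chunk_len_default(text: str, chunk_size: int = GLOBAL_ENV.chunk_size) -> int:
    """The default is evaluated once, when `def` runs."""
    return min(len(text), chunk_size)


def chunk_len_body(text: str) -> int:
    """The global is looked up again on every call."""
    return min(len(text), GLOBAL_ENV.chunk_size)


text = "x" * 1000
first = (chunk_len_default(text), chunk_len_body(text))

GLOBAL_ENV = Env(64)  # Some other code rebinds the global later

# The default still holds the old value; the body lookup sees the new one.
second = (chunk_len_default(text), chunk_len_body(text))
```

Passing the size explicitly through a config object removes both behaviors, because the value is then visible wherever the function is called.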


4. Refactor to FP-Friendly: Small Arity, Explicit Dependencies

Here is a concrete before/after example of redesigning an unfriendly API:

import os
import pandas as pd
from dataclasses import dataclass

# Before: Unfriendly API with implicit context
def foo(df: pd.DataFrame) -> pd.DataFrame:
    threshold = float(os.environ.get('THRESHOLD', '0.5'))  # Hidden env dep
    return df[df['value'] > threshold]  # Non-deterministic if env changes

@dataclass(frozen=True)
class FooConfig:
    threshold: float

# After: FP-Friendly with explicit deps
def foo(data: pd.DataFrame, *, config: FooConfig) -> pd.DataFrame:
    return data[data['value'] > config.threshold]  # Pure: Depends only on inputs

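This makes the function testable (construct a FooConfig with any threshold) and composable, with no surprises from environment variables. The environment can still be read, but only at the edge, where it is turned into a FooConfig.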
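
A minimal sketch of where the environment lookup can go once the core no longer performs it. The names `FilterConfig`, `config_from_env`, and `keep_above` are hypothetical, and plain lists stand in for the DataFrame so the example stays dependency-free.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    threshold: float


def config_from_env(environ: Mapping[str, str]) -> FilterConfig:
    """Edge: the only place that knows the environment variable name."""
    return FilterConfig(threshold=float(environ.get("THRESHOLD", "0.5")))


def keep_above(values: list[float], *, config: FilterConfig) -> list[float]:
    """Core: depends only on its arguments."""
    return [v for v in values if v > config.threshold]


# In production the edge would pass os.environ; tests can pass a plain dict.
config = config_from_env({"THRESHOLD": "0.7"})
kept = keep_above([0.2, 0.7, 0.9], config=config)
default_config = config_from_env({})
```

Because `config_from_env` takes a mapping rather than reading `os.environ` directly, even the edge function can be tested without touching the process environment.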

4.1 Streaming Core (Pure, Lazy)

A pure, lazy core with small arity, building on M02C03:

from funcpipe_rag import RagConfig, RagCoreDeps, iter_rag_core

chunks_iter = iter_rag_core(docs, config, deps)

Properties:
- Arity 3: docs, config, deps.
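- Pure given pure deps, and fully lazy (generator-based; no intermediate lists are materialized, although any deduplication state still grows with the number of distinct chunks seen).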
- No taps (effects deferred to edge).
- Explicit config/deps, no globals.
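
To see what laziness buys at the boundary, the sketch below tracks which inputs a generator-based core actually pulls when the caller takes only a prefix. `iter_chunks` and `tracked` are hypothetical helpers, not the capstone's `iter_rag_core`, and the `pulled` list exists only to make consumption visible.

```python
from collections.abc import Iterable, Iterator
from itertools import islice

pulled: list[str] = []  # Demo-only record of which inputs were consumed


def tracked(texts: Iterable[str]) -> Iterator[str]:
    """Pass items through unchanged while recording each one pulled."""
    for text in texts:
        pulled.append(text)
        yield text


def iter_chunks(texts: Iterable[str], chunk_size: int) -> Iterator[str]:
    """Lazy core: yields chunks one at a time, never builds a list."""
    for text in texts:
        for start in range(0, len(text), chunk_size):
            yield text[start:start + chunk_size]


docs = ["aaaabbbb", "ccccdddd", "eeeeffff"]

# Ask for three chunks only; the third document is never touched.
first_three = list(islice(iter_chunks(tracked(docs), chunk_size=4), 3))
```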

4.2 Post-Clean Streaming Sub-Core

To reuse core logic at the edge without duplicating the full pipeline:

from collections.abc import Iterator, Iterable, Callable
from funcpipe_rag import CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig
from funcpipe_rag import gen_chunk_doc


def iter_chunks_from_cleaned(
        cleaned: Iterable[CleanDoc],
        config: RagConfig,
        embed: Callable[[ChunkWithoutEmbedding], Chunk]
) -> Iterator[Chunk]:
    """Sub-core: lazy chunk and embed from cleaned docs (reuses M02C03 patterns)."""
    for cd in cleaned:
        for chunk in gen_chunk_doc(cd, config.env):
            yield embed(chunk)

Properties:
- Arity 3: cleaned, config, embed (internal sub-core; embed is injected for consistency). Here config is domain config and embed is a dependency, so the “data, config, deps” ≤3-arity pattern holds even in internal sub-cores.
- Enables reuse in full_rag_api_docs (and full_rag_api) for lazy post-clean processing.

4.3 Public API (Edge, Materializes)

Wraps the core components, handles materialization and taps:

from funcpipe_rag import RagConfig, RagCoreDeps, full_rag_api_docs

# Canonical end-of-Module-02 API
# See `capstone/src/funcpipe_rag/rag/rag_api.py` for the public API shape and
# `capstone/src/funcpipe_rag/rag/config.py` for the frozen config and dependency wiring.
chunks, obs = full_rag_api_docs(docs, config, deps)

Properties:
- Arity 3, explicit config/deps.
- Builds on M02C03 laziness internally (lazy post-clean via sub-core); materializes filter/clean at edge for taps/obs.
- Reuses core expressions via a private _tap helper and the iter_chunks_from_cleaned sub-core; taps are observational side effects isolated to the edge.
- Matches the baseline stage composition when config.keep = DEFAULT_RULES, deps.taps = None.
Note: iter_rag_core is the fully streaming core. full_rag_api_docs intentionally materializes intermediates for observations/taps; laziness applies post-clean. Dedup runs post-tap as it requires a global view. _tap is an internal helper in capstone/src/funcpipe_rag/rag/rag_api.py, not a public API.
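
The split between a lazy core and a materializing edge might look like the following sketch. It is not the capstone's `full_rag_api_docs`: `Obs`, `dedup`, and `rag_edge` are hypothetical stand-ins, strings stand in for documents and chunks, and a single optional tap replaces the real taps struct.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Optional

Tap = Callable[[tuple[str, ...]], None]


@dataclass(frozen=True)
class Obs:
    total_docs: int
    total_chunks: int


def iter_chunks(texts: Iterable[str], chunk_size: int) -> Iterator[str]:
    """Lazy core: no taps, no materialization."""
    for text in texts:
        for start in range(0, len(text), chunk_size):
            yield text[start:start + chunk_size]


def dedup(chunks: Iterable[str]) -> list[str]:
    """Order-preserving dedup; needs a view of everything seen so far."""
    return list(dict.fromkeys(chunks))


def rag_edge(
    texts: Iterable[str], chunk_size: int, tap: Optional[Tap] = None
) -> tuple[list[str], Obs]:
    """Edge: materializes, calls the optional tap, then builds observations."""
    docs = list(texts)  # Materialize once so the documents can be counted
    chunks = list(iter_chunks(docs, chunk_size))
    if tap is not None:
        tap(tuple(chunks))  # Observational side effect, confined to the edge
    unique = dedup(chunks)  # Dedup runs after the tap
    return unique, Obs(total_docs=len(docs), total_chunks=len(unique))


seen: list[tuple[str, ...]] = []
chunks, obs = rag_edge(["abab", "cd"], chunk_size=2, tap=seen.append)
```

The core stays reusable in streaming contexts, while everything that needs a whole-collection view (counts, taps, dedup) lives in one place.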

4.4 Configurator Tie-In (M02C01) and Swapping Examples

from functools import partial
from funcpipe_rag import (
    Chunk,
    ChunkWithoutEmbedding,
    RagConfig,
    RagCoreDeps,
    RagEnv,
    RulesConfig,
    StartsWith,
    full_rag_api_docs,
    get_deps,
)

# Standard variant
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
rag_fn = partial(full_rag_api_docs, config=standard_config, deps=standard_deps)

# Swapping config: Filter to CS docs
cs_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
cs_deps = get_deps(cs_config)
cs_rag_fn = partial(full_rag_api_docs, config=cs_config, deps=cs_deps)


# Swapping deps: Fake embedder for tests (no I/O)
def fake_embed(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Mock embedding


test_deps = RagCoreDeps(cleaner=standard_deps.cleaner, embedder=fake_embed, taps=None)
test_rag_fn = partial(full_rag_api_docs, config=standard_config, deps=test_deps)

Wins: Small arity enables easy partialization; config/deps allow clean swapping (e.g., rules via config, fakes via deps). Composes with M02C01 make_rag_fn.


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Substitute expressions in iter_rag_core.
1. Inline rule = config.keep → fixed predicate.
2. Substitute into generator expression → filtered stream.
3. Result: Output stream is fixed for fixed docs, config, deps.
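Bug Hunt: In high_arity_rag, GLOBAL_ENV breaks substitution: the chunk_size default is read from a module global when the function is defined, so behavior depends on state that the call site never shows.

Example:
- High-arity: chunk_size = GLOBAL_ENV.chunk_size → the default is captured from a mutable global at definition time, so a call cannot be replaced by its value without knowing import-time state.
- Friendly: config.env.chunk_size → immutable and passed explicitly, so it is substitutable and behavior is preserved.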
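
The hand exercise can be replayed mechanically on a toy pipeline. The sketch below is an assumption-laden miniature, not the real `iter_rag_core`: `Config`, `keep`, `chunks_of`, and `iter_core` are hypothetical, and strings stand in for documents.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    keep_prefix: str
    chunk_size: int


def keep(doc: str, config: Config) -> bool:
    return doc.startswith(config.keep_prefix)


def chunks_of(doc: str, config: Config) -> Iterator[str]:
    return (
        doc[i:i + config.chunk_size]
        for i in range(0, len(doc), config.chunk_size)
    )


def iter_core(docs: Iterable[str], config: Config) -> Iterator[str]:
    return (c for d in docs if keep(d, config) for c in chunks_of(d, config))


config = Config(keep_prefix="cs.", chunk_size=3)
docs = ["cs.AI", "math.CO", "cs.LG"]

# Step 1: inline the rule, so keep(d, config) becomes d.startswith("cs.").
# Step 2: substitute chunks_of with its body and config.chunk_size with 3.
inlined = (
    d[i:i + 3]
    for d in docs
    if d.startswith("cs.")
    for i in range(0, len(d), 3)
)

# Step 3: for fixed docs and config, both expressions denote the same stream.
original_result = list(iter_core(docs, config))
inlined_result = list(inlined)
```

Each substitution is safe only because `config` is frozen and every input arrives through the parameter list.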


6. Property-Based Testing: Proving Equivalence and Idempotence

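Use Hypothesis to check that the refactor preserves baseline behavior and to catch global-state bugs. Property tests give strong evidence over many generated inputs rather than a formal proof.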

6.1 Custom Strategy

From capstone/tests/conftest.py.

6.2 Equivalence Property (Core vs Baseline)

# capstone/tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    clean_doc,
    embed_chunk,
    get_deps,
    iter_chunk_doc,
    RagConfig,
    structural_dedup_chunks,
    full_rag_api_docs,
    iter_rag_core,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice, tee

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))  # Two independent iterators over the same docs
    baseline = baseline_full_rag(list(docs1), env)
    chunks, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks == baseline

Note: Tests chunk equivalence to a baseline built from the pure stages.

6.3 Prefix Equivalence (Streaming Core)

@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    baseline = baseline_full_rag(list(docs1), env)
    core_iter = iter_rag_core(docs2, config, deps)
    assert list(islice(core_iter, k)) == baseline[:k]

Note: Verifies streaming core matches the baseline on finite prefixes (M02C03 tie-in).

6.4 Idempotence Property

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_idempotence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = full_rag_api_docs(docs1, config, deps)
    chunks2, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks1 == chunks2

Note: Verifies that the same inputs yield the same outputs across repeated calls (repeatability), which catches global-mutation bugs.
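
The word idempotence is also used in a stricter algebraic sense, f(f(x)) == f(x), which is a different property from the repeated-call check above. The toy functions below (hypothetical, unrelated to the capstone) show that a function can pass the repeated-call check while failing the algebraic one.

```python
def dedup_sorted(values: list[int]) -> list[int]:
    """Repeatable and idempotent: a second application changes nothing."""
    return sorted(set(values))


def append_total(values: list[int]) -> list[int]:
    """Repeatable but not idempotent: each application adds an element."""
    return [*values, sum(values)]


data = [3, 1, 3, 2]

# Repeated-call check: same input twice, compare outputs.
repeatable_a = dedup_sorted(data) == dedup_sorted(data)
repeatable_b = append_total(data) == append_total(data)

# Algebraic idempotence: apply the function to its own output.
idempotent_a = dedup_sorted(dedup_sorted(data)) == dedup_sorted(data)
idempotent_b = append_total(append_total(data)) == append_total(data)
```

For a pipeline whose output type differs from its input type, only the repeated-call form applies, which is the form this lesson tests.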

6.5 Shrinking Demo: Catching a Global Bug

Bad refactor with global mutation:

from funcpipe_rag import RawDoc, CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig, RagEnv
from funcpipe_rag import Observations, RagCoreDeps, eval_pred, full_rag_api_docs
from funcpipe_rag import gen_chunk_doc, structural_dedup_chunks
from collections.abc import Iterator, Iterable, Callable


def bad_full_rag_api(
        docs: Iterable[RawDoc],
        config: RagConfig,
        deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
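    # Reuse the same GLOBAL_ENV from the high_arity_rag anti-pattern
    global GLOBAL_ENV
    # Bug: reads and rebinds the global, so each call chunks with a different size
    # (deriving the value from config alone would give identical results on every call)
    GLOBAL_ENV = RagEnv(GLOBAL_ENV.chunk_size + 1)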
    docs_list = list(docs)
    kept_docs = [d for d in docs_list if eval_pred(d, config.keep.keep_pred)]
    cleaned = [deps.cleaner(d) for d in kept_docs]
    chunks_iter = (deps.embedder(c) for cd in cleaned for c in gen_chunk_doc(cd, GLOBAL_ENV))
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs_list), total_chunks=len(chunks), kept_docs=len(kept_docs), cleaned_docs=len(cleaned))
    return chunks, obs

Property testing the bad version:

@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag_idempotence(docs, env):
    global GLOBAL_ENV
    GLOBAL_ENV = env
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = bad_full_rag_api(docs1, config, deps)
    chunks2, _ = bad_full_rag_api(docs2, config, deps)
    assert chunks1 == chunks2

Failure Trace (Example):

Falsifying example: test_bad_rag_idempotence(
    docs=[RawDoc(doc_id='a', title='t', abstract='abc', categories='c')],
    env=RagEnv(chunk_size=128),
)
AssertionError

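Analysis: Hypothesis shrinks toward a minimal document for which the GLOBAL_ENV mutation changes chunk boundaries between the two calls, breaking idempotence. The trace above is illustrative; the exact falsifying example depends on the strategies in capstone/tests/conftest.py.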


7. When FP-Friendly APIs Aren't Worth It

Use higher arity or globals only in:
- Legacy adapters (e.g., framework callbacks requiring fixed signatures).
- One-off scripts with no reuse.
Guardrails: Wrap such functions in thin adapters calling small-arity cores to isolate complexity.

Example:

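# Legacy adapter: keeps the old wide signature and delegates to the small-arity API
def legacy_rag(docs, chunk_size, cleaner, keep, debug):
    # `debug` is accepted only for signature compatibility and is ignored here
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagCoreDeps(cleaner=cleaner, embedder=embed_chunk, taps=None)
    return full_rag_api_docs(docs, config, deps)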

8. Pre-Core Quiz

  1. Why does def f(a, b, c, d, e) violate FP-friendly design?
    Answer: Arity >3, hard to partialize or compose.
  2. How to fix a function using GLOBAL_DB?
    Answer: Inject as dependency in deps.
  3. What’s wrong with def rag(docs, cleaner, env, keep, taps)?
    Answer: High arity (5); group env, keep into config, cleaner, taps into deps.
  4. Why use RagConfig and RagCoreDeps structs?
    Answer: Encapsulate domain settings and services, reduce arity, clarify intent.
  5. Tool to prove refactor correctness?
    Answer: Hypothesis (equivalence, idempotence).

9. Post-Core Reflection & Exercise

Reflect: Find a function in your codebase with >3 params or hidden globals. Refactor it to use inputs, config, deps with arity ≤3. Add Hypothesis tests for equivalence and idempotence.
Project Exercise: Apply to RAG pipeline; run properties on arxiv_cs_abstracts_10k.csv.
- Did composability improve (easier partials)?
- Did tests catch global bugs?
- Did config/deps clarify domain logic?

Continue with: Effect Boundaries

Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.