Higher-Order Composition¶
Page Maps¶
graph LR
family["Python Programming"]
program["Python Functional Programming"]
section["Purity Substitution Local Reasoning"]
page["Higher-Order Composition"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
This lesson is about a practical refactor move: stop rewriting the same loop structure when the real variation is only the transform inside it.
Start With the Concrete Problem¶
Without composition, codebases tend to collect:
- loops that differ in only one or two lines
- nested transforms where the business rule is buried inside traversal mechanics
- callback-heavy code that is harder to reorder or test than it should be
Composition solves that by separating:
- the fixed traversal or pipeline structure
- the specific function passed into that structure
Keep This Question In View¶
Core question:
How do you build concise, reusable, composable transformations using functions as first-class citizens—so that complex logic emerges from simple, testable building blocks without imperative loops or hidden state?
By the end of this lesson, you should be able to identify:
- the repeated pipeline shape
- the small function that should be extracted
- whether the resulting pipeline is actually clearer than the loop it replaced
1. Conceptual Foundation¶
1.1 The One-Sentence Rule¶
Default to declarative pipelines over imperative loops; compose small, pure, testable functions into readable chains.
1.2 Higher-Order Composition in One Precise Sentence¶
Higher-order functions take functions as arguments or return them; composition chains them into pipelines that apply sequentially while preserving purity and immutability.
1.3 Why This Matters Now¶
Higher-order composition matters because it keeps structure and behavior separate. Once the transform becomes a value, you can:
- test the small function on its own
- test the traversal logic on its own
- swap or reorder steps without rewriting the whole loop body
2. Mental Model: Imperative Loop vs Declarative Pipeline¶
2.1 One Picture¶
Imperative loop Declarative pipeline
+--------------------------------+ +-------------------------------------+
| total = 0 | | from operator import add |
| for x in xs: | | total = reduce(add, |
| if x > 0: | | filter(pos, |
| total += x * x | | map(sq, xs)), |
| print(total) | | 0) |
+--------------------------------+ +-------------------------------------+
Structure and transform are separated
2.2 Contract Table¶
| Clause | Violation Example | Detected By |
|---|---|---|
| Identity laws | flow(identity, f) ≠ f | Hypothesis identity property |
| Equivalence | Pipeline ≠ known-correct impl | Hypothesis equivalence vs imperative |
| Laziness / Memory | Eager list(map()) on large data | Manual memory profiling |
Note on Pipelines: In Python, declarative code often means comprehensions plus small
pure helpers, with map, filter, reduce, or a local flow helper used when they make
the data movement clearer. The point is not to force one syntax. The point is to make the
transform easy to see.
2.3 How This Relates to Plain Python¶
Python has multiple ways to express transformations; choose based on readability and laziness:
| Transformation | For-Loop (Imperative) | List Comprehension (Declarative) | map / filter (HO, lazy) |
|---|---|---|---|
| Apply f to xs | res = [] for x in xs: res.append(f(x)) |
[f(x) for x in xs] | map(f, xs) |
| Filter positives | res = [] for x in xs: if x > 0: res.append(x) |
[x for x in xs if x > 0] | filter(pos, xs) |
| Pros/Cons | Verbose, mutable accum | Readable, eager (full list) | Lazy iterators, composable |
Guidance: Use comprehensions for simple cases; map/filter for laziness; curried versions (e.g., via functools.partial) for pipelines. Inline short pipelines; name reusable ones.
3. Running Project: HO Composition in RAG Pipeline¶
Our running project (from module-01/funcpipe-rag-01/README.md) composes RAG stages with HO functions.
- Goal: Replace imperative loops with declarative pipelines.
- Start: Core 1-3's pure functions.
- End (this core): HO-composed full_rag with properties for laws/equivalence. Semantics aligned with Core 1-3.
3.1 Types (Canonical)¶
These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.
3.2 Imperative Variants (Anti-Patterns in RAG)¶
Full code:
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, RagEnv
from typing import List
# Imperative clean (manual loop)
def imperative_clean_doc(doc: RawDoc) -> CleanDoc:
words = doc.abstract.strip().lower().split()
abstract = ""
for word in words:
abstract += word + " " # Mutable string accum
return CleanDoc(doc.doc_id, doc.title, abstract.strip(), doc.categories)
# Imperative chunk (manual indexing)
def imperative_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:
text = doc.abstract
chunks = []
i = 0
while i < len(text):
end = min(i + env.chunk_size, len(text))
chunks.append(ChunkWithoutEmbedding(doc.doc_id, text[i:end], i, end))
i = end # Manual step; off-by-one risk
return chunks
# Imperative embed (non-associative reduce example)
import hashlib
from typing import Tuple
from funcpipe_rag import Chunk
def imperative_embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
step = 4
vec = []
for i in range(0, 64, step):
val = int(h[i:i + step], 16) / (16 ** step - 1)
vec.append(val) # Mutable accum
return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, tuple(vec))
Smells: Manual accum (mutable string/list), indexing (error-prone), non-HO (no composable parts).
4. Refactor to HO: Declarative Pipelines in RAG¶
4.1 HO Core¶
module-01/funcpipe-rag-01/src/funcpipe_rag/full_rag.py wires the pure stages together with the fmap and flow helpers from fp.py. Full code for fp.py:
# module-01/funcpipe-rag-01/src/funcpipe_rag/fp.py
from typing import Callable, TypeVar, List
T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")
def fmap(f: Callable[[T], U]) -> Callable[[List[T]], List[U]]:
# fmap obeys the usual functor laws for list: identity and composition
# This fmap is a simple list-based helper; it’s eager. In Module 3 we’ll generalize this pattern to lazy iterators.
def mapper(xs: List[T]) -> List[U]:
return [f(x) for x in xs]
return mapper
def flow(*fs: Callable) -> Callable:
def composed(x):
for f in fs:
x = f(x)
return x
return composed
def identity(x):
return x
Full code for full_rag.py:
# module-01/funcpipe-rag-01/src/funcpipe_rag/full_rag.py
from typing import List
from funcpipe_rag import fmap, flow
from funcpipe_rag import clean_doc, chunk_doc, embed_chunk, structural_dedup_chunks
from funcpipe_rag import RawDoc, Chunk, RagEnv
def docs_to_embedded(docs: List[RawDoc], env: RagEnv) -> List[Chunk]:
cleaned = fmap(clean_doc)(docs)
chunked = [c for doc in cleaned for c in chunk_doc(doc, env)]
return fmap(embed_chunk)(chunked)
def full_rag(docs: List[RawDoc], env: RagEnv) -> List[Chunk]:
return structural_dedup_chunks(docs_to_embedded(docs, env))
def full_rag_point_free(docs: List[RawDoc], env: RagEnv) -> List[Chunk]:
return flow(
fmap(clean_doc),
lambda cleaned: [c for doc in cleaned for c in chunk_doc(doc, env)],
fmap(embed_chunk),
structural_dedup_chunks,
)(docs)
These helpers provide the declarative composition we care about: fmap applies a pure stage across a list, and flow threads data through the entire pipeline without imperative bookkeeping. Note: flow(f, g, h)(x) means h(g(f(x))) (left-to-right).
4.2 Non-RAG Example: Data Munging Pipeline¶
For a simple CSV processing pipeline. Full code:
from typing import Dict, List, Tuple
# Imperative version (anti-pattern)
def imperative_process_data(rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
cleaned = []
for row in rows:
if row['age'] and int(row['age']) > 18:
cleaned.append({'id': row['id'], 'score': float(row['score'])})
normalized = []
for item in cleaned:
item['score'] = item['score'] / 100.0 # Mutates
normalized.append(item)
return normalized
# HO refactor
def is_adult(row: Dict[str, str]) -> bool:
age = row.get('age')
return age is not None and int(age) > 18
def parse_score(row: Dict[str, str]) -> Dict[str, float]:
return {'id': row['id'], 'score': float(row['score'])}
def normalize_score(item: Dict[str, float]) -> Dict[str, float]:
return {**item, 'score': item['score'] / 100.0} # New dict
def process_data(rows: List[Dict[str, str]]) -> Tuple[Dict[str, float], ...]:
return tuple(
map(normalize_score,
map(parse_score,
filter(is_adult, rows)
)
)
)
Here we intentionally use built-in map/filter to highlight laziness; our fmap helper above is eager and for lists only.
Wins: Internal stages are lazy iterators, no mutation, reusable parts. Moving filter before expensive map avoids wasted work on invalid rows.
4.3 Non-RAG Example: Business Logic Pipeline¶
For a request handler. Full code:
from typing import Dict, Callable
from funcpipe_rag import flow
# Imperative version (anti-pattern)
def imperative_handle_request(req: Dict) -> Dict:
if not authenticate(req):
return {'error': 'unauth'}
if not authorize(req):
return {'error': 'unauthorized'}
result = run_action(req)
return {'result': result}
# HO refactor (composition)
# reusing flow from fp.py
def check_auth(req: Dict) -> Dict:
if not authenticate(req):
raise ValueError('unauth')
return req
def check_authz(req: Dict) -> Dict:
if not authorize(req):
raise ValueError('unauthorized')
return req
handle_request = flow(
check_auth,
check_authz,
lambda req: {'result': run_action(req)},
)
Wins: Composable guards, easy to reorder/add steps without nesting. Real-world handlers often compose error-aware wrappers (like Result/Either types); here we raise exceptions to keep the example small. We’ll later replace these ValueError raises with explicit Result values.
4.4 Impure Shell (Edge Only)¶
The shell from Core 1 remains; HO focuses on core.
4.5 Tradeoffs in Composition¶
Overly long pipelines hurt readability—split if >3-4 stages. If a pipeline is only used once and short, inline it. Imperative may win in ultra-hot paths, but wrap in HO API.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Replace expressions in full_rag.
1. Inline docs_to_embedded(docs, env) → list of chunks.
2. Substitute into structural_dedup_chunks → fixed value.
3. Result: Entire call = fixed value.
Bug Hunt: In imperative_chunk_doc, substitution fails (manual indexing risks).
6. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶
Use Hypothesis to prove laws.
You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.
6.1 Custom Strategy (RAG Domain)¶
From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).
6.2 Composition & Equivalence Properties for the RAG Pipeline¶
Full code:
# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import fmap, flow, identity
from funcpipe_rag import full_rag, full_rag_point_free
from funcpipe_rag import RagEnv, RawDoc
from .conftest import doc_list_strategy, env_strategy
@given(xs=st.lists(st.integers()))
def test_fmap_identity(xs):
assert fmap(lambda x: x)(xs) == xs
@given(xs=st.lists(st.integers()))
def test_fmap_composition(xs):
inc = lambda x: x + 1
double = lambda x: x * 2
assert fmap(lambda x: double(inc(x)))(xs) == fmap(double)(fmap(inc)(xs))
@given(docs=doc_list_strategy(), env=env_strategy())
def test_full_rag_equivalent_forms(docs, env):
assert full_rag(docs, env) == full_rag_point_free(docs, env)
These real tests enforce both the functor laws for fmap and the equivalence between our straight-line and point-free pipeline definitions.
6.3 Shrinking Demo: Catching a Bug¶
Bad refactor (non-associative reduce in full_rag):
Full code:
from functools import reduce
from typing import List, Tuple
from funcpipe_rag import clean_doc, chunk_doc, embed_chunk
from funcpipe_rag import RawDoc, Chunk, RagEnv
def bad_full_rag(docs: List[RawDoc], env: RagEnv) -> Tuple[Chunk, ...]:
# Buggy use of reduce: new tuple concatenated on the left reverses the doc order
return reduce(
lambda acc, d: tuple(embed_chunk(c) for c in chunk_doc(clean_doc(d), env)) + acc,
docs,
tuple()
) # Wrong concat (reverses order)
Property:
from hypothesis import given
from funcpipe_rag import RawDoc, RagEnv
from .conftest import doc_list_strategy, env_strategy
@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_full_rag_equivalence(docs: List[RawDoc], env: RagEnv) -> None:
imperative = tuple(
embed_chunk(c)
for d in docs
for c in chunk_doc(clean_doc(d), env)
)
assert bad_full_rag(docs, env) == imperative # Fails on order
Hypothesis failure trace (run to verify; example):
Falsifying example: test_bad_full_rag_equivalence(
docs=[RawDoc(doc_id='a', title='', abstract='a', categories=''), RawDoc(doc_id='b', title='', abstract='b', categories='')],
env=RagEnv(chunk_size=128),
)
AssertionError
- Shrinks to two docs; reversed order fails equality. Catches bug via shrinking.
7. When HO Isn't Worth It¶
Rarely, for profiled hot paths (e.g., tight loops), use imperative but wrap in HO API.
8. Pre-Core Quiz¶
- Side-effect in map → violates? → Purity
- Non-associative op in reduce → violates? → Predictability
- Nested for-loops → better as? → Pipeline
- List comp vs generator exp → memory trade-off? → List comp O(n) extra
- Tool to prove equivalence? → Hypothesis
9. Post-Core Reflection & Exercise¶
Reflect: In your code, find one imperative loop. Compose it; add HO properties.
Project Exercise: Compose RAG stages; run properties on sample data.
All claims (e.g., referential transparency) are verifiable via the provided Hypothesis examples—run them to confirm.
Further Reading: For more on purity pitfalls, see 'Fluent Python' Chapter on Functions as Objects. Check free resources like Python.org's FP section or Codecademy's Advanced Python course for basics.
Continue with: Local FP Refactors