Tiny Function DSLs
Keep the promise modest: you do not need a grand language framework. You need a small way to move domain rules out of scattered conditionals and into data you can inspect and change deliberately.
Start With the Rule Smell
The pain here is rarely "too many functions." It is that business rules are spread through helpers, branches, and one-off exceptions until nobody can see the actual policy in one place.
- If a change request starts with "find every branch that mentions category filtering," the rules are too scattered.
- If the only way to understand policy is to simulate the control flow, the policy is not yet represented as data.
- If reviewers cannot print or compare the active rule set, the design is still harder to audit than it needs to be.
Keep This Question In View
Core question:
How do you replace sprawling if-else chains and hard-coded domain logic with tiny, composable, data-driven DSLs, so that rules become printable, testable, and evolvable, and flow through M02C07 pipelines without scattering behaviour across the codebase?
This lesson introduces a small data-driven DSL in concrete terms:
- represent the rule itself as immutable data
- use one interpreter to turn that data into a decision
- compose rules without scattering domain logic across the pipeline
The running project matters because filtering rules are exactly the sort of logic that keeps drifting into helpers unless the team gives them a stable representation.
Use this when you have combinator pipelines but still embed domain logic in if-else branches or scattered predicates.
Outcome:
1. Identify rule smells (if-else sprawl, mutable flags) and explain their impact on evolvability.
2. Refactor domain logic to frozen rule data + pure interpreter.
3. Write Hypothesis properties that check DSL equivalence, with a shrinking example.
1. Conceptual Foundation
1.1 Tiny Data-Driven DSLs in One Precise Sentence
Tiny data-driven DSLs represent domain rules as immutable data (frozen dataclasses with paths and operators) evaluated by pure interpreters, so that rules are composable, testable, and flow like config through M02C07 pipelines.
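To make that sentence concrete, here is a minimal, self-contained sketch. The names mirror the ones this page imports from funcpipe_rag, but every definition below (the Doc stand-in for RawDoc, the field names path, prefix, n, and preds, and the body of eval_pred) is an assumption for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc (same four fields)."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str    # name of the Doc field to read
    prefix: str  # operand for the "starts with" operator


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple  # child rules; a tuple keeps the whole tree immutable


def eval_pred(doc: Doc, pred) -> bool:
    """Pure interpreter: the result depends only on (doc, pred)."""
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


keep = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
long_cs = Doc("1", "t", "x" * 501, "cs.AI")
short_cs = Doc("2", "t", "short", "cs.AI")
print(eval_pred(long_cs, keep), eval_pred(short_cs, keep))  # True False
```

The interpreter is the only place that dispatches on rule type; callers only build and pass rule values.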
1.2 The One-Sentence Rule
Represent domain rules as frozen data with paths and operators, evaluated by pure interpreters. Never encode domain rules as if-else chains or mutable flags in the core; pass rules like config.
1.3 Why This Matters Now
Combinators made the pipeline shape clearer, but they did not solve the problem of policy scattered across branches. This page addresses that missing piece. Once the rule itself becomes data, you can inspect, compare, serialize, and test policy without hunting through control flow.
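A small sketch of what "inspect, compare, serialize" can look like once rules are values. The dataclasses are illustrative stand-ins for the project's types, and to_jsonable is a hypothetical helper introduced here, not part of funcpipe_rag.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def to_jsonable(pred) -> dict:
    """Hypothetical helper: tag each node with its class name for logging or diffing."""
    if isinstance(pred, All):
        return {"op": "All", "preds": [to_jsonable(p) for p in pred.preds]}
    return {"op": type(pred).__name__, **asdict(pred)}


old = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
new = All((StartsWith("categories", "cs."), LenGt("abstract", 800)))

print(old)                                            # inspect: the repr shows the whole policy
print(old == new)                                     # compare: structural equality -> False
print(json.dumps(to_jsonable(new), sort_keys=True))   # serialize for review or storage
```

None of these three operations runs the policy against a document; they work on the rule value alone.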
1.4 DSLs as Values in 5 Lines
The next snippet matters because it shows the rule value and the interpreter role separately.
from functools import partial
from funcpipe_rag import All, LenGt, Pred, StartsWith, eval_pred
rules: dict[str, Pred] = {
    "cs": StartsWith("categories", "cs."),
    "long": LenGt("abstract", 500),
}
keep_pred = All((rules["cs"], rules["long"]))
keep_fn = partial(eval_pred, pred=keep_pred) # RawDoc -> bool
Because rules are plain data evaluated by a pure function, you can store them in dicts, bind them with partial (M02C01), and test them as values. The policy stays explicit and easy to evolve.
2. Mental Model: If-Else Sprawl vs Data-Driven DSLs
2.1 One Picture
If-Else Sprawl (Messy)                    Data-Driven DSLs (Clean)
+------------------------------------+    +-------------------------------------------+
| if d.categories.startswith("cs."): |    | cs_rule = StartsWith("categories", "cs.") |
|   if len(d.abstract) > 500:        |    | long_rule = LenGt("abstract", 500)        |
|     return True                    |    | rule = All((cs_rule, long_rule))          |
|   ...                              |    | eval_pred(d, rule)                        |
+------------------------------------+    +-------------------------------------------+
  ↑ Hardcoded, Rigid                        ↑ Data, Composable
2.2 Contract Table
| Aspect | If-Else Sprawl | Data-Driven DSLs |
|---|---|---|
| Evolvability | Code changes | Data changes |
| Testability | Mock contexts | Generate rules |
| Readability | Nested branches | Linear data |
| Composability | Manual nesting | All/AnyOf/Not |
| Auditing | Trace execution | Print rule/decision |
| Config mutability | Mutable defaults in partials break determinism | Frozen dataclasses or other immutable config types |
Note on If-Else Choice: Use if-else only for trivial logic; always prefer DSLs for domain rules.
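The Composability row can be sketched as follows. This is an illustrative mini-interpreter with stand-in types: the field names and the Doc class are assumptions, and the project's real All, AnyOf, and Not may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class AnyOf:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, AnyOf):
        return any(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


# "cs or stat, and not longer than 2000 characters", built only from rule values.
rule = All((
    AnyOf((StartsWith("categories", "cs."), StartsWith("categories", "stat."))),
    Not(LenGt("abstract", 2000)),
))

stat_short = Doc("1", "t", "x" * 100, "stat.ML")
cs_huge = Doc("2", "t", "x" * 3000, "cs.AI")
math_short = Doc("3", "t", "x", "math.CO")
print([eval_pred(d, rule) for d in (stat_short, cs_huge, math_short)])  # [True, False, False]
```

Adding a new policy variant means building a different rule value; the interpreter stays unchanged.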
3. Running Project: FuncPipe RAG Builder
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Turn hard-coded rules into data-driven DSL.
- Start: Hard-coded version (core8_start.py).
- End: DSL rules as data, preserving equivalence.
3.1 Types (Canonical, Used Throughout)
Use the project’s DSL types from capstone/src/funcpipe_rag/core/rules_pred.py (re-exported from funcpipe_rag):
from funcpipe_rag import All, CleanConfig, LenGt, RagConfig, RagEnv, RulesConfig, StartsWith
CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))
CS_LONG_RULES = RulesConfig(keep_pred=KEEP_PRED)
config = RagConfig(env=RagEnv(512), keep=CS_LONG_RULES, clean=CleanConfig())
Note: DEFAULT_RULES is RulesConfig(keep_pred=All(())) (no conditions ⇒ keep everything). Pass an explicit rules config like CS_LONG_RULES to actually filter.
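The "no conditions ⇒ keep everything" behaviour is what you get when a conjunction is interpreted with Python's built-in all(), which returns True for an empty iterable. A minimal sketch, assuming that interpretation (the classes below are stand-ins, not the project's code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        # all() over an empty iterable is True, so All(()) accepts every doc.
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


keep_everything = All(())
empty_doc = Doc("1", "", "", "")
print(eval_pred(empty_doc, keep_everything))                # True
print(eval_pred(empty_doc, All((LenGt("abstract", 500),))))  # False
```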
3.2 Hard-Coded Start (Anti-Pattern)
from funcpipe_rag import RawDoc
def hard_keep(d: RawDoc) -> bool:
    # Hard-coded path ("categories") and values ("cs.", 500)
    return d.categories.startswith("cs.") and len(d.abstract) > 500
Smells:
- Hard-coded path and value (d.categories.startswith("cs.")).
- Policy baked into code, which grows into if-else sprawl as rules are added.
- Magic numbers (500).
Problem: Hard to evolve/test; scattered logic.
4. Refactor to DSL: Data-Driven Rules + Interpreter
4.1 DSL Data (Frozen, Composable)
Rule data in config (as defined in §3.1: CS_RULE, LONG_RULE, KEEP_PRED, CS_LONG_RULES).
Properties:
- Frozen: Immutable.
- Composable: All/AnyOf/Not.
- In config: Flows like data.
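What "Frozen" buys you can be shown with the standard library alone. The LenGt class below is a stand-in with assumed field names; the behaviour shown is that of any dataclass declared with frozen=True.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


rule = LenGt("abstract", 500)

try:
    rule.n = 800  # attempted in-place mutation
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# Evolve a rule by building a new value; the original is untouched.
stricter = replace(rule, n=800)

# Frozen dataclasses with the default eq=True are hashable, so rules can be
# set members or dict keys; equal rules collapse to one entry.
seen = {rule, LenGt("abstract", 500), stricter}

print(mutated, rule.n, stricter.n, len(seen))  # False 500 800 2
```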
4.1.1 Before-and-After Refactoring Snippet
To cement the transition, here is a mini-example that puts the hard-coded predicate from §3.2 next to the same policy expressed as DSL data plus the interpreter:
# Before: hard-coded predicate (path, prefix, and threshold baked into code)
from functools import partial
from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred
def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500
# After: Data-driven DSL + pure interpreter (`eval_pred`)
KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
dsl_keep = partial(eval_pred, pred=KEEP_PRED) # RawDoc -> bool
assert dsl_keep(RawDoc("id", "title", "x" * 501, "cs.AI")) is True
assert dsl_keep(RawDoc("id", "title", "short", "cs.AI")) is False
This refactor moves the hard-coded logic into rule data that is easy to test, evolve, and compose. Because the rule is immutable and the interpreter is pure, the same inputs always yield the same outputs.
4.2 Pure Interpreter (Evaluates Data)
The project’s pure interpreter is funcpipe_rag.eval_pred (implemented in capstone/src/funcpipe_rag/core/rules_pred.py). It only supports the known RawDoc paths (doc_id, title, abstract, categories).
Properties:
- Pure: Deterministic, no effects.
- Tied to data: Evaluates rule structures.
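One way an interpreter could restrict itself to known paths is sketched below. This is an assumption about a possible design, not a description of the project's eval_pred: the read_path helper, the KNOWN_PATHS constant, and the choice to raise ValueError on an unknown path are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


# Derive the allowed paths from the document type itself.
KNOWN_PATHS = frozenset(f.name for f in fields(Doc))


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


def read_path(doc: Doc, path: str) -> str:
    """Look up a field by name, rejecting anything outside KNOWN_PATHS."""
    if path not in KNOWN_PATHS:
        raise ValueError(f"unknown path: {path!r}")
    return getattr(doc, path)


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return read_path(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(read_path(doc, pred.path)) > pred.n
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


doc = Doc("1", "t", "x" * 10, "cs.AI")
try:
    eval_pred(doc, LenGt("authors", 3))  # "authors" is not a Doc field
except ValueError as exc:
    error = str(exc)
else:
    error = None
print(error)
```

Failing loudly on an unknown path keeps a typo in rule data from silently turning into "keep nothing" or "keep everything".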
4.3 Refactored Core (Uses DSL)
Updated core with DSL (building on the compositional helpers under capstone/src/funcpipe_rag/fp/):
from funcpipe_rag import (
    All,
    LenGt,
    RagConfig,
    RagEnv,
    RulesConfig,
    StartsWith,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)
config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500)))),
)
deps = get_deps(config)
# The only domain decision in the pipeline: evaluate the configured rule.
keep_rule = lambda d: eval_pred(d, config.keep.keep_pred)
pipeline = flow(
    lambda: docs,  # docs: the input RawDoc collection, assumed to be in scope
    ffilter(keep_rule),
    fmap(deps.cleaner),
    flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
    fmap(deps.embedder),
)
chunks = structural_dedup_chunks(pipeline())
Properties:
- Data-driven: Rules as data.
- Composable: Via M02C07 combinators.
4.4 Public API (Unchanged from M02C05–M02C07)
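from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps
# In-memory entry point: docs and config as defined above.
chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
# Path entry point: boundary_deps is a RagBoundaryDeps (core deps plus a reader), built as in §6.5.
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)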
Properties:
- Keeps Result; boundaries unchanged.
4.5 Configurator Tie-In (M02C01)
from funcpipe_rag import make_rag_fn
rag_fn = make_rag_fn(chunk_size=512, keep=CS_LONG_RULES) # docs -> (chunks, obs)
Wins: DSLs compose with M02C01 partial for variants. Note: RagConfig.keep defaults to DEFAULT_RULES (keep everything).
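The idea of building variants by binding different rule values can be sketched without the project's make_rag_fn. Everything here is a stand-in: keep_docs is a hypothetical helper and the dataclasses only imitate the project's types.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


def keep_docs(docs, pred):
    """Hypothetical helper: keep the docs that satisfy pred."""
    return [d for d in docs if eval_pred(d, pred)]


# Each variant is the same function with a different rule value bound to it.
variants = {
    "cs_long": partial(keep_docs, pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500)))),
    "everything": partial(keep_docs, pred=All(())),
}

docs = [
    Doc("1", "t", "x" * 501, "cs.AI"),
    Doc("2", "t", "x" * 501, "math.CO"),
    Doc("3", "t", "short", "cs.LG"),
]
kept = {name: [d.doc_id for d in fn(docs)] for name, fn in variants.items()}
print(kept)  # {'cs_long': ['1'], 'everything': ['1', '2', '3']}
```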
5. Equational Reasoning: Substitution Exercise
Hand Exercise: Substitute in eval_pred.
1. Inline KEEP_PRED = All((CS_RULE, LONG_RULE)) → fixed data.
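2. Substitute into eval_pred → a Boolean expression that depends only on the document.
3. Result: Behaviour is fixed for fixed rule data (immutable).
Bug Hunt: In the hard-coded version, the policy is baked into the function body, so you cannot substitute a different rule without editing code.
Example:
- Hard-coded: d.categories.startswith("cs.") → rigid, not substitutable.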
- DSL: eval_pred(d, KEEP_PRED) → data-driven, substitutable with fake rule.
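The hand exercise can be written out step by step. The sketch below uses stand-in definitions (not the project's code) so that each unfolding can be checked by running it; each step replaces an expression with its definition and must yield the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))
doc = Doc("1", "t", "x" * 501, "cs.AI")

step0 = eval_pred(doc, KEEP_PRED)
step1 = all(eval_pred(doc, p) for p in (CS_RULE, LONG_RULE))          # unfold the All case
step2 = eval_pred(doc, CS_RULE) and eval_pred(doc, LONG_RULE)          # unfold all()
step3 = doc.categories.startswith("cs.") and len(doc.abstract) > 500   # unfold both leaves
print(step0, step1, step2, step3)  # True True True True

# Substituting a fake rule needs no code change, only a different value.
FAKE_RULE = All(())
fake_result = eval_pred(doc, FAKE_RULE)
```

Step 3 is exactly the body of the hard-coded predicate, which is why the two versions agree for this rule value.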
6. Property-Based Testing: Proving DSL Behaviour
Use Hypothesis to check that the refactor to data-driven rules preserves behaviour.
6.1 Custom Strategy
Strategies come from capstone/tests/conftest.py. Add a raw_doc_strategy there if you need to generate single documents.
6.2 DSL Equivalence Property
# capstone/tests/test_rag_api.py (DSL equivalence)
from hypothesis import given
from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred
from tests.conftest import doc_list_strategy
KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500
@given(docs=doc_list_strategy())
def test_dsl_matches_hard_keep(docs):
    dsl_kept = [d for d in docs if eval_pred(d, KEEP_PRED)]
    hard_kept = [d for d in docs if hard_keep(d)]
    assert dsl_kept == hard_kept
Note: Checks that the DSL predicate keeps exactly the documents that hard_keep keeps.
6.3 DSL Rule Equality Property
# Reuses given, doc_list_strategy, eval_pred, and KEEP_PRED from §6.2.
from dataclasses import replace
@given(docs=doc_list_strategy())
def test_equal_rules_equal_behaviour(docs):
    rules1 = KEEP_PRED
    rules2 = replace(rules1)  # new instance with no field changes
    out1 = [d for d in docs if eval_pred(d, rules1)]
    out2 = [d for d in docs if eval_pred(d, rules2)]
    assert out1 == out2
Note: Verifies that rule equality implies behaviour equality.
6.4 DSL Algebraic Property
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import All, AnyOf, LenGt, Not, Pred, RawDoc, StartsWith, eval_pred
from tests.conftest import raw_doc_strategy
pred_strategy = st.recursive(
    st.one_of(
        st.builds(StartsWith, st.just("categories"), st.text(max_size=10)),
        st.builds(LenGt, st.just("abstract"), st.integers(min_value=0, max_value=1000)),
    ),
    lambda child: st.one_of(
        st.builds(All, st.tuples(child, child)),
        st.builds(AnyOf, st.tuples(child, child)),
        st.builds(Not, child),
    ),
    max_leaves=20,
)
@given(pred=pred_strategy, doc=raw_doc_strategy())
def test_dsl_double_negation(pred: Pred, doc: RawDoc):
    assert eval_pred(doc, pred) == eval_pred(doc, Not(Not(pred)))
Note: Verifies DSL algebraic properties (e.g., double negation) with generated contexts.
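Another algebraic law worth checking is De Morgan's: Not(All((a, b))) should agree with AnyOf((Not(a), Not(b))). The sketch below is not a Hypothesis property. It swaps in an exhaustive check over a small hand-picked grid so that it runs with the standard library only, and it uses stand-in types rather than the project's.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class AnyOf:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, AnyOf):
        return any(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


def de_morgan_holds(a, b, doc: Doc) -> bool:
    lhs = eval_pred(doc, Not(All((a, b))))
    rhs = eval_pred(doc, AnyOf((Not(a), Not(b))))
    return lhs == rhs


LEAVES = (
    StartsWith("categories", "cs."),
    StartsWith("categories", ""),  # matches every document
    LenGt("abstract", 500),
)
DOCS = tuple(
    Doc("id", "t", "x" * n, cat)
    for n, cat in product((0, 500, 501), ("cs.AI", "math.CO", ""))
)

checked = [de_morgan_holds(a, b, d) for a, b in product(LEAVES, repeat=2) for d in DOCS]
print(len(checked), all(checked))  # 81 True
```

With Hypothesis available, the same de_morgan_holds body could be driven by pred_strategy and raw_doc_strategy instead of the fixed grid.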
6.5 Idempotence Property (DSL-Driven)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    All, LenGt, Ok, RagBoundaryDeps, RagConfig, RagEnv, RulesConfig, StartsWith,
    full_rag_api_path, get_deps,
)
class FakeReader:
    """In-memory reader: ignores the path and returns fixed docs."""
    def __init__(self, docs):
        self._docs = docs
    def read_docs(self, path):
        _ = path  # unused on purpose
        return Ok(self._docs)
@given(chunk_size=st.integers(128, 1024))
def test_rag_idempotence(chunk_size):
    keep = RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500))))
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader([]))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
Note: Ensures no hidden state with immutable DSL rules and faked deps (see capstone/tests/test_rag_api.py for a minimal FakeReader pattern).
6.6 Full Pipeline Equivalence Property
# capstone/tests/test_rag_api.py (baseline equivalence)
from hypothesis import given
from funcpipe_rag import (
DEFAULT_RULES,
RagConfig,
clean_doc,
embed_chunk,
full_rag_api_docs,
gen_chunk_doc,
get_deps,
structural_dedup_chunks,
)
from tests.conftest import doc_list_strategy, env_strategy
def _baseline_chunks(docs, env):
    cleaned = [clean_doc(d) for d in docs]
    embedded = [embed_chunk(c) for cd in cleaned for c in gen_chunk_doc(cd, env)]
    return structural_dedup_chunks(embedded)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_full_rag_api_docs_matches_baseline(docs, env):
    config = RagConfig(env=env, keep=DEFAULT_RULES)
    deps = get_deps(config)
    chunks, obs = full_rag_api_docs(docs, config, deps)
    assert chunks == _baseline_chunks(docs, env)
    assert obs.total_docs == len(docs)
Note: Tests the full API matches a baseline built from the pure stages (with DEFAULT_RULES ⇒ keep everything).
6.7 Shrinking Demo: Catching a Leaky Bug
Bad interpreter with mutable global state:
from funcpipe_rag import All, LenGt, Not, StartsWith, eval_pred
KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
MUTABLE_PRED = KEEP_PRED
def bad_keep(doc) -> bool:
    global MUTABLE_PRED
    MUTABLE_PRED = Not(MUTABLE_PRED)  # Leaky mutation
    return eval_pred(doc, MUTABLE_PRED)
Property:
from hypothesis import given
from tests.conftest import raw_doc_strategy
@given(doc=raw_doc_strategy())
def test_bad_dsl_is_not_idempotent(doc):
    global MUTABLE_PRED
    MUTABLE_PRED = KEEP_PRED
    out1 = bad_keep(doc)
    out2 = bad_keep(doc)
    assert out1 == out2
Failure Trace (Example):
Falsifying example: test_bad_dsl_is_not_idempotent(
    doc=RawDoc(doc_id='1', title='t', abstract='...', categories='cs.AI'),
)
AssertionError
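Analysis: Each call to bad_keep negates the global predicate, so two consecutive calls on the same document always disagree. Hypothesis reports a shrunk RawDoc as the falsifying example, which exposes the mutation bug.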
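The usual fix is to stop reading the rule from mutable module state and pass it in explicitly. A minimal sketch with stand-in types (not the project's code); good_keep is a hypothetical name:

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc: Doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unknown rule type: {type(pred).__name__}")


KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))

# The rule is bound once as an argument; nothing is reassigned between calls.
good_keep = partial(eval_pred, pred=KEEP_PRED)

doc = Doc("1", "t", "x" * 501, "cs.AI")
out1 = good_keep(doc)
out2 = good_keep(doc)

# Negation is expressed as a new value, which leaves KEEP_PRED untouched.
inverted_keep = partial(eval_pred, pred=Not(KEEP_PRED))

print(out1, out2, inverted_keep(doc))  # True True False
```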
7. When DSLs Aren't Worth It
Use if-else only in:
- Trivial one-rule logic.
- Legacy code wrapping DSLs.
Guardrails: Isolate to <5 lines; prefer DSLs for domain rules.
Example:
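One hypothetical illustration of logic that can stay a plain conditional: a single guard that is not domain policy and fits within the five-line limit. The function name and the None-handling rule are assumptions made for this sketch, not taken from the project.

```python
from typing import Optional


def abstract_or_empty(abstract: Optional[str]) -> str:
    """Single fixed guard, not a domain rule: no DSL needed."""
    if abstract is None:
        return ""
    return abstract


print(repr(abstract_or_empty(None)), repr(abstract_or_empty("text")))  # '' 'text'
```

If a guard like this starts to grow cases or thresholds that differ per deployment, that is the signal to move it into rule data.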
8. Pre-Core Quiz
- If-else chain? → Hard-coded logic.
- Mutable rule? → frozen=True.
- Magic path? → LenGt("path", value).
- Global rule? → Pass as param.
- Prove rules? → Hypothesis recursive.
9. Post-Core Reflection & Exercise
Reflect: Find if-else domain logic. Refactor to frozen rule data + interpreter; add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., keep as DSL); run properties.
- Did data reduce branches?
- Did interpreter enable tests?
- Did composability clarify logic?
Continue with: Debugging Compositions
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.