Tiny Function DSLs¶

Keep the promise modest: you do not need a grand language framework. You need a small way to move domain rules out of scattered conditionals and into data you can inspect and change deliberately.

Start With the Rule Smell¶

The pain here is rarely "too many functions." It is that business rules are spread through helpers, branches, and one-off exceptions until nobody can see the actual policy in one place.

If a change request starts with "find every branch that mentions category filtering," the rules are too scattered.
If the only way to understand policy is to simulate the control flow, the policy is not yet represented as data.
If reviewers cannot print or compare the active rule set, the design is still harder to audit than it needs to be.

Keep This Question In View¶

Core question:
How do you replace sprawling if-else chains and hard-coded domain logic with tiny, composable, data-driven DSLs, so that rules become printable, testable, and evolvable, and flow through M02C07 pipelines without scattering behaviour across the codebase?

This lesson introduces a small data-driven DSL in concrete terms:

represent the rule itself as immutable data
use one interpreter to turn that data into a decision
compose rules without scattering domain logic across the pipeline
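
The three points above can be sketched in a few lines of plain Python. This is a minimal stand-in, not the project's implementation: the `Doc` type and the local `StartsWith`, `LenGt`, `All`, and `eval_pred` definitions are assumptions that only mirror the names used later on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for the project's RawDoc."""
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc, pred) -> bool:
    """One interpreter: turn rule data into a decision."""
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule: {pred!r}")


# The whole policy is one immutable value that can be printed and passed around.
rule = All((StartsWith("categories", "cs."), LenGt("abstract", 5)))
print(rule)
decision = eval_pred(Doc(abstract="a long abstract", categories="cs.AI"), rule)
```

The branching has not vanished; it now lives in one interpreter instead of being spread across the pipeline.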

The running project matters because filtering rules are exactly the sort of logic that keeps drifting into helpers unless the team gives them a stable representation.

Use this when you have combinator pipelines but still embed domain logic in if-else branches or scattered predicates.

Outcomes:
1. Identify rule smells (if-else sprawl, mutable flags) and explain their impact on evolvability.
2. Refactor domain logic to frozen rule data + pure interpreter.
3. Write Hypothesis properties that check DSL equivalence, with a shrinking example.

1. Conceptual Foundation¶

1.1 Tiny Data-Driven DSLs in One Precise Sentence¶

Tiny data-driven DSLs represent domain rules as immutable data (frozen dataclasses with paths and operators) evaluated by pure interpreters—ensuring rules are composable, testable, and flow like config through M02C07 pipelines.

1.2 The One-Sentence Rule¶

Represent domain rules as frozen data (paths and operators) evaluated by pure interpreters; keep if-else chains and mutable flags out of the core, and pass rules like config.

1.3 Why This Matters Now¶

Combinators made the pipeline shape clearer, but they did not solve the problem of policy scattered across branches. This page addresses that missing piece. Once the rule itself becomes data, you can inspect, compare, serialize, and test policy without hunting through control flow.
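
As a rough illustration of "inspect, compare, serialize", the sketch below compares two rule sets by value and dumps one to JSON. The rule classes are local stand-ins and `to_data` is a hypothetical helper; the page does not say whether the project ships a serializer, so treat the output shape as an assumption.

```python
import json
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def to_data(pred):
    """Hypothetical helper: turn a rule tree into JSON-friendly data."""
    if isinstance(pred, tuple):
        return [to_data(p) for p in pred]
    if is_dataclass(pred):
        out = {"op": type(pred).__name__}  # tag each node with its rule type
        for f in fields(pred):
            out[f.name] = to_data(getattr(pred, f.name))
        return out
    return pred  # plain str / int leaves


old = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
new = All((StartsWith("categories", "cs."), LenGt("abstract", 800)))

same_policy = old == new  # frozen dataclasses compare by value
serialized = json.dumps(to_data(new), sort_keys=True)
```

A reviewer can diff `serialized` between two releases instead of reading control flow.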

1.4 DSLs as Values in 5 Lines¶

The next snippet matters because it shows the rule value and the interpreter role separately.

from functools import partial

from funcpipe_rag import All, LenGt, Pred, StartsWith, eval_pred

rules: dict[str, Pred] = {
    "cs": StartsWith("categories", "cs."),
    "long": LenGt("abstract", 500),
}

keep_pred = All((rules["cs"], rules["long"]))
keep_fn = partial(eval_pred, pred=keep_pred)  # RawDoc -> bool

Because the rules are plain data evaluated by a pure function, they can be stored in dicts, bound with partial (M02C01), and tested as values. The policy stays explicit and easy to evolve.

2. Mental Model: If-Else Sprawl vs Data-Driven DSLs¶

2.1 One Picture¶

If-Else Sprawl (Messy)                     Data-Driven DSLs (Clean)
+------------------------------------+     +-------------------------------------------+
| if d.categories.startswith("cs."): |     | cs_rule = StartsWith("categories", "cs.") |
|     if len(d.abstract) > 500:      |     | long_rule = LenGt("abstract", 500)        |
|         return True                |     | rule = All((cs_rule, long_rule))          |
| ...                                |     | eval_pred(d, rule)                        |
+------------------------------------+     +-------------------------------------------+
   ↑ Hard-coded, Rigid                        ↑ Data, Composable

2.2 Contract Table¶

Aspect	If-Else Sprawl	Data-Driven DSLs
Evolvability	Code changes	Data changes
Testability	Mock contexts	Generate rules
Readability	Nested branches	Linear data
Composability	Manual nesting	All/AnyOf/Not
Auditing	Trace execution	Print rule/decision
Mutable Defaults in Partials	Breaks Determinism	Use frozen dataclasses or immutable types for configs
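
The last row of the table is easiest to see in code. The sketch below uses a hypothetical `keep` function (not part of funcpipe_rag) to show how a mutable value bound into a partial lets behaviour change behind the caller's back, and how an immutable value rules that out.

```python
from functools import partial


def keep(category: str, allowed) -> bool:
    """Keep a document if its category is in the allowed collection."""
    return category in allowed


# Mutable config: the partial holds a reference to the list, not a snapshot.
allowed_list = ["cs.AI"]
leaky_keep = partial(keep, allowed=allowed_list)
before = leaky_keep("cs.LG")
allowed_list.append("cs.LG")  # mutation somewhere else in the program
after = leaky_keep("cs.LG")   # same call, different answer

# Immutable config: nothing can change the rule bound into the partial.
stable_keep = partial(keep, allowed=frozenset({"cs.AI"}))
```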

Note on If-Else Choice: Use if-else only for trivial logic; always prefer DSLs for domain rules.

3. Running Project: FuncPipe RAG Builder¶

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Turn hard-coded rules into data-driven DSL.
- Start: Hard-coded version (core8_start.py).
- End: DSL rules as data, preserving equivalence.

3.1 Types (Canonical, Used Throughout)¶

Use the project’s DSL types from capstone/src/funcpipe_rag/core/rules_pred.py (re-exported from funcpipe_rag):

from funcpipe_rag import All, CleanConfig, LenGt, RagConfig, RagEnv, RulesConfig, StartsWith

CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))
CS_LONG_RULES = RulesConfig(keep_pred=KEEP_PRED)

config = RagConfig(env=RagEnv(512), keep=CS_LONG_RULES, clean=CleanConfig())

Note: DEFAULT_RULES is RulesConfig(keep_pred=All(())) (no conditions ⇒ keep everything). Pass an explicit rules config like CS_LONG_RULES to actually filter.
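
Why an empty `All(())` keeps everything follows from how a conjunction is usually folded. The sketch below is a stand-in interpreter over plain dicts, and it assumes (as seems likely but is not shown on this page) that the real `eval_pred` combines children in the same way Python's `all` does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc: dict, pred) -> bool:
    """Stand-in interpreter over plain dict documents (an assumption)."""
    if isinstance(pred, LenGt):
        return len(doc[pred.path]) > pred.n
    if isinstance(pred, All):
        # all(()) is True: an empty conjunction has no child that can fail.
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule: {pred!r}")


doc = {"abstract": "short"}
keeps_everything = eval_pred(doc, All(()))
filters = eval_pred(doc, All((LenGt("abstract", 500),)))
```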

3.2 Hard-Coded Start (Anti-Pattern)¶

from funcpipe_rag import RawDoc


def hard_keep(d: RawDoc) -> bool:
    # Hard-coded path ("categories") and values ("cs.", 500)
    return d.categories.startswith("cs.") and len(d.abstract) > 500

Smells:
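- Hard-coded path and values (categories, "cs.", 500) baked into the function body.
- The seed of if-else sprawl: every new rule adds another clause or branch here.
- Magic numbers (500).
Problem: The policy is hard to evolve and test, and the logic ends up scattered.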

4. Refactor to DSL: Data-Driven Rules + Interpreter¶

4.1 DSL Data (Frozen, Composable)¶

Rule data in config (as defined in §3.1: CS_RULE, LONG_RULE, KEEP_PRED, CS_LONG_RULES).

Properties:
- Frozen: Immutable.
- Composable: All/AnyOf/Not.
- In config: Flows like data.

4.1.1 Before-and-After Refactoring Snippet¶

To cement the transition, here is a mini-example that puts the hard-coded "before" (the anti-pattern from §3.2) next to the "after" built from DSL data plus the interpreter:

from functools import partial

from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred


# Before: path and thresholds hard-coded in the function body
def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500


# After: Data-driven DSL + pure interpreter (`eval_pred`)
KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
dsl_keep = partial(eval_pred, pred=KEEP_PRED)  # RawDoc -> bool


assert dsl_keep(RawDoc("id", "title", "x" * 501, "cs.AI")) is True
assert dsl_keep(RawDoc("id", "title", "short", "cs.AI")) is False

This refactor moves the policy out of the function body and into rule data that is easy to test, evolve, and compose; the same inputs still yield the same outputs.

4.2 Pure Interpreter (Evaluates Data)¶

The project’s pure interpreter is funcpipe_rag.eval_pred (implemented in capstone/src/funcpipe_rag/core/rules_pred.py). It only supports the known RawDoc paths (doc_id, title, abstract, categories).

from funcpipe_rag import eval_pred

Properties:
- Pure: Deterministic, no effects.
- Tied to data: Evaluates rule structures.
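
One way an interpreter can restrict itself to known paths is an explicit whitelist. The sketch below is illustrative only: `ALLOWED_PATHS`, `get_path`, and `eval_starts_with` are hypothetical names, the `RawDoc` here is a local stand-in, and raising `ValueError` on an unknown path is an assumption, since this page does not show how the real `eval_pred` reacts.

```python
from dataclasses import dataclass

# Assumption: the four RawDoc field names listed above.
ALLOWED_PATHS = frozenset({"doc_id", "title", "abstract", "categories"})


@dataclass(frozen=True)
class RawDoc:
    """Stand-in with the same field names as the project's RawDoc."""
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


def get_path(doc: RawDoc, path: str) -> str:
    """Hypothetical lookup that rejects paths outside the whitelist."""
    if path not in ALLOWED_PATHS:
        raise ValueError(f"unsupported path: {path!r}")
    return getattr(doc, path)


def eval_starts_with(doc: RawDoc, pred: StartsWith) -> bool:
    return get_path(doc, pred.path).startswith(pred.prefix)


doc = RawDoc("1", "t", "an abstract", "cs.AI")
ok = eval_starts_with(doc, StartsWith("categories", "cs."))

try:
    eval_starts_with(doc, StartsWith("__class__", "x"))
    rejected = False
except ValueError:
    rejected = True  # arbitrary attribute access is refused
```

Keeping the path set closed means rule data loaded from config cannot reach attributes the interpreter never meant to expose.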

4.3 Refactored Core (Uses DSL)¶

Updated core with DSL (building on the compositional helpers under capstone/src/funcpipe_rag/fp/):

from funcpipe_rag import (
    All,
    LenGt,
    RagConfig,
    RagEnv,
    RulesConfig,
    StartsWith,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)

config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500)))),
)
deps = get_deps(config)

def keep_rule(d):
    return eval_pred(d, config.keep.keep_pred)


# `docs` is the RawDoc collection loaded upstream (not shown here).
pipeline = flow(
    lambda: docs,
    ffilter(keep_rule),
    fmap(deps.cleaner),
    flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
    fmap(deps.embedder),
)

chunks = structural_dedup_chunks(pipeline())

Properties:
- Data-driven: Rules as data.
- Composable: Via M02C07 combinators.

4.4 Public API (Unchanged from M02C05–M02C07)¶

from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps

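# `docs` and `config` come from the earlier sections; `boundary_deps` is a
# RagBoundaryDeps value that wraps a reader (built as in §6.5).
chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)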

Properties:
- Keeps Result; boundaries unchanged.

4.5 Configurator Tie-In (M02C01)¶

from funcpipe_rag import make_rag_fn

rag_fn = make_rag_fn(chunk_size=512, keep=CS_LONG_RULES)  # docs -> (chunks, obs)

Wins: DSLs compose with M02C01 partial for variants. Note: RagConfig.keep defaults to DEFAULT_RULES (keep everything).

5. Equational Reasoning: Substitution Exercise¶

Hand Exercise: Substitute in eval_pred.
1. Inline KEEP_PRED = All((CS_RULE, LONG_RULE)): the rule is fixed data.
2. Substitute that data into eval_pred(d, KEEP_PRED): the result is a bool that depends only on d.
3. Result: for fixed (immutable) rule data, behaviour is fixed.
Bug Hunt: In the hard-coded version the rule is baked into the function body, so there is nothing to substitute; changing the policy means editing code.

Example:
- Hard-coded: d.categories.startswith("cs.") → rigid, not substitutable.
- DSL: eval_pred(d, KEEP_PRED) → data-driven; a fake rule can be substituted for KEEP_PRED.
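
The hand exercise can be written out as executable steps. The types and interpreter below are local stand-ins (assumptions that mirror the page's names), so the sketch shows the shape of the reasoning rather than the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    categories: str
    abstract: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unknown rule: {pred!r}")


CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))
d = Doc(categories="cs.AI", abstract="x" * 501)

# Step 1: unfold All into a conjunction of its children.
step1 = eval_pred(d, CS_RULE) and eval_pred(d, LONG_RULE)
# Step 2: unfold each leaf rule into the expression it stands for.
step2 = d.categories.startswith("cs.") and len(d.abstract) > 500

# Substituting different rule data changes behaviour without touching eval_pred.
FAKE_RULE = All((StartsWith("categories", "math."),))
```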

6. Property-Based Testing: Proving DSL Behaviour¶

Use Hypothesis to check that the refactor preserves behaviour: the DSL rules must make the same decisions as the hard-coded version.

6.1 Custom Strategy¶

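The strategies used below (doc_list_strategy, env_strategy) come from capstone/tests/conftest.py. If it does not already provide one, add a raw_doc_strategy for generating single documents.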

6.2 DSL Equivalence Property¶

# capstone/tests/test_rag_api.py (DSL equivalence)
from hypothesis import given

from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred
from tests.conftest import doc_list_strategy

KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))


def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500


@given(docs=doc_list_strategy())
def test_dsl_matches_hard_keep(docs):
    dsl_kept = [d for d in docs if eval_pred(d, KEEP_PRED)]
    hard_kept = [d for d in docs if hard_keep(d)]
    assert dsl_kept == hard_kept

Note: Checks that the DSL keeps exactly the same documents as the hard-coded hard_keep.

6.3 DSL Rule Equality Property¶

from dataclasses import replace


@given(docs=doc_list_strategy())
def test_equal_rules_equal_behaviour(docs):
    rules1 = KEEP_PRED
    rules2 = replace(rules1)
    out1 = [d for d in docs if eval_pred(d, rules1)]
    out2 = [d for d in docs if eval_pred(d, rules2)]
    assert out1 == out2

Note: Verifies rule equality implies behaviour equality.

6.4 DSL Algebraic Property¶

from hypothesis import given
import hypothesis.strategies as st

from funcpipe_rag import All, AnyOf, LenGt, Not, Pred, RawDoc, StartsWith, eval_pred
from tests.conftest import raw_doc_strategy

pred_strategy = st.recursive(
    st.one_of(
        st.builds(StartsWith, st.just("categories"), st.text(max_size=10)),
        st.builds(LenGt, st.just("abstract"), st.integers(min_value=0, max_value=1000)),
    ),
    lambda child: st.one_of(
        st.builds(All, st.tuples(child, child)),
        st.builds(AnyOf, st.tuples(child, child)),
        st.builds(Not, child),
    ),
    max_leaves=20,
)


@given(pred=pred_strategy, doc=raw_doc_strategy())
def test_dsl_double_negation(pred: Pred, doc: RawDoc):
    assert eval_pred(doc, pred) == eval_pred(doc, Not(Not(pred)))

Note: Verifies DSL algebraic properties (e.g., double negation) with generated contexts.
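
As a complement to the generated test, an algebraic law can also be sanity-checked exhaustively over a small hand-built universe using only the standard library. This is a different technique from property-based testing (no generation, no shrinking), and the types below are local stand-ins rather than the project's classes. The law checked here is De Morgan's: `Not(All((a, b)))` should agree with `AnyOf((Not(a), Not(b)))`.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for RawDoc."""
    categories: str
    abstract: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class AnyOf:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc, pred) -> bool:
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, AnyOf):
        return any(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unknown rule: {pred!r}")


leaves = (StartsWith("categories", "cs."), LenGt("abstract", 3))
docs = [Doc(c, a) for c, a in product(("cs.AI", "math.CO"), ("", "abcd"))]

# Collect every (doc, a, b) combination where the two sides disagree.
violations = [
    (d, a, b)
    for d, a, b in product(docs, leaves, leaves)
    if eval_pred(d, Not(All((a, b)))) != eval_pred(d, AnyOf((Not(a), Not(b))))
]
```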

6.5 Idempotence Property (DSL-Driven)¶

from hypothesis import given
import hypothesis.strategies as st

from funcpipe_rag import (
    All,
    LenGt,
    Ok,
    RagBoundaryDeps,
    RagConfig,
    RagEnv,
    RulesConfig,
    StartsWith,
    full_rag_api_path,
    get_deps,
)


class FakeReader:
    def __init__(self, docs):
        self._docs = docs

    def read_docs(self, path):
        _ = path  # the fake ignores the path
        return Ok(self._docs)


@given(chunk_size=st.integers(128, 1024))
def test_rag_idempotence(chunk_size):
    keep = RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500))))
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader([]))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

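Note: Calling the API twice with the same inputs and comparing the results checks repeatability: with immutable DSL rules and faked deps there is no hidden state to drift between calls (see capstone/tests/test_rag_api.py for a minimal FakeReader pattern).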

6.6 Full Pipeline Equivalence Property¶

# capstone/tests/test_rag_api.py (baseline equivalence)
from hypothesis import given

from funcpipe_rag import (
    DEFAULT_RULES,
    RagConfig,
    clean_doc,
    embed_chunk,
    full_rag_api_docs,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)
from tests.conftest import doc_list_strategy, env_strategy


def _baseline_chunks(docs, env):
    cleaned = [clean_doc(d) for d in docs]
    embedded = [embed_chunk(c) for cd in cleaned for c in gen_chunk_doc(cd, env)]
    return structural_dedup_chunks(embedded)


@given(docs=doc_list_strategy(), env=env_strategy())
def test_full_rag_api_docs_matches_baseline(docs, env):
    config = RagConfig(env=env, keep=DEFAULT_RULES)
    deps = get_deps(config)
    chunks, obs = full_rag_api_docs(docs, config, deps)
    assert chunks == _baseline_chunks(docs, env)
    assert obs.total_docs == len(docs)

Note: Tests the full API matches a baseline built from the pure stages (with DEFAULT_RULES ⇒ keep everything).

6.7 Shrinking Demo: Catching a Leaky Bug¶

A bad interpreter that mutates a global rule on every call:

from funcpipe_rag import All, LenGt, Not, StartsWith, eval_pred

KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
MUTABLE_PRED = KEEP_PRED


def bad_keep(doc) -> bool:
    global MUTABLE_PRED
    MUTABLE_PRED = Not(MUTABLE_PRED)  # Leaky mutation
    return eval_pred(doc, MUTABLE_PRED)

Property:

from hypothesis import given

from tests.conftest import raw_doc_strategy


@given(doc=raw_doc_strategy())
def test_bad_dsl_is_not_idempotent(doc):
    global MUTABLE_PRED
    MUTABLE_PRED = KEEP_PRED
    out1 = bad_keep(doc)
    out2 = bad_keep(doc)
    assert out1 == out2

Failure Trace (Example):

Falsifying example: test_bad_dsl_is_not_idempotent(
    doc=RawDoc(doc_id='1', title='t', abstract='...', categories='cs.AI'),
)
AssertionError

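Analysis: Each call to bad_keep negates the global predicate, so the second call always disagrees with the first, whatever the document. Hypothesis therefore fails on its first example and shrinks to a small RawDoc, which points straight at the mutation bug.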
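
The repair is to stop reading the rule from a global and pass it in as an argument. The sketch below uses stand-in types over plain dicts, and `good_keep` is a hypothetical name chosen to contrast with `bad_keep`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc: dict, pred) -> bool:
    """Stand-in interpreter over plain dict documents (an assumption)."""
    if isinstance(pred, LenGt):
        return len(doc[pred.path]) > pred.n
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unknown rule: {pred!r}")


def good_keep(doc: dict, pred) -> bool:
    """The rule arrives as an argument; nothing outside the call is touched."""
    return eval_pred(doc, pred)


doc = {"abstract": "x" * 10}
rule = LenGt("abstract", 5)
out1 = good_keep(doc, rule)
out2 = good_keep(doc, rule)
negated = good_keep(doc, Not(rule))  # negation builds a new value, no mutation
```

With the rule passed explicitly, the repeated-call property holds, and the original `rule` is unchanged after building its negation.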

7. When DSLs Aren't Worth It¶

Use if-else only in:
- Trivial one-rule logic.
- Legacy code wrapping DSLs.
Guardrails: Isolate to <5 lines; prefer DSLs for domain rules.

Example:

# Trivial
x = 3
if x > 0:
    print(x)  # OK for a one-off

8. Pre-Core Quiz¶

If-else chain? → Hard-coded logic.
Mutable rule? → frozen=True.
Magic path? → LenGt("path", value).
Global rule? → Pass as param.
Prove rules? → Hypothesis recursive.

9. Post-Core Reflection & Exercise¶

Reflect: Find if-else domain logic. Refactor to frozen rule data + interpreter; add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., keep as DSL); run properties.
- Did data reduce branches?
- Did interpreter enable tests?
- Did composability clarify logic?

Continue with: Debugging Compositions

Verify all patterns with Hypothesis; the examples above show how to detect impurities such as global state or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.