Compositional Domain Models¶

Page Maps¶

graph LR
  family["Python Programming"]
  program["Python Functional Programming"]
  section["Algebraic Data Modelling Validation"]
  page["Compositional Domain Models"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Separate two ideas that often get blurred together: a domain can be one business concept and still be best modeled as several smaller value types. The crucial move is not just splitting. It is making recombination explicit and reviewable.

Start With the God-Object Smell¶

You usually reach this lesson after seeing a domain object that carries every concern at once. The pain is obvious in teams too: every change collides with every other change.

If one type is imported and mutated by every subsystem, the model boundary is already too wide.
If cross-subsystem invariants are checked everywhere, no one place owns the truth.
If splitting the model would make integration rules disappear, the split is not yet disciplined enough.

Core question
How do you split domain concepts into independent subsystem ADTs and recombine them safely — keeping subsystems loosely coupled while the overall model stays coherent, evolvable, and type-safe in every FuncPipe pipeline?

This lesson introduces compositional domain modeling as a boundary design strategy:

give each subsystem its own focused value model
keep cross-subsystem invariants at explicit assemblers or conversion points
let teams and modules evolve locally until they intentionally meet

The motivating giant Chunk example matters because it captures both the code smell and the team smell at the same time: one type, too many reasons to change.

The naïve pattern everyone writes first:

# BEFORE – monolithic, tightly coupled
@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    tags: list[str]
    embedding_model: str | None
    expected_dim: int | None
    vector: list[float] | None
    # 37 more fields...
# Every subsystem imports and mutates the same type → merge hell

This is the composition problem to name.

The production pattern keeps each subsystem small and then makes integration the one place where cross-cutting rules are enforced deliberately.

# AFTER – split + safe recombination
text = ChunkText(content="...")
meta = ChunkMetadata(source="web", tags=("news",))
emb  = Embedding(vector=(0.1, ...), model="mini", dim=384)

chunk = assemble(text, meta, emb)   # Validation[Chunk, ErrInfo] with cross-checks

That separation is the core value: local evolution plus visible recombination rules.

Use this when you want independent model evolution without turning integration into schema chaos.

Outcome 1. Every domain concept lives in its own subsystem ADT. 2. Recombination via validated assemble / conversion functions. 3. Zero accidental coupling — changes stay local until explicitly integrated.

Tiny Non-Domain Example – Order Processing Split¶

# billing.py
@dataclass(frozen=True, slots=True)
class BillingInfo:
    amount_cents: int
    currency: str

# shipping.py
@dataclass(frozen=True, slots=True)
class ShippingAddress:
    street: str
    country: str

# order.py – integration point only
def create_order(billing: BillingInfo, shipping: ShippingAddress) -> Validation[Order, str]:
    if billing.currency != "USD" and shipping.country != "US":
        return v_failure(("international shipping only for USD",))
    return v_success(Order(billing=billing, shipping=shipping, id=uuid4()))

Billing and Shipping teams work independently. Order team owns the one integration check.

Why Split & Recombine ADTs? (Three bullets every engineer should internalise)¶

Loose coupling: Subsystem A can add fields without touching B — no merge conflicts.
Explicit integration: assemble / conversion functions are the only place cross-cutting concerns live — single source of truth.
Evolvability: Adding a new subsystem (e.g. TaxInfo) only requires updating one integration point, never touching existing ADTs.

DO have 2–4 small, focused ADTs per subsystem.
DO enforce cross-subsystem invariants only in the assembler.
DON’T have one giant “Everything” ADT used everywhere.

By convention, embedding_model and expected_dim set to None mean “no constraint”; assemble only checks them when they are non-None and an embedding is present.

1. Laws & Invariants (machine-checked)¶

Invariant	Description	Enforcement
Subsystem Isolation	Subsystem ADTs only imported by integration layer	Module layout + code review
Recombination Validity	`assemble` enforces cross-subsystem invariants	Validation[Chunk, ErrInfo] return
Exhaustiveness at Join	All subsystem variants handled in integration	`assert_never` in match

2. Decision Table – Split vs Combine¶

Concern	Independent evolution?	Needs cross-checks?	Split?	Combine via
Text processing	Yes	No	Yes	Product
Metadata (source, tags)	Yes	No	Yes	Product
Embedding format	Yes	Yes (dim/model)	Yes	Validated assembler
Search indexing	Yes	Yes (vector norm)	Yes	Conversion function

3. Public API (fp/domain.py – mypy --strict clean)¶

from __future__ import annotations

from dataclasses import dataclass, replace, field
from typing import Callable
from uuid import UUID, uuid4

from .validation import Validation, v_success, v_failure
from .error import ErrInfo, ErrorCode

# Subsystem ADTs – imported here for wiring
# Only the integration layer (this module) should use all three
from .text import ChunkText
from .metadata import ChunkMetadata
from .embedding import Embedding

__all__ = [
    "Chunk", "ChunkId",
    "assemble", "try_set_embedding", "map_metadata_checked",
    "upcast_metadata_v1",
]

ChunkId = UUID

@dataclass(frozen=True, slots=True)
class Chunk:
    id: ChunkId = field(default_factory=uuid4)
    text: ChunkText
    metadata: ChunkMetadata
    embedding: Embedding | None = None

def assemble(
    text: ChunkText,
    meta: ChunkMetadata,
    emb: Embedding | None = None,
) -> Validation[Chunk, ErrInfo]:
    errs = []
    # dedup + stable order tags
    norm_tags = tuple(dict.fromkeys(meta.tags))
    if norm_tags != meta.tags:
        meta = replace(meta, tags=norm_tags)

    if emb is not None:
        if meta.embedding_model is not None and meta.embedding_model != emb.model:
            errs.append(ErrInfo(ErrorCode.EMB_MODEL_MISMATCH, f"{emb.model} != {meta.embedding_model}"))
        if meta.expected_dim is not None and meta.expected_dim != emb.dim:
            errs.append(ErrInfo(ErrorCode.EMB_DIM_MISMATCH, f"{emb.dim} != {meta.expected_dim}"))

    return v_failure(tuple(errs)) if errs else v_success(Chunk(text=text, metadata=meta, embedding=emb))

def try_set_embedding(chunk: Chunk, emb: Embedding | None) -> Validation[Chunk, ErrInfo]:
    return assemble(chunk.text, chunk.metadata, emb)

def map_metadata_checked(
    chunk: Chunk,
    f: Callable[[ChunkMetadata], ChunkMetadata],
) -> Validation[Chunk, ErrInfo]:
    return assemble(chunk.text, f(chunk.metadata), chunk.embedding)

# Versioning example – metadata v1 → current
@dataclass(frozen=True, slots=True)
class ChunkMetadataV1:
    source: str
    tags: list[str]

def upcast_metadata_v1(v1: ChunkMetadataV1) -> ChunkMetadata:
    return ChunkMetadata(source=v1.source, tags=tuple(v1.tags))

4. Reference Implementations (continued)¶

4.1 Subsystem ADTs (split across modules)¶

# text.py
@dataclass(frozen=True, slots=True)
class ChunkText:
    content: str

# metadata.py
@dataclass(frozen=True, slots=True)
class ChunkMetadata:
    source: str
    tags: tuple[str, ...]
    embedding_model: str | None = None
    expected_dim: int | None = None

# embedding.py
@dataclass(frozen=True, slots=True)
class Embedding:
    vector: tuple[float, ...]
    model: str
    dim: int = field(init=False)

    def __post_init__(self) -> None:
        object.__setattr__(self, "dim", len(self.vector))

4.2 RAG Integration – Safe Assembly Pipeline¶

def ingest_chunk(
    text: ChunkText,
    meta: ChunkMetadata,
    emb: Embedding | None,
) -> Validation[Chunk, ErrInfo]:
    return assemble(text, meta, emb)

5. Property-Based Proofs (capstone/tests/test_composition.py)¶

from dataclasses import replace
from hypothesis import given, strategies as st
from funcpipe_rag.fp.domain import assemble
from funcpipe_rag.fp.text import ChunkText
from funcpipe_rag.fp.metadata import ChunkMetadata
from funcpipe_rag.fp.embedding import Embedding
from funcpipe_rag.fp.validation import VSuccess, VFailure
from funcpipe_rag.fp.error import ErrorCode

raw_tags = st.lists(st.text(min_size=1), min_size=0, max_size=15)

@given(
    content=st.text(min_size=1),
    source=st.text(),
    raw_tags=raw_tags,
    model=st.none() | st.text(),
    dim=st.none() | st.integers(min_value=1, max_value=8192),
    vector=st.none() | st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1, max_size=10),
)
def test_assemble_success_and_dedup(content, source, raw_tags, model, dim, vector):
    meta = ChunkMetadata(
        source=source,
        tags=tuple(raw_tags),
        embedding_model=model,
        expected_dim=dim,
    )
    emb = Embedding(vector=tuple(vector or ()), model=model or "test") if vector else None

    # Force success by matching dim/model when emb present
    if emb:
        meta = replace(meta, expected_dim=emb.dim, embedding_model=emb.model)

    v = assemble(ChunkText(content=content), meta, emb)
    assert isinstance(v, VSuccess)
    chunk = v.value

    # Round-trip core fields
    assert chunk.text.content == content
    assert chunk.metadata.source == source
    assert chunk.metadata.embedding_model == (emb.model if emb else model)
    assert chunk.metadata.expected_dim == (emb.dim if emb else dim)

    # Embedding round-trip when present
    if emb is not None:
        assert chunk.embedding is not None
        assert chunk.embedding.vector == emb.vector
        assert chunk.embedding.model == emb.model
        assert chunk.embedding.dim == emb.dim
    else:
        assert chunk.embedding is None

    # Tags are deduped and stable order
    expected_tags = tuple(dict.fromkeys(raw_tags))
    assert chunk.metadata.tags == expected_tags

@given(
    content=st.text(min_size=1),
    source=st.text(),
    tags=st.lists(st.text(), min_size=1),
    model=st.text(),
    vector=st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1, max_size=10),
)
def test_assemble_model_mismatch_fails(content, source, tags, model, vector):
    meta = ChunkMetadata(source=source, tags=tuple(tags), embedding_model=model + "-wrong")
    emb = Embedding(vector=tuple(vector), model=model)
    v = assemble(ChunkText(content=content), meta, emb)
    assert isinstance(v, VFailure)
    assert any(e.code == ErrorCode.EMB_MODEL_MISMATCH for e in v.errors)

@given(
    content=st.text(min_size=1),
    source=st.text(),
    tags=st.lists(st.text(), min_size=1),
    model=st.text(),
    vector=st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1, max_size=10),
)
def test_assemble_dim_mismatch_fails(content, source, tags, model, vector):
    meta = ChunkMetadata(source=source, tags=tuple(tags), embedding_model=model, expected_dim=len(vector) + 1)
    emb = Embedding(vector=tuple(vector), model=model)
    v = assemble(ChunkText(content=content), meta, emb)
    assert isinstance(v, VFailure)
    assert any(e.code == ErrorCode.EMB_DIM_MISMATCH for e in v.errors)

6. Big-O & Allocation Guarantees¶

Operation	Time	Heap	Notes
assemble	O(	tags	)
map_metadata_checked	O(	tags	)

7. Anti-Patterns & Immediate Fixes¶

Anti-Pattern	Symptom	Fix
Monolithic Chunk type	Merge conflicts on every change	Split into subsystem ADTs
Direct field access across modules	Tight coupling	Use `assemble` / conversion functions
Circular imports	Import hell	Keep subsystems independent
Implicit cross-checks	Bugs when fields diverge	Explicit validation in `assemble`

8. Pre-Core Quiz¶

Split for…? → Independent evolution
Combine via…? → Validated assembler
Boundary rule? → Explicit conversion only
Mega-ADT? → Never
Benefit? → Teams work in parallel forever

9. Post-Core Exercise¶

Split one existing monolithic type into 2–3 subsystem ADTs.
Write assemble with cross-checks → return Validation.
Add a v1 → current upcast function for one subsystem.
Verify with property test that round-trip through serialization preserves data.

Continue with: ADT Performance

You now build large systems as loosely coupled subsystem ADTs recombined only at explicit, validated integration points — teams evolve independently, schema conflicts vanish, and the domain model stays coherent forever. The final core gives concrete performance guidance for when (and how) to compromise pure ADTs in hot paths.