Stage Contract Guide
Guide Maps
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
guide["Capstone docs"]
section["Docs"]
page["Stage Contract Guide"]
proof["Proof route"]
family --> program --> guide --> section --> page
page -.checks against.-> proof
flowchart LR
orient["Read the guide boundary"] --> inspect["Inspect the named files, targets, or artifacts"]
inspect --> run["Run the confirm, demo, selftest, or proof command"]
run --> compare["Compare output with the stated contract"]
compare --> review["Return to the course claim with evidence"]
Use this guide when dvc.yaml feels mechanically readable but the stage boundaries still
feel implicit. The goal is to make each stage promise explicit enough that you can name
what belongs to that edge and what does not.
Stage promises
| Stage | What it owns | What it should not own |
|---|---|---|
| prepare | row normalization, deterministic split, and dataset profile | model training, evaluation metrics, or publish decisions |
| fit | reference model training from the prepared train split | data splitting, evaluation thresholding, or publish packaging |
| evaluate | prediction generation and metric calculation on the eval split | training updates or release-boundary packaging |
| publish | downstream review bundle construction and manifest writing | hidden retraining or recomputing evaluation facts |
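The table can also be kept as data, which makes the "one owner per responsibility" rule checkable. The following is a minimal sketch, not code from the repository: `STAGE_OWNS` and `owning_stage` are hypothetical names, and the responsibility labels are shortened from the table's wording.

```python
# Hypothetical encoding of the "What it owns" column; labels paraphrase the table.
STAGE_OWNS = {
    "prepare": {"row normalization", "deterministic split", "dataset profile"},
    "fit": {"reference model training"},
    "evaluate": {"prediction generation", "metric calculation"},
    "publish": {"review bundle construction", "manifest writing"},
}


def owning_stage(responsibility):
    """Return the single stage that owns a responsibility, or raise if unclear."""
    owners = [stage for stage, owned in STAGE_OWNS.items() if responsibility in owned]
    if len(owners) != 1:
        raise ValueError(f"{responsibility!r} has {len(owners)} owners: {owners}")
    return owners[0]


print(owning_stage("metric calculation"))  # evaluate
```

A responsibility with no owner, or with two, raises instead of silently picking a stage, which mirrors the point of the table: an unclear edge should be named before it is implemented.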
Best file route by stage
- dvc.yaml to see the declared contract
- dvc.lock to see the recorded execution state
- one owning implementation file:
  - prepare.py
  - fit.py
  - evaluate.py
  - publish.py
Use make stage-summary when you want the repository to render that declared-versus-recorded comparison before you read the raw YAML yourself.
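To make the declared-versus-recorded idea concrete, here is a rough sketch of such a comparison over already-parsed files. It is not the implementation behind make stage-summary, which this guide does not show. The assumptions: `summarize_stage` is a hypothetical helper, the paths are illustrative, dvc.yaml lists deps and outs as plain path strings, and dvc.lock records each as an entry with a `path` key plus a hash (shown as a placeholder).

```python
def summarize_stage(name, declared, recorded):
    """Compare the paths a stage declares with the paths its lock entry recorded."""
    contract = declared["stages"][name]
    lock_entry = recorded.get("stages", {}).get(name)
    if lock_entry is None:
        return {"stage": name, "status": "declared but never recorded"}

    summary = {"stage": name, "status": "in sync"}
    for key in ("deps", "outs"):
        declared_paths = set(contract.get(key, []))
        recorded_paths = {entry["path"] for entry in lock_entry.get(key, [])}
        summary[key] = {
            "declared_only": sorted(declared_paths - recorded_paths),
            "recorded_only": sorted(recorded_paths - declared_paths),
        }
        if declared_paths != recorded_paths:
            summary["status"] = "drift"
    return summary


# Hypothetical parsed dvc.yaml: the declared contract.
declared = {
    "stages": {
        "prepare": {
            "cmd": "python prepare.py",
            "deps": ["prepare.py", "data/raw.csv"],
            "outs": ["data/prepared"],
        },
        "fit": {"cmd": "python fit.py", "deps": ["fit.py"], "outs": ["model"]},
    }
}

# Hypothetical parsed dvc.lock: the recorded state ("<hash>" is a placeholder).
recorded = {
    "stages": {
        "prepare": {
            "cmd": "python prepare.py",
            "deps": [{"path": "prepare.py", "md5": "<hash>"}],
            "outs": [{"path": "data/prepared", "md5": "<hash>"}],
        }
    }
}

prepare_summary = summarize_stage("prepare", declared, recorded)
fit_summary = summarize_stage("fit", declared, recorded)
print(prepare_summary["status"], fit_summary["status"])
```

Here prepare declares data/raw.csv without a matching lock entry, so it reports drift, and fit has no lock entry at all. Both are cases worth seeing before reading the raw YAML.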
Best review questions
- Which params are declared at this stage instead of borrowed implicitly?
- Which outputs become trustworthy enough for the next stage?
- Which facts should be reviewed in dvc.lock after execution?
- Which promoted facts should still wait until publish/v1/?
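The first question, about params that are declared rather than borrowed implicitly, can be partly checked by listing parameter keys that no stage declares. This is a sketch under assumptions: `flatten` and `undeclared_params` are hypothetical helpers, the parameter names are made up, and only plain string entries in a stage's `params` list are handled (a declared prefix such as `fit` is taken to cover every key beneath it).

```python
def flatten(params, prefix=""):
    """Turn nested params into dotted keys, e.g. {"fit": {"max_iter": 1}} -> {"fit.max_iter"}."""
    keys = set()
    for name, value in params.items():
        dotted = f"{prefix}{name}"
        if isinstance(value, dict):
            keys |= flatten(value, dotted + ".")
        else:
            keys.add(dotted)
    return keys


def undeclared_params(dvc_yaml, params):
    """List parameter keys that no stage declares, directly or through a prefix."""
    declared = set()
    for stage in dvc_yaml["stages"].values():
        declared.update(p for p in stage.get("params", []) if isinstance(p, str))

    def covered(key):
        return any(key == d or key.startswith(d + ".") for d in declared)

    return sorted(key for key in flatten(params) if not covered(key))


# Hypothetical parsed params file and stage declarations.
params = {"prepare": {"seed": 7, "eval_fraction": 0.2}, "fit": {"max_iter": 100}}
dvc_yaml = {
    "stages": {
        "prepare": {"params": ["prepare.seed"]},
        "fit": {"params": ["fit"]},
    }
}

print(undeclared_params(dvc_yaml, params))  # ['prepare.eval_fraction']
```

A key on this list is not proof of implicit borrowing, since the code may never read it, but it is the right place to start asking.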
When the stage contract is clear but the next design question is where a new requirement should be placed, keep the same route and ask two questions explicitly:
- Which declared input, recorded state surface, or promoted artifact would change?
- Which owning stage can absorb that change without stealing another stage's job?
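Those two questions can be approximated mechanically for a single path: find the stage that produces it and the stages that consume it. The sketch below assumes a parsed dvc.yaml with plain string paths. `route_change` is a hypothetical helper and every path shown is illustrative, not taken from the repository.

```python
def route_change(dvc_yaml, path):
    """Find which stage produces a path and which stages depend on it."""
    stages = dvc_yaml["stages"]
    producers = [name for name, stage in stages.items() if path in stage.get("outs", [])]
    consumers = [name for name, stage in stages.items() if path in stage.get("deps", [])]
    return {
        "path": path,
        "owner": producers[0] if producers else None,  # None means an external input
        "consumers": consumers,
    }


# Hypothetical stage graph using the four stage names from this guide.
dvc_yaml = {
    "stages": {
        "prepare": {"deps": ["data/raw.csv"], "outs": ["data/prepared"]},
        "fit": {"deps": ["data/prepared"], "outs": ["model"]},
        "evaluate": {"deps": ["data/prepared", "model"], "outs": ["metrics"]},
        "publish": {"deps": ["model", "metrics"], "outs": ["publish/v1"]},
    }
}

print(route_change(dvc_yaml, "model"))
```

If a new requirement changes a path whose owner is fit, the change belongs in fit even when the symptom first shows up in evaluate or publish. The consumers list shows who has to be re-reviewed afterwards.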
Best companion guides
- read ARCHITECTURE.md when the next question is file ownership above the stage level
- read RECOVERY_GUIDE.md when the next question is which state is authoritative after local loss and restore
- read PUBLISH_CONTRACT.md when the question moves from stage truth to downstream trust