# Module 06: Experiments, Baselines, and Controlled Change
## Page Maps
```mermaid
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Experiments Baselines Controlled Change"]
page["Module 06: Experiments, Baselines, and Controlled Change"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
```
```mermaid
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
```
Module 06 turns reproducibility into a way to explore safely.
Earlier modules made the baseline more trustworthy: data has identity, runtime influence is visible, pipeline edges are declared, and metric meaning is reviewable. That does not mean the team should stop changing the workflow. It means change needs a clean boundary.
This module is about disciplined exploration:
- what makes a baseline authoritative
- what makes an experiment comparable to that baseline
- what DVC experiments help record
- where experiment isolation ends
- how a candidate result becomes a deliberate promotion instead of a lucky local run
The central question is:
> What changed, where was it declared, and can this result be compared or promoted without corrupting the baseline story?
If the answer is still "I tried a few things locally," the workflow has left the course's evidence model.
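The first part of the central question, what changed and whether it was declared, can be checked mechanically once parameters are loaded. The sketch below is a minimal illustration, assuming params have been read into nested dictionaries (for example from `params.yaml`) and that the team keeps its own list of declared controls. `DECLARED_CONTROLS`, `flatten`, and `changed_controls` are hypothetical names, not DVC APIs.

```python
from typing import Any

# Hypothetical: the parameter keys this team has declared as experiment controls.
DECLARED_CONTROLS = {"train.threshold", "train.seed"}


def flatten(params: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested params into dotted keys, e.g. {"train": {"seed": 1}} -> {"train.seed": 1}."""
    flat: dict[str, Any] = {}
    for key, value in params.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def changed_controls(
    baseline: dict[str, Any],
    candidate: dict[str, Any],
    declared: set[str] = DECLARED_CONTROLS,
) -> tuple[dict[str, tuple[Any, Any]], list[str]]:
    """Return (changes, undeclared): every key whose value differs, and the subset not declared."""
    base, cand = flatten(baseline), flatten(candidate)
    # A key missing on one side shows up as None in the (baseline, candidate) pair.
    changes = {
        key: (base.get(key), cand.get(key))
        for key in sorted(base.keys() | cand.keys())
        if base.get(key) != cand.get(key)
    }
    undeclared = [key for key in changes if key not in declared]
    return changes, undeclared


baseline = {"train": {"threshold": 0.5, "seed": 42}, "model": {"family": "logreg"}}
candidate = {"train": {"threshold": 0.6, "seed": 42}, "model": {"family": "logreg"}}
changes, undeclared = changed_controls(baseline, candidate)
print(changes)
print(undeclared)
```

Here the threshold change is both visible and declared. If the candidate had also switched `model.family`, that key would appear in `undeclared`, which is the signal to split the run or escalate review instead of comparing it directly.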
The capstone corroboration surface for this module is the set of files and commands that
make experiment review visible: `capstone/params.yaml`, `capstone/metrics/metrics.json`,
`capstone/publish/v1/params.yaml`, `capstone/docs/experiment-guide.md`,
`capstone/docs/release-review-guide.md`, `capstone/docs/publish-contract.md`, and
the `make -C capstone experiment-review` route.
## Why this module exists
Experimentation is where many reproducible workflows become informal again.
Common failure patterns look ordinary:
- changing a threshold locally and forgetting which run produced the better metric
- copying a script to try a new model family
- mixing data changes with parameter changes and calling the result one experiment
- keeping a strong result in the workspace without a promotion review
- promoting a result because it had the best metric, while ignoring comparability limits
Those are not just organization problems. They erase lineage. They make later release and collaboration decisions depend on memory instead of evidence.
The point of Module 06 is not to make exploration slow. The point is to let exploration happen without damaging the baseline that makes comparison possible.
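The third failure pattern, mixing a data change with a parameter change, can be caught before anyone argues about metrics. A minimal sketch, assuming each run is summarized by a data fingerprint (such as the md5 hash DVC records for a tracked dataset) and a flat mapping of params; `RunRecord` and `experiment_scope` are hypothetical helpers, and the hash strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical summary of one run: a data fingerprint plus flat params."""
    data_hash: str  # placeholder for a content hash of the tracked input data
    params: dict = field(default_factory=dict)


def experiment_scope(baseline: RunRecord, candidate: RunRecord) -> str:
    """Classify the change that separates a candidate run from the baseline."""
    data_changed = baseline.data_hash != candidate.data_hash
    changed = sorted(
        key
        for key in baseline.params.keys() | candidate.params.keys()
        if baseline.params.get(key) != candidate.params.get(key)
    )
    if data_changed and changed:
        # Two kinds of change in one run: neither effect can be attributed cleanly.
        return "mixed: split into a data change and a parameter change"
    if data_changed:
        return "data change only"
    if len(changed) > 1:
        return "multiple parameters: " + ", ".join(changed)
    if changed:
        return "single parameter: " + changed[0]
    return "no change"


base = RunRecord("a1b2", {"train.threshold": 0.5})
print(experiment_scope(base, RunRecord("a1b2", {"train.threshold": 0.6})))
print(experiment_scope(base, RunRecord("c3d4", {"train.threshold": 0.6})))
```

A multiple-parameter result is not automatically wrong, but it is the case where Core 2's question, what belongs in one experiment, needs an explicit answer.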
## Study route
```mermaid
flowchart LR
overview["Overview"] --> core1["Core 1: baseline authority"]
core1 --> core2["Core 2: experiment scope"]
core2 --> core3["Core 3: DVC experiment mechanics"]
core3 --> core4["Core 4: comparison and selection"]
core4 --> core5["Core 5: promotion and cleanup"]
core5 --> example["Worked example"]
example --> practice["Exercises and answers"]
practice --> glossary["Glossary"]
```
Read the module in that order the first time.
If the problem is already partly clear, use this shortcut:
- open Core 1 when the main confusion is "what is the baseline protecting?"
- open Core 2 when the main confusion is "what belongs in one experiment?"
- open Core 3 when the main confusion is "what do DVC experiments record and isolate?"
- open Core 4 when the main confusion is "which candidate is actually comparable?"
- open Core 5 when the main confusion is "how does a candidate become history safely?"
## Module map
| Page | Purpose |
|---|---|
| Overview | explains the module promise and study route |
| Baseline Authority and Experiment Intent | teaches how a baseline anchors controlled exploration |
| Experiment Scope and Change Boundaries | teaches which changes belong in one experiment and which require stronger review |
| DVC Experiment Records and Isolation | teaches what DVC experiments record, compare, and keep separate |
| Comparing Experiments and Selecting Candidates | teaches candidate comparison without metric cherry-picking |
| Promotion, Cleanup, and History Integrity | teaches deliberate promotion, discard, and baseline protection |
| Worked Example: Promoting a Controlled Threshold Experiment | walks through one realistic experiment review |
| Exercises | gives five mastery exercises |
| Exercise Answers | explains model answers and review logic |
| Glossary | keeps the module vocabulary stable |
## What should be clear by the end
By the end of this module, you should be able to explain:
- what makes a baseline authoritative enough for experiment comparison
- how to keep one experiment focused on a reviewable change
- what DVC experiments add beyond ordinary Git branch history
- how to compare candidate runs without ignoring metric meaning
- why promotion requires evidence, not only a better number
- how cleanup protects you from stale local folklore
## Commands to keep close
These commands form the evidence loop for Module 06:
```bash
make -C capstone experiment-review
make -C capstone prediction-review
dvc exp run
dvc exp show
dvc exp diff
dvc exp apply
```
Use the `make` routes for the course-provided capstone review. Use the `dvc exp` commands
inside a DVC workspace when you want to inspect candidate runs directly.
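Before reading `dvc exp show` or `dvc exp diff` output, it helps to be clear about what a metric comparison can and cannot say. The sketch below compares two flat metrics files in the spirit of the metrics part of `dvc exp diff`; it is not DVC's implementation, and it assumes `metrics.json` is a flat name-to-number mapping. `load_metrics` and `metric_deltas` are hypothetical helpers, and the metric values are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_metrics(path: Path) -> dict[str, float]:
    """Read a metrics file assumed to be a flat JSON object of name -> number."""
    return json.loads(path.read_text())


def metric_deltas(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, object]:
    """Map each metric to (baseline, candidate, delta), or flag it when only one side reports it."""
    report: dict[str, object] = {}
    for name in sorted(baseline.keys() | candidate.keys()):
        if name in baseline and name in candidate:
            report[name] = (baseline[name], candidate[name], candidate[name] - baseline[name])
        else:
            # A metric that exists on one side only cannot be compared, only noted.
            report[name] = "not comparable: reported on one side only"
    return report


# Illustrative values written to temporary files standing in for two metrics.json outputs.
with tempfile.TemporaryDirectory() as tmp:
    base_path = Path(tmp, "baseline_metrics.json")
    cand_path = Path(tmp, "candidate_metrics.json")
    base_path.write_text(json.dumps({"accuracy": 0.75, "recall": 0.5}))
    cand_path.write_text(json.dumps({"accuracy": 0.75, "recall": 0.625, "f1": 0.6}))
    report = metric_deltas(load_metrics(base_path), load_metrics(cand_path))

for name, row in report.items():
    print(name, row)
```

A metric that appears only in the candidate is a change in what is measured, so it belongs in the comparison review rather than in a delta column.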
## Capstone route
Use the capstone after you can explain what the baseline promises.
Best corroboration surfaces for this module:
- `capstone/params.yaml`
- `capstone/metrics/metrics.json`
- `capstone/publish/v1/params.yaml`
- `capstone/publish/v1/metrics.json`
- `capstone/docs/experiment-guide.md`
- `capstone/docs/release-review-guide.md`
Useful proof route:
The point of that route is not to celebrate variation. It is to ask whether a candidate run changed a declared control, stayed comparable to the baseline, and deserves any promotion discussion at all.
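That paragraph describes a gate with several conditions, and the order matters: a better number is only checked after the run is known to be declared and comparable. A minimal sketch of that ordering, assuming the undeclared-change list and the metric dictionaries were already produced by earlier review steps; `promotion_decision`, `min_gain`, and the returned labels are hypothetical, and a passing result means "worth a promotion review", not "promote".

```python
def promotion_decision(
    undeclared_changes: list[str],
    baseline: dict[str, float],
    candidate: dict[str, float],
    metric: str,
    higher_is_better: bool = True,
    min_gain: float = 0.0,
) -> str:
    """Hypothetical gate: check declaration and comparability before looking at the number."""
    if undeclared_changes:
        return "reject: undeclared changes to " + ", ".join(undeclared_changes)
    if baseline.keys() != candidate.keys():
        return "not comparable: metric sets differ"
    if metric not in baseline:
        return f"not comparable: {metric} is not reported"
    gain = candidate[metric] - baseline[metric]
    if not higher_is_better:
        gain = -gain  # e.g. for a loss, a decrease counts as a gain
    if gain <= min_gain:
        return "keep baseline: no meaningful improvement"
    return "open promotion review"


print(promotion_decision([], {"recall": 0.5}, {"recall": 0.625}, "recall"))
print(promotion_decision(["model.family"], {"recall": 0.5}, {"recall": 0.625}, "recall"))
print(promotion_decision([], {"loss": 0.5}, {"loss": 0.25}, "loss", higher_is_better=False))
```

Putting the metric check last is the design choice: it keeps a lucky number from short-circuiting the questions about declaration and comparability.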