Skip to content

Module 06: Experiments, Baselines, and Controlled Change

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive DVC"]
  section["Experiments Baselines Controlled Change"]
  page["Module 06: Experiments, Baselines, and Controlled Change"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone
flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Module 06 turns reproducibility into a way to explore safely.

Earlier modules made the baseline more trustworthy: data has identity, runtime influence is visible, pipeline edges are declared, and metric meaning is reviewable. That does not mean the team should stop changing the workflow. It means change needs a clean boundary.

This module is about disciplined exploration:

  • what makes a baseline authoritative
  • what makes an experiment comparable to that baseline
  • what DVC experiments help record
  • where experiment isolation ends
  • how a candidate result becomes a deliberate promotion instead of a lucky local run

The central question is:

What changed, where was it declared, and can this result be compared or promoted without corrupting the baseline story?

If the answer is still "I tried a few things locally," the workflow has left the course's evidence model.

The capstone corroboration surface for this module is the set of files and commands that make experiment review visible: capstone/params.yaml, capstone/metrics/metrics.json, capstone/publish/v1/params.yaml, capstone/docs/experiment-guide.md, capstone/docs/release-review-guide.md, capstone/docs/publish-contract.md, and the make -C capstone experiment-review route.

Why this module exists

Experimentation is where many reproducible workflows become informal again.

Common failure patterns look ordinary:

  • changing a threshold locally and forgetting which run produced the better metric
  • copying a script to try a new model family
  • mixing data changes with parameter changes and calling the result one experiment
  • keeping a strong result in the workspace without a promotion review
  • promoting a result because it had the best metric, while ignoring comparability limits

Those are not just organization problems. They erase lineage. They make later release and collaboration decisions depend on memory instead of evidence.

The point of Module 06 is not to make exploration slow. The point is to let exploration happen without damaging the baseline that makes comparison possible.

Study route

flowchart LR
  overview["Overview"] --> core1["Core 1: baseline authority"]
  core1 --> core2["Core 2: experiment scope"]
  core2 --> core3["Core 3: DVC experiment mechanics"]
  core3 --> core4["Core 4: comparison and selection"]
  core4 --> core5["Core 5: promotion and cleanup"]
  core5 --> example["Worked example"]
  example --> practice["Exercises and answers"]
  practice --> glossary["Glossary"]

Read the module in that order the first time.

If the problem is already partly clear, use this shortcut:

  • open Core 1 when the main confusion is "what is the baseline protecting?"
  • open Core 2 when the main confusion is "what belongs in one experiment?"
  • open Core 3 when the main confusion is "what do DVC experiments record and isolate?"
  • open Core 4 when the main confusion is "which candidate is actually comparable?"
  • open Core 5 when the main confusion is "how does a candidate become history safely?"

Module map

Page Purpose
Overview explains the module promise and study route
Baseline Authority and Experiment Intent teaches how a baseline anchors controlled exploration
Experiment Scope and Change Boundaries teaches which changes belong in one experiment and which require stronger review
DVC Experiment Records and Isolation teaches what DVC experiments record, compare, and keep separate
Comparing Experiments and Selecting Candidates teaches candidate comparison without metric cherry-picking
Promotion, Cleanup, and History Integrity teaches deliberate promotion, discard, and baseline protection
Worked Example: Promoting a Controlled Threshold Experiment walks through one realistic experiment review
Exercises gives five mastery exercises
Exercise Answers explains model answers and review logic
Glossary keeps the module vocabulary stable

What should be clear by the end

By the end of this module, you should be able to explain:

  • what makes a baseline authoritative enough for experiment comparison
  • how to keep one experiment focused on a reviewable change
  • what DVC experiments add beyond ordinary Git branch history
  • how to compare candidate runs without ignoring metric meaning
  • why promotion requires evidence, not only a better number
  • how cleanup protects you from stale local folklore

Commands to keep close

These commands form the evidence loop for Module 06:

make -C capstone experiment-review
make -C capstone prediction-review
dvc exp run
dvc exp show
dvc exp diff
dvc exp apply

Use the make routes for the course-provided capstone review. Use the dvc exp commands inside a DVC workspace when you want to inspect candidate runs directly.

Capstone route

Use the capstone after you can explain what the baseline promises.

Best corroboration surfaces for this module:

  • capstone/params.yaml
  • capstone/metrics/metrics.json
  • capstone/publish/v1/params.yaml
  • capstone/publish/v1/metrics.json
  • capstone/docs/experiment-guide.md
  • capstone/docs/release-review-guide.md
  • capstone/docs/release-review-guide.md

Useful proof route:

make -C capstone experiment-review
make -C capstone release-audit

The point of that route is not to celebrate variation. It is to ask whether a candidate run changed a declared control, stayed comparable to the baseline, and deserves any promotion discussion at all.