Exercise Answers
These answers are model explanations, not the only acceptable wording.
What matters is whether the reasoning keeps baseline authority, candidate scope, evidence, and promotion decisions connected.
Answer 1: Name the baseline
Strong baseline description:
The baseline uses evaluate.threshold: 0.65 and reports F1 0.81, precision 0.78, and recall 0.84. Candidate experiments should compare against the same declared parameter surface, metric definition, and evaluation population unless a baseline boundary change is explicitly reviewed.
Evidence to inspect:
- params.yaml
- metrics/metrics.json
- dvc.lock
- published baseline files such as publish/v1/params.yaml and publish/v1/metrics.json
- any release or prediction review guide that explains the metric meaning
The main lesson is that the baseline is evidence, not memory.
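One way to treat the baseline as evidence is to check that its recorded metrics agree with each other. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the answer does not state the metric definition, so confirm it against the course's review guide. The function name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline values quoted in the baseline description above.
baseline = {"f1": 0.81, "precision": 0.78, "recall": 0.84}

recomputed = f1_from(baseline["precision"], baseline["recall"])

# Reported values are rounded to two decimals, so allow half a unit
# in the last reported digit when comparing.
is_consistent = abs(recomputed - baseline["f1"]) <= 0.005

print(f"recomputed F1 = {recomputed:.4f}, consistent with report = {is_consistent}")
```

If the recomputed value disagreed with the recorded F1, that would suggest the metric definition or the evaluation population differs from what the reviewer assumed.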
Answer 2: Scope a candidate
These changes should usually be separate candidate runs.
Reasoning:
- lowering evaluate.threshold tests a policy tradeoff
- switching fit.model_family tests model configuration
- removing weekends changes the evaluation population and may require baseline boundary review
Combining all three would make the result hard to interpret. If the metric improves, the team would not know whether the cause was threshold, model family, population change, or their interaction.
A defensible combined run is possible only if the intent is explicitly a full policy proposal. It should not be reviewed as a clean threshold or model experiment.
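The scoping argument can be made mechanical by listing which parameters differ between the baseline and a candidate. This is a minimal sketch, not part of DVC: the helper names, the `exclude_weekends` key, and the model family values are hypothetical, and only evaluate.threshold 0.65 and 0.50 come from the exercise.

```python
_MISSING = object()  # marks a key that is absent from one side


def flatten(params: dict, prefix: str = "") -> dict:
    """Flatten nested params into dotted keys such as 'evaluate.threshold'."""
    flat = {}
    for key, value in params.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


def changed_keys(baseline: dict, candidate: dict) -> list[str]:
    """Return the sorted dotted keys whose values differ between the two sets."""
    base, cand = flatten(baseline), flatten(candidate)
    return sorted(
        key
        for key in base.keys() | cand.keys()
        if base.get(key, _MISSING) != cand.get(key, _MISSING)
    )


# Hypothetical parameter sets for illustration.
baseline = {
    "evaluate": {"threshold": 0.65, "exclude_weekends": False},
    "fit": {"model_family": "logistic"},
}
combined = {
    "evaluate": {"threshold": 0.50, "exclude_weekends": True},
    "fit": {"model_family": "gradient_boosting"},
}

diff = changed_keys(baseline, combined)
print("changed:", diff)
if len(diff) > 1:
    print("Several parameters changed: review as a policy proposal, not a clean experiment.")
```

A candidate whose diff contains a single key is much easier to attribute; a diff with three keys matches the combined run that the answer warns against.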
Answer 3: Interpret a candidate table
Strong review note:
The candidate changes evaluate.threshold from 0.65 to 0.50. F1 improves from 0.81 to 0.84, and recall improves from 0.84 to 0.95. Precision decreases from 0.78 to 0.75. This candidate is promising only if the release objective prioritizes reducing missed escalations enough to accept the precision cost.
The main lesson is to describe the tradeoff instead of naming only the higher F1.
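The tradeoff in the review note can be computed rather than eyeballed. The sketch below uses the metric values from the note; the helper `metric_deltas` is a hypothetical name, and the split into improved and regressed metrics assumes that higher is better for all three.

```python
# Metric values quoted in the review note above.
baseline = {"f1": 0.81, "precision": 0.78, "recall": 0.84}
candidate = {"f1": 0.84, "precision": 0.75, "recall": 0.95}


def metric_deltas(base: dict, cand: dict) -> dict:
    """Candidate minus baseline for each metric, rounded to hide float noise."""
    return {name: round(cand[name] - base[name], 4) for name in base}


deltas = metric_deltas(baseline, candidate)

# Assumes higher is better for every metric listed.
improved = sorted(name for name, delta in deltas.items() if delta > 0)
regressed = sorted(name for name, delta in deltas.items() if delta < 0)

for name, delta in deltas.items():
    print(f"{name}: {baseline[name]:.2f} -> {candidate[name]:.2f} ({delta:+.2f})")
print("improved:", improved)
print("regressed:", regressed)
```

Listing the regressed metrics next to the improved ones keeps the precision cost visible instead of letting the higher F1 stand alone.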
Answer 4: Identify what DVC experiments do not prove
Strong response:
DVC experiments help record candidate runs, parameter differences, metrics, and comparison evidence without immediately rewriting main Git history. They do not prove that the candidate is semantically comparable, scientifically valid, environmentally stable, or appropriate for release. We still need to inspect the baseline, parameter changes, metric definitions, evaluation population, and release objective before calling a candidate valid.
The main lesson is that DVC preserves evidence; people still review meaning.
Answer 5: Decide promotion or discard
Before applying the candidate, inspect:
- parameter diff
- metric diff
- baseline identity
- evaluation population evidence
- metric schema stability
- whether the candidate has unrelated changes
After applying the candidate, check:
- git diff
- git status
- expected params.yaml, metric, lock, or output changes
- no unrelated workspace changes
- review route output if the course capstone provides one
A strong promotion note should include:
- threshold changed from 0.65 to 0.50
- recall improved
- precision decreased
- population and metric meaning remained comparable, if verified
- release objective prioritizes missed-escalation reduction
- promotion is a recall-oriented threshold policy decision, not a pure model improvement claim
The main lesson is that applying is not promotion until the intended state is reviewed and committed.
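The "no unrelated workspace changes" check can be sketched as a filter over the changed paths. This example assumes the text comes from `git status --porcelain` in its version 1 format (two status characters, a space, then the path) and does not handle renames or quoted paths. The expected path set and the sample output, including the scratch notebook, are hypothetical.

```python
# Paths a threshold-only candidate is expected to touch (an assumption
# for illustration; adjust to the pipeline's actual outputs).
EXPECTED = {"params.yaml", "dvc.lock", "metrics/metrics.json"}


def changed_paths(porcelain_text: str) -> list[str]:
    """Extract paths from porcelain v1 lines of the form 'XY path'."""
    return [line[3:] for line in porcelain_text.splitlines() if len(line) > 3]


def unexpected_changes(paths: list[str], expected: set = EXPECTED) -> list[str]:
    """Return the changed paths that are outside the expected set."""
    return sorted(set(paths) - set(expected))


# Hypothetical status output captured after applying a candidate.
sample_status = (
    " M params.yaml\n"
    " M dvc.lock\n"
    " M metrics/metrics.json\n"
    "?? notebooks/scratch.ipynb\n"
)

paths = changed_paths(sample_status)
extras = unexpected_changes(paths)
print("changed:", paths)
print("unexpected:", extras)
```

An empty `unexpected` list does not make the candidate promotable on its own; it only confirms that the workspace contains the changes the promotion note describes.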
Self-check
If your answers consistently explain:
- what baseline state anchors the comparison
- what each candidate is trying to learn
- what DVC experiment evidence can and cannot prove
- which tradeoffs matter for selection
- why promotion needs applying, inspecting, and committing a defensible state
then you are using Module 06 correctly.