Exercise Answers¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Metrics Parameters Comparable Meaning"]
page["Exercise Answers"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
These answers are model explanations, not the only acceptable wording.
What matters is whether the reasoning keeps metric meaning, parameter controls, and comparison evidence connected.
Answer 1: Explain the metric claim¶
The metric appears to claim:
- the incident escalation workflow produced a positive-class F1 value of 0.82
- the metric was computed at a fixed threshold
- the evaluation population contained 420 records
Additional meaning a reviewer still needs:
- which population those 420 records represent
- which class is the positive class
- what fixed threshold was used
- whether the F1 definition is binary, macro, weighted, or something else
- which model and parameter values produced the metric
Why population size is useful but not enough:
- it helps catch obvious population movement
- it does not prove the same records, label rules, or slice definitions were used
The main lesson is that the metric file gives handles for review, not complete meaning by itself.
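One way to make that gap concrete is a small check that lists which context fields a metric record still lacks. The sketch below is illustrative only: the field names (`positive_class`, `threshold`, `f1_average`, `population_id`, `params_ref`) and the `missing_context` helper are assumptions made for this example, not the module's actual metric schema.

```python
# Hypothetical context fields a reviewer would want next to the scalar value.
REQUIRED_CONTEXT = (
    "positive_class",   # which class counts as positive
    "threshold",        # the fixed decision threshold
    "f1_average",       # binary, macro, weighted, ...
    "population_id",    # which records the population represents
    "params_ref",       # which model and parameter values produced the metric
)


def missing_context(metric_record: dict) -> list[str]:
    """Return the context fields that are absent or empty in a metric record."""
    return [
        field
        for field in REQUIRED_CONTEXT
        if metric_record.get(field) in (None, "")
    ]


# A record shaped like the exercise: a value and a population size only.
record = {"f1": 0.82, "population_size": 420}
gaps = missing_context(record)
print(gaps)
```

For a record like this one, every context field comes back as missing, which matches the list of questions a reviewer still has to ask.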
Answer 2: Classify parameter controls¶
Likely comparison controls:
- fit.model_family, because it changes the model being compared
- fit.random_seed, because it can affect repeatability and learned output
- evaluate.threshold, because it changes metric interpretation
- evaluate.minimum_population_size, because it affects whether evaluation is valid
Probably not part of the comparison surface:
- plot.title, unless release policy or downstream automation depends on it
- tmp.file_suffix, because it sounds like temporary implementation plumbing
The main lesson is to ask whether a value changes the result, comparison, or release
judgment. Important controls belong in review; harmless plumbing should not turn
params.yaml into noise.
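As a rough first pass, the classification can be sketched as a prefix rule over flattened parameter keys. This is an assumption-laden illustration: the `COMPARISON_PREFIXES` rule, the helper names, and every parameter value except the 0.65 threshold are placeholders invented here, and a prefix rule cannot replace the judgment call about whether a value such as plot.title matters to release policy.

```python
# Assumption: stage prefixes whose keys belong to the comparison surface.
COMPARISON_PREFIXES = ("fit.", "evaluate.")


def flatten(params: dict, prefix: str = "") -> dict:
    """Flatten nested params into dotted keys such as 'fit.model_family'."""
    flat = {}
    for key, value in params.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


def split_controls(params: dict) -> tuple[list[str], list[str]]:
    """Split dotted keys into (comparison controls, probable plumbing)."""
    flat = flatten(params)
    controls = sorted(k for k in flat if k.startswith(COMPARISON_PREFIXES))
    plumbing = sorted(k for k in flat if not k.startswith(COMPARISON_PREFIXES))
    return controls, plumbing


# Placeholder values; only the key names come from the exercise.
params = {
    "fit": {"model_family": "logistic", "random_seed": 7},
    "evaluate": {"threshold": 0.65, "minimum_population_size": 400},
    "plot": {"title": "Escalation F1"},
    "tmp": {"file_suffix": ".bak"},
}
controls, plumbing = split_controls(params)
print("review:", controls)
print("probably plumbing:", plumbing)
```

Keys that land in the plumbing list are candidates to question, not keys that are proven harmless.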
Answer 3: Diagnose schema drift¶
Strong review note:
This is a meaning-changing schema change, not a simple improvement. The previous metric was positive-class F1 at a fixed threshold. The new metric is macro F1 after threshold search. The new value may be useful, but it should not be read as an increase from 0.81 to 0.84 for the same metric. The review should either keep the old metric for continuity or clearly mark this as a new comparison contract.
This is not merely additive because the old key disappeared and the metric definition changed.
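The "old key disappeared" half of that test can be sketched as a key comparison between two metric files. The key names `f1_positive_fixed_threshold` and `f1_macro_threshold_search` and the `schema_drift` helper are hypothetical, chosen only to mirror the exercise. A key comparison also cannot detect a definition that changes under an unchanged key name, so it supplements the review note rather than replacing it.

```python
def schema_drift(old: dict, new: dict) -> dict:
    """Compare the key sets of two metric dictionaries."""
    old_keys, new_keys = set(old), set(new)
    removed = sorted(old_keys - new_keys)
    added = sorted(new_keys - old_keys)
    return {
        "removed": removed,
        "added": added,
        # Additive only when nothing the old contract exposed went away.
        "additive_only": not removed,
    }


# Hypothetical key names; the values are the ones quoted in the exercise.
old_metrics = {"f1_positive_fixed_threshold": 0.81}
new_metrics = {"f1_macro_threshold_search": 0.84}

report = schema_drift(old_metrics, new_metrics)
print(report)
```

Here the report lists one removed key and one added key, so the change is not additive and the two values should not be read as one metric moving.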
Answer 4: Interpret metric and parameter diffs¶
Strong interpretation:
Fixed-threshold F1 increased from 0.81 to 0.84, but the evaluation threshold changed from 0.65 to 0.50. That means the comparison is not a same-threshold model improvement claim. It may support a threshold-policy review or a combined model-control comparison, but the conclusion must state that the control surface changed.
The main lesson is that a parameter diff changes what a metric diff can mean.
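A minimal sketch of that rule: state the metric movement only together with whatever controls moved alongside it. The helpers `changed_params` and `describe_comparison` and the flat dotted-key dictionaries are assumptions for illustration. They stand in for values a reviewer would read from the metric and parameter diffs, not for any DVC API.

```python
def changed_params(old: dict, new: dict) -> dict:
    """Map each differing key to its (old, new) pair of values."""
    keys = sorted(set(old) | set(new))
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }


def describe_comparison(name: str, old_value: float, new_value: float,
                        param_changes: dict) -> str:
    """Phrase a metric movement so that it is bounded by the control changes."""
    delta = round(new_value - old_value, 4)
    if not param_changes:
        return f"{name} moved by {delta:+} with controls unchanged."
    changed = ", ".join(
        f"{key}: {before} -> {after}"
        for key, (before, after) in param_changes.items()
    )
    return (
        f"{name} moved by {delta:+}, but the control surface changed "
        f"({changed}); this is not a same-control comparison."
    )


# Values from the exercise; the dotted key name follows Answer 2.
old_params = {"evaluate.threshold": 0.65}
new_params = {"evaluate.threshold": 0.50}

sentence = describe_comparison(
    "fixed-threshold F1", 0.81, 0.84, changed_params(old_params, new_params)
)
print(sentence)
```

The design choice is that the function cannot emit a bare improvement sentence once any control differs, which mirrors the requirement that the conclusion name the changed control surface.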
Answer 5: Review a plot for release evidence¶
Evidence to check before trusting the plot:
- the same evaluation population or a clearly documented population change
- the same aggregation or binning rule
- deterministic sorting and rendering choices
- whether the plot supports or complicates the scalar metric movement
- whether the plot speaks to the actual release decision
Responsible release sentence:
The calibration plot uses the same evaluation population and binning rule as the prior release, and it supports the fixed-threshold metric movement without replacing the precision-recall tradeoff review.
The main lesson is that a plot should support a bounded claim. It should not be treated as visual authority by itself.
Self-check¶
If your answers consistently explain:
- what a metric claims
- which controls change the comparison
- whether schema movement preserves meaning
- what DVC diffs show and what they do not prove
- how plots and release notes should bound interpretation
then you are using Module 05 correctly.