Exercise Answers

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive DVC"]
  section["Metrics Parameters Comparable Meaning"]
  page["Exercise Answers"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

These answers are model explanations, not the only acceptable wording.

What matters is whether the reasoning keeps metric meaning, parameter controls, and comparison evidence connected.

Answer 1: Explain the metric claim

The metric appears to claim:

the incident escalation workflow produced a positive-class F1 value of 0.82
the metric was computed at a fixed threshold
the evaluation population contained 420 records

Additional meaning a reviewer still needs:

which population those 420 records represent
which class is the positive class
what fixed threshold was used
whether the F1 definition is binary, macro, weighted, or something else
which model and parameter values produced the metric

Why population size is useful but not enough:

it helps catch obvious population movement
it does not prove the same records, label rules, or slice definitions were used

The main lesson is that the metric file gives handles for review, not complete meaning by itself.
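One way to make the open questions above concrete is a small check that lists which context fields a metric record leaves unanswered. This is a minimal sketch: the field names (`population`, `positive_class`, `threshold`, `f1_average`, `params_ref`) and the helper `missing_context` are hypothetical and are not taken from the module's metric file or from DVC.

```python
# Hypothetical context fields; these names are illustrative assumptions,
# not keys that DVC or the module's metric file is known to use.
REQUIRED_CONTEXT = (
    "population",      # which records the 420 rows represent
    "positive_class",  # which label counts as positive
    "threshold",       # the fixed decision threshold
    "f1_average",      # binary, macro, weighted, ...
    "params_ref",      # which model and parameter values produced it
)


def missing_context(metric: dict) -> list[str]:
    """Return the context fields a reviewer would still have to ask about."""
    return [key for key in REQUIRED_CONTEXT if metric.get(key) in (None, "")]


# The bare metric from the exercise: a value and a population size only.
bare_metric = {"f1": 0.82, "population_size": 420}
unanswered = missing_context(bare_metric)
print(unanswered)  # all five context fields are still open
```

A check like this does not supply the meaning. It only shows how much of it lives outside the metric file.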

Answer 2: Classify parameter controls

Likely comparison controls:

fit.model_family, because it changes the model being compared
fit.random_seed, because it can affect repeatability and learned output
evaluate.threshold, because it changes metric interpretation
evaluate.minimum_population_size, because it affects whether evaluation is valid

Probably not part of the comparison surface:

plot.title, unless release policy or downstream automation depends on it
tmp.file_suffix, because it sounds like temporary implementation plumbing

The main lesson is to ask whether a value changes the result, comparison, or release judgment. Important controls belong in review; harmless plumbing should not turn params.yaml into noise.
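The classification above can be sketched as a simple split of parameter keys. The prefix rule below is an assumed team convention and not a DVC feature. As the answer notes, a key such as plot.title would need to move into the control set if release policy or downstream automation depends on it.

```python
# Assumption: keys under these prefixes are treated as plumbing by convention.
# This is a hypothetical rule for illustration; DVC does not enforce it.
PLUMBING_PREFIXES = ("plot.", "tmp.")


def split_controls(param_keys):
    """Separate likely comparison controls from implementation plumbing."""
    controls, plumbing = [], []
    for key in param_keys:
        if key.startswith(PLUMBING_PREFIXES):
            plumbing.append(key)
        else:
            controls.append(key)
    return controls, plumbing


keys = [
    "fit.model_family",
    "fit.random_seed",
    "evaluate.threshold",
    "evaluate.minimum_population_size",
    "plot.title",
    "tmp.file_suffix",
]
controls, plumbing = split_controls(keys)
print("review:", controls)
print("plumbing:", plumbing)
```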

Answer 3: Diagnose schema drift

Strong review note:

This is a meaning-changing schema change, not a simple improvement. The previous metric was positive-class F1 at a fixed threshold. The new metric is macro F1 after threshold search. The new value may be useful, but it should not be read as an increase from 0.81 to 0.84 for the same metric. The review should either keep the old metric for continuity or clearly mark this as a new comparison contract.

This is not merely additive because the old key disappeared and the metric definition changed.
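The additive-versus-breaking distinction can be sketched by comparing the key sets of two metric files. The key names below are hypothetical stand-ins for the old and new metrics. A key-set check also cannot catch a metric that keeps its name while its definition changes, so it supplements a definition review and does not replace it.

```python
def schema_drift(old: dict, new: dict) -> dict:
    """Compare metric keys; a change is additive only if no old key disappears."""
    old_keys, new_keys = set(old), set(new)
    removed = sorted(old_keys - new_keys)
    added = sorted(new_keys - old_keys)
    return {"removed": removed, "added": added, "additive_only": not removed}


# Hypothetical key names chosen to mirror the exercise scenario.
old_metrics = {"f1_positive_fixed_threshold": 0.81}
new_metrics = {"f1_macro_threshold_search": 0.84}

drift = schema_drift(old_metrics, new_metrics)
print(drift)
```

Here the old key is gone, so `additive_only` is false and the 0.81 and 0.84 values should not be read as one metric moving.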

Answer 4: Interpret metric and parameter diffs

Strong interpretation:

Fixed-threshold F1 increased from 0.81 to 0.84, but the evaluation threshold changed from 0.65 to 0.50. That means the comparison is not a same-threshold model improvement claim. It may support a threshold-policy review or a combined model-control comparison, but the conclusion must state that the control surface changed.

The main lesson is that a parameter diff changes what a metric diff can mean.
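DVC reports metric and parameter changes through separate commands (`dvc metrics diff` and `dvc params diff`), so the reviewer has to read them together. The sketch below shows one way to pair them. The function name `describe_comparison` and the dictionary layout are assumptions for illustration and are not DVC output formats.

```python
def describe_comparison(metric_old, metric_new, params_old, params_new, controls):
    """Pair a metric delta with any comparison controls that changed."""
    changed = {
        key: (params_old.get(key), params_new.get(key))
        for key in controls
        if params_old.get(key) != params_new.get(key)
    }
    return {
        "delta": round(metric_new - metric_old, 4),
        "changed_controls": changed,
        # Only a comparison with no changed controls supports a
        # same-control improvement claim.
        "same_control_claim": not changed,
    }


result = describe_comparison(
    metric_old=0.81,
    metric_new=0.84,
    params_old={"evaluate.threshold": 0.65, "fit.random_seed": 7},  # seed value is made up
    params_new={"evaluate.threshold": 0.50, "fit.random_seed": 7},
    controls=["evaluate.threshold", "fit.random_seed"],
)
print(result)
```

With the threshold moving from 0.65 to 0.50, `same_control_claim` is false, which matches the interpretation above: the metric moved, but so did the control surface.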

Answer 5: Review a plot for release evidence

Evidence to check before trusting the plot:

the same evaluation population or a clearly documented population change
the same aggregation or binning rule
deterministic sorting and rendering choices
whether the plot supports or complicates the scalar metric movement
whether the plot speaks to the actual release decision

Responsible release sentence:

The calibration plot uses the same evaluation population and binning rule as the prior release, and it supports the fixed-threshold metric movement without replacing the precision-recall tradeoff review.

The main lesson is that a plot should support a bounded claim. It should not be treated as visual authority by itself.
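The mechanical part of the checklist above (same population, same binning, deterministic ordering) can be sketched as a metadata comparison. The field names are hypothetical and assume the team records such metadata alongside each plot. Whether the plot speaks to the release decision remains a human judgment.

```python
# Hypothetical metadata fields recorded next to each plot; not a DVC schema.
COMPARABILITY_FIELDS = ("population_id", "binning_rule", "sort_order")


def plot_mismatches(prior: dict, current: dict) -> list[str]:
    """Return the comparability fields that differ between two plot records."""
    return [
        field
        for field in COMPARABILITY_FIELDS
        if prior.get(field) != current.get(field)
    ]


# Example values are invented for illustration.
prior_plot = {"population_id": "eval-v3", "binning_rule": "10 equal-width", "sort_order": "bin_index"}
current_plot = {"population_id": "eval-v3", "binning_rule": "15 equal-width", "sort_order": "bin_index"}

mismatches = plot_mismatches(prior_plot, current_plot)
print(mismatches)
```

An empty result would support the first half of the release sentence above. Any mismatch means the plots should not be compared without a documented explanation.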

Self-check

If your answers consistently explain:

what a metric claims
which controls change the comparison
whether schema movement preserves meaning
what DVC diffs show and what they do not prove
how plots and release notes should bound interpretation

then you are using Module 05 correctly.