Parameters as Comparison Controls¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Metrics Parameters Comparable Meaning"]
page["Parameters as Comparison Controls"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Parameters are part of the result story.
If a parameter influences training, evaluation, filtering, or publishing, it is not an incidental knob. It is part of the comparison surface.
That means a reviewer should be able to answer:
Which controls changed between these two results, and do those changes still allow the comparison we are making?
DVC helps by tracking declared parameter values. You still have to decide which values deserve that treatment.
Controls that belong in review¶
Good parameter candidates usually affect the meaning of a result:
- model family
- regularization strength
- random seed
- train/test split policy
- feature inclusion toggles
- label filtering thresholds
- evaluation threshold
- minimum population size
- release acceptance tolerance
Example:
fit:
model_family: logistic_regression
regularization: 0.2
random_seed: 20260411
evaluate:
threshold: 0.65
minimum_population_size: 400
When these values are declared in dvc.yaml, DVC can record them in lock evidence and
show changes with dvc params diff.
stages:
evaluate:
cmd: python -m incident_escalation_capstone.evaluate
deps:
- models/escalation-model.json
- data/prepared/incidents.parquet
params:
- evaluate.threshold
- evaluate.minimum_population_size
outs:
- metrics/metrics.json
Now an evaluation threshold change is not a private code edit. It is a declared control change.
Not every constant is a parameter¶
The answer is not "put every value in params.yaml."
Some values are ordinary implementation details:
- a local variable name
- a formatting width
- a retry delay for a non-resulting network call
- a temporary file suffix
- an internal chunk size that does not change semantics or intended comparison
The review question is:
Would changing this value alter the result, the comparison, or the release judgment?
If yes, it probably belongs in the parameter surface. If no, forcing it into
params.yaml may add noise rather than clarity.
Parameter drift changes the comparison¶
Suppose two runs report:
and:
The improvement looks simple until dvc params diff shows:
Now the review question changes. The team is no longer comparing model behavior under the same threshold. It may be comparing a threshold policy change.
That can still be valuable, but the conclusion must be honest:
- "model quality improved under the same controls" is not justified
- "the threshold policy changed and the resulting F1 improved" may be justified
The parameter diff changes what the metric diff means.
Hidden controls are stale-result risk¶
The most dangerous parameter is the influential value that DVC cannot see.
Examples:
- a threshold hard-coded in Python
- a split seed read from an unlisted config file
- an environment variable controlling evaluation behavior
- a default value that changed in a library call
- a command-line flag used in documentation but not declared in
dvc.yaml
Some of these belong directly in params.yaml. Some may belong in environment management
or command declaration. The important point is that influential controls need a review
home.
If a threshold matters to the result, hiding it in code makes every metric comparison harder to defend.
A simple placement rule¶
Use this table as a starting point:
| Value | Usual placement | Reason |
|---|---|---|
| model family | params.yaml and stage params |
changes the model being compared |
| evaluation threshold | params.yaml and stage params |
changes metric interpretation |
| random seed | params.yaml and stage params |
controls repeatability and comparison |
| Python package version | lockfile or environment evidence | runtime control, not a DVC metric parameter by itself |
| plot title | usually code or report configuration | usually presentation, unless release policy depends on it |
| publish acceptance tolerance | params.yaml or release contract |
affects release decision |
The table is not a replacement for judgment. It is a way to name the comparison role of a value before deciding where it lives.
Review checkpoint¶
You understand this core when you can:
- name the controls that affect a metric comparison
- decide whether a value belongs in
params.yaml - explain how
dvc params diffchanges metric interpretation - identify hidden controls that make comparisons weak
- avoid turning
params.yamlinto a junk drawer of harmless constants
Parameter discipline is not bureaucracy. It is how future reviewers learn what changed beside the number.