Parameters as Comparison Controls¶

Page Maps¶

graph LR
  family["Reproducible Research"]
  program["Deep Dive DVC"]
  section["Metrics Parameters Comparable Meaning"]
  page["Parameters as Comparison Controls"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Parameters are part of the result story.

If a parameter influences training, evaluation, filtering, or publishing, it is not an incidental knob. It is part of the comparison surface.

That means a reviewer should be able to answer:

Which controls changed between these two results, and do those changes still allow the comparison we are making?

DVC helps by tracking declared parameter values. You still have to decide which values deserve that treatment.

Controls that belong in review¶

Good parameter candidates usually affect the meaning of a result:

model family
regularization strength
random seed
train/test split policy
feature inclusion toggles
label filtering thresholds
evaluation threshold
minimum population size
release acceptance tolerance

Example:

fit:
  model_family: logistic_regression
  regularization: 0.2
  random_seed: 20260411
evaluate:
  threshold: 0.65
  minimum_population_size: 400

When these values are declared in dvc.yaml, DVC can record them in lock evidence and show changes with dvc params diff.

stages:
  evaluate:
    cmd: python -m incident_escalation_capstone.evaluate
    deps:
      - models/escalation-model.json
      - data/prepared/incidents.parquet
    params:
      - evaluate.threshold
      - evaluate.minimum_population_size
    outs:
      - metrics/metrics.json

Now an evaluation threshold change is not a private code edit. It is a declared control change.

Not every constant is a parameter¶

The answer is not "put every value in params.yaml."

Some values are ordinary implementation details:

a local variable name
a formatting width
a retry delay for a non-resulting network call
a temporary file suffix
an internal chunk size that does not change semantics or intended comparison

The review question is:

Would changing this value alter the result, the comparison, or the release judgment?

If yes, it probably belongs in the parameter surface. If no, forcing it into params.yaml may add noise rather than clarity.

Parameter drift changes the comparison¶

Suppose two runs report:

{
  "positive_class_f1_at_fixed_threshold": 0.81
}

and:

{
  "positive_class_f1_at_fixed_threshold": 0.84
}

The improvement looks simple until dvc params diff shows:

Path                  Old    New
evaluate.threshold    0.65   0.50

Now the review question changes. The team is no longer comparing model behavior under the same threshold. It may be comparing a threshold policy change.

That can still be valuable, but the conclusion must be honest:

"model quality improved under the same controls" is not justified
"the threshold policy changed and the resulting F1 improved" may be justified

The parameter diff changes what the metric diff means.

Hidden controls are stale-result risk¶

The most dangerous parameter is the influential value that DVC cannot see.

Examples:

a threshold hard-coded in Python
a split seed read from an unlisted config file
an environment variable controlling evaluation behavior
a default value that changed in a library call
a command-line flag used in documentation but not declared in dvc.yaml

Some of these belong directly in params.yaml. Some may belong in environment management or command declaration. The important point is that influential controls need a review home.

If a threshold matters to the result, hiding it in code makes every metric comparison harder to defend.

A simple placement rule¶

Use this table as a starting point:

Value	Usual placement	Reason
model family	`params.yaml` and stage `params`	changes the model being compared
evaluation threshold	`params.yaml` and stage `params`	changes metric interpretation
random seed	`params.yaml` and stage `params`	controls repeatability and comparison
Python package version	lockfile or environment evidence	runtime control, not a DVC metric parameter by itself
plot title	usually code or report configuration	usually presentation, unless release policy depends on it
publish acceptance tolerance	`params.yaml` or release contract	affects release decision

The table is not a replacement for judgment. It is a way to name the comparison role of a value before deciding where it lives.

Review checkpoint¶

You understand this core when you can:

name the controls that affect a metric comparison
decide whether a value belongs in params.yaml
explain how dvc params diff changes metric interpretation
identify hidden controls that make comparisons weak
avoid turning params.yaml into a junk drawer of harmless constants

Parameter discipline is not bureaucracy. It is how future reviewers learn what changed beside the number.