Reviewing Environment Drift and Runtime Evidence

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive DVC"]
  section["Execution Environments Reproducible Inputs"]
  page["Reviewing Environment Drift and Runtime Evidence"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Environment drift is easy to talk about loosely and hard to review honestly.

This page turns that review into a sequence of concrete questions.

The main review question

When a run differs unexpectedly, ask:

what evidence suggests this is an environment difference rather than data, parameter, or stage drift?

That question matters because "it must be the environment," asserted without evidence, is only slightly better than guessing.

A practical drift ladder

flowchart TD
  symptom["observe a difference"] --> confirm["confirm code, data, and params are aligned"]
  confirm --> env["inspect environment evidence"]
  env --> classify["classify expected variance versus real drift"]
  classify --> action["decide: tolerate, tighten, or escalate"]

This ladder keeps the diagnosis from collapsing into folklore.

Start with what you can rule out

Before blaming the environment, confirm:

the same tracked data was used
the same declared parameters were used
the same recorded workflow route applies

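This is exactly where DVC's explicit state helps. Because dvc.lock records the hashes of tracked inputs and the parameter values each stage ran with, it narrows the search space.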
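The ruling-out step can be sketched as a byte-level comparison of the files that hold this state. This is a minimal sketch, assuming each run left behind a checkout directory containing dvc.lock, dvc.yaml, and params.yaml. The helper names and the directory layout are illustrative assumptions, not part of DVC; in practice, dvc status and dvc params diff answer similar questions.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Files assumed to record the run's declared state; adjust to the project layout.
STATE_FILES = ("dvc.lock", "dvc.yaml", "params.yaml")


def file_digest(path: Path) -> str | None:
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_run_inputs(run_a: Path, run_b: Path) -> dict[str, bool]:
    """Map each state file to True only when both runs hold identical bytes."""
    result = {}
    for name in STATE_FILES:
        digest_a = file_digest(run_a / name)
        digest_b = file_digest(run_b / name)
        # A file missing on either side counts as "not confirmed the same".
        result[name] = digest_a is not None and digest_a == digest_b
    return result
```

Any False entry means the environment should not be blamed yet: the difference may already be explained by data, parameters, or the pipeline declaration.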

Then inspect runtime evidence

For Module 03, useful environment evidence includes:

tool and interpreter versions
local versus CI platform reports
declared install surface
any documented environment strategy such as lockfiles or containers

The capstone's make platform-report is helpful because it gives you a concrete example of environment evidence that is small, reviewable, and relevant.
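To make "small, reviewable, and relevant" concrete, here is a minimal sketch of collecting environment evidence with the Python standard library. It is not the implementation of the capstone's make platform-report, whose contents this page does not show. The function name collect_environment_evidence and the default package list are assumptions for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def collect_environment_evidence(packages=("dvc",)):
    """Gather interpreter, platform, and package versions into one dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Sorted keys keep the report stable, so two reports diff cleanly.
    json.dump(collect_environment_evidence(), sys.stdout, indent=2, sort_keys=True)
```

Running the same script locally and in CI, then saving both outputs, gives the review two comparable artifacts instead of two recollections.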

Distinguish three kinds of findings

Finding	Meaning
expected variance	the workflow is only conditionally deterministic here, and the difference is within declared tolerance
environment drift	a meaningful runtime change likely affected the result
deeper workflow issue	the environment was blamed too early, and the real problem lives elsewhere

This distinction is what keeps the review disciplined.
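One possible way to encode the three findings as a decision rule is sketched below. The three boolean inputs and the ordering of the checks are a simplifying assumption; a real review weighs evidence rather than flags.

```python
from enum import Enum


class Finding(Enum):
    EXPECTED_VARIANCE = "expected variance"
    ENVIRONMENT_DRIFT = "environment drift"
    DEEPER_WORKFLOW_ISSUE = "deeper workflow issue"


def classify(inputs_aligned: bool, env_changed: bool, within_tolerance: bool) -> Finding:
    """Classify a difference between two runs (illustrative rule only)."""
    if not inputs_aligned:
        # Code, data, or params differ, so the environment was blamed too early.
        return Finding.DEEPER_WORKFLOW_ISSUE
    if within_tolerance:
        # The difference fits the declared tolerance.
        return Finding.EXPECTED_VARIANCE
    if env_changed:
        # Out of tolerance, and the runtime evidence shows a real change.
        return Finding.ENVIRONMENT_DRIFT
    # Out of tolerance with no recorded environment change: look elsewhere.
    return Finding.DEEPER_WORKFLOW_ISSUE
```

Checking input alignment first mirrors the drift ladder: the environment is only considered once code, data, and params have been ruled out.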

A small example

Suppose local and CI metrics differ slightly.

A calm review might say:

data identity matches
params.yaml matches
pipeline declaration matches
platform-report shows a Python or DVC version difference
the metric delta is small enough to treat as conditional determinism, or large enough to escalate

That is much stronger than:

CI is flaky again.
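The version comparison in the calm review can be sketched as a plain dictionary diff, assuming each platform report has been loaded as flat key-value pairs. The report keys and version strings below are invented for illustration.

```python
def diff_reports(local: dict, ci: dict) -> dict:
    """Return {key: (local_value, ci_value)} for every key whose values differ."""
    keys = sorted(set(local) | set(ci))
    return {
        key: (local.get(key), ci.get(key))
        for key in keys
        if local.get(key) != ci.get(key)
    }


# Hypothetical reports; real values come from each environment.
local_report = {"python": "3.11.4", "dvc": "3.30.0", "os": "Linux"}
ci_report = {"python": "3.11.9", "dvc": "3.30.0", "os": "Linux"}

for key, (local_value, ci_value) in diff_reports(local_report, ci_report).items():
    print(f"{key}: local={local_value} ci={ci_value}")
# prints: python: local=3.11.4 ci=3.11.9
```

A one-line diff like this is evidence a reviewer can check, which "CI is flaky again" is not.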

Why tolerances need honesty

Not every difference deserves panic.

But a tolerance needs to be declared in advance, not assumed retroactively after every surprise.

Strong teams can say:

this amount of drift is expected under our current environment strategy
this amount of drift is not acceptable for release or comparison

That is what turns runtime variability into something governable.
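A declared tolerance can be as small as a checked-in mapping that exists before any comparison is made. The sketch below assumes a single metric named accuracy and two thresholds. The names, the numbers, and the intermediate "review" verdict are all hypothetical.

```python
# Hypothetical declared tolerances, written down before any run is compared.
TOLERANCES = {
    "accuracy": {
        "expected": 0.002,      # drift expected under the current environment strategy
        "unacceptable": 0.01,   # drift not acceptable for release or comparison
    },
}


def drift_verdict(metric: str, local: float, ci: float) -> str:
    """Compare a metric delta against the declared thresholds."""
    limits = TOLERANCES[metric]
    delta = abs(local - ci)
    if delta <= limits["expected"]:
        return "tolerate"
    if delta >= limits["unacceptable"]:
        return "escalate"
    # Between the two thresholds: neither clearly fine nor clearly blocking.
    return "review"
```

For example, under these assumed thresholds drift_verdict("accuracy", 0.912, 0.911) returns "tolerate", while drift_verdict("accuracy", 0.912, 0.890) returns "escalate". The point is that the thresholds are fixed before the surprise, not chosen to fit it.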

When to escalate

Escalate when:

the drift is too large to fit the workflow's declared tolerance
a release or comparison claim depends on stronger sameness
the team does not yet have enough environment evidence to explain the difference

Escalation here does not mean panic. It means the current evidence is no longer enough.

Keep this standard

Do not let "environment issue" become a polite way of saying "we do not know."

Ask for:

the evidence
the ruled-out alternatives
the expected tolerance
the next action

That is the review discipline Module 03 is trying to build.
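The four items to ask for can be captured in a small record that refuses to count as complete while any of them is empty. This is a sketch of one way to hold the standard; the DriftReview class and its field names are illustrative assumptions, not part of the module or of DVC.

```python
from dataclasses import dataclass, fields


@dataclass
class DriftReview:
    evidence: list            # e.g. the differing lines of two platform reports
    ruled_out: list           # alternatives already checked: data, params, stages
    expected_tolerance: str   # the tolerance declared before the comparison
    next_action: str          # tolerate, tighten, or escalate

    def missing(self) -> list:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self) -> bool:
        return not self.missing()


review = DriftReview(
    evidence=["python differs between local and CI"],
    ruled_out=["data identity", "params.yaml"],
    expected_tolerance="",
    next_action="escalate",
)
print(review.missing())  # prints: ['expected_tolerance']
```

A review that cannot fill in all four fields is the "we do not know" case, and is better labeled that way.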