Reviewing Environment Drift and Runtime Evidence¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Execution Environments Reproducible Inputs"]
page["Reviewing Environment Drift and Runtime Evidence"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Environment drift is easy to talk about loosely and hard to review honestly.
This page is about turning it into a sequence of concrete questions.
The main review question¶
When a run differs unexpectedly, ask:
what evidence suggests this is an environment difference rather than data, parameter, or stage drift?
That question matters because "it must be the environment" is only slightly better than guessing.
A practical drift ladder¶
flowchart TD
symptom["observe a difference"] --> confirm["confirm code, data, and params are aligned"]
confirm --> env["inspect environment evidence"]
env --> classify["classify expected variance versus real drift"]
classify --> action["decide: tolerate, tighten, or escalate"]
This ladder keeps the diagnosis from collapsing into folklore.
Start with what you can rule out¶
Before blaming the environment, confirm:
- the same tracked data was used
- the same declared parameters were used
- the same recorded workflow route applies
This is exactly where DVC's explicit state helps. It narrows the search space.
Then inspect runtime evidence¶
For Module 03, useful environment evidence includes:
- tool and interpreter versions
- local versus CI platform reports
- declared install surface
- any documented environment strategy such as lockfiles or containers
The capstone's make platform-report is helpful because it gives you a concrete example of
environment evidence that is small, reviewable, and relevant.
Distinguish three kinds of findings¶
| Finding | Meaning |
|---|---|
| expected variance | the workflow is only conditionally deterministic here, and the difference is within declared tolerance |
| environment drift | a meaningful runtime change likely affected the result |
| deeper workflow issue | the environment was blamed too early, and the real problem lives elsewhere |
This distinction is what keeps the review disciplined.
A small example¶
Suppose local and CI metrics differ slightly.
A calm review might say:
- data identity matches
params.yamlmatches- pipeline declaration matches
- platform-report shows a Python or DVC version difference
- the metric delta is small enough to treat as conditional determinism, or large enough to escalate
That is much stronger than:
CI is flaky again.
Why tolerances need honesty¶
Not every difference deserves panic.
But tolerance needs to be declared, not assumed retroactively after every surprise.
Strong teams can say:
- this amount of drift is expected under our current environment strategy
- this amount of drift is not acceptable for release or comparison
That is what turns runtime variability into something governable.
When to escalate¶
Escalate when:
- the drift is too large to fit the workflow's declared tolerance
- a release or comparison claim depends on stronger sameness
- the team does not yet have enough environment evidence to explain the difference
Escalation here does not mean panic. It means the current evidence is no longer enough.
Keep this standard¶
Do not let "environment issue" become a polite way of saying "we do not know."
Ask for:
- the evidence
- the ruled-out alternatives
- the expected tolerance
- the next action
That is the review discipline Module 03 is trying to build.