Worked Example: Explaining a Local Versus CI Drift

This example shows how Module 03 fits together when a workflow does something that makes teams uneasy:

the local run and the CI run are both honest, but the result is not exactly the same.

The situation

Suppose you run the DVC capstone locally and see one set of metrics.

CI runs the same repository and reports a slightly different metric.

Nobody changed:

the Git commit
the DVC-tracked data
params.yaml

That is exactly the kind of moment where weak environment thinking breaks down.

Step 1: Resist the wrong first conclusion

The weak conclusions are:

someone must have changed the data
CI is random nonsense
DVC failed to make the workflow reproducible

Module 03 asks for a slower, better question:

if the explicit data and parameter story still matches, what runtime facts might still be part of the input surface?

Step 2: Confirm the explicit workflow state first

You check:

DVC-tracked data identity
params.yaml
the declared workflow path

Those still align.

This matters because it narrows the drift question. With data, parameters, and the declared workflow ruled out, the remaining difference has to come from something the repository does not record.
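
One way to make this check concrete is to fingerprint the files that carry the explicit state and compare the fingerprints from both machines. The sketch below is a minimal illustration, not part of the capstone. It assumes the explicit state lives in `params.yaml` and in a `dvc.lock` file at the repository root, and the helper names `explicit_state` and `state_mismatches` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def explicit_state(repo_root: Path, names=("params.yaml", "dvc.lock")) -> dict:
    """Map each explicit-state file to its digest, or None if it is missing.

    The default file names are an assumption about the repository layout.
    """
    state = {}
    for name in names:
        path = repo_root / name
        state[name] = file_digest(path) if path.is_file() else None
    return state


def state_mismatches(local: dict, ci: dict) -> list:
    """List the file names whose digests differ between two snapshots."""
    return sorted(
        name for name in local.keys() | ci.keys()
        if local.get(name) != ci.get(name)
    )
```

If `state_mismatches` returns an empty list for the local and CI snapshots, the explicit inputs agree and the investigation can move on to the runtime.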

Step 3: Inspect runtime evidence

Next you compare environment clues:

local toolchain versions
CI toolchain versions
make platform-report

Now the story becomes clearer:

Python version differs slightly
one numerical dependency version differs as well

These differences do not by themselves prove the cause of the metric gap, but they turn the problem from folklore into a reviewable runtime difference.
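
The comparison can be sketched in a few lines of Python. This is not what `make platform-report` does in the capstone, whose output format is not shown here. It is a hypothetical illustration of collecting the same kind of evidence with the standard library and diffing two reports. The function names and the choice of packages to inspect are assumptions.

```python
import platform
from importlib import metadata


def runtime_report(packages):
    """Collect the interpreter version, platform, and selected package versions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


def diff_reports(local, ci):
    """Return {key: (local_value, ci_value)} for every key that differs."""
    keys = sorted(local.keys() | ci.keys())
    return {
        key: (local.get(key), ci.get(key))
        for key in keys
        if local.get(key) != ci.get(key)
    }
```

Each side would run `runtime_report` with the same package list and save the result, for example as JSON. `diff_reports` then yields only the keys that differ, which is the part a reviewer needs to see.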

Step 4: Classify the drift honestly

You now have to decide:

is this an expected amount of variation for a conditionally deterministic workflow
or is it large enough to justify tightening the environment strategy

That is the real judgment in Module 03.

If the delta is tiny and within declared comparison tolerance, the honest answer may be:

these runs are conditionally deterministic, and the current strategy allows small runtime variation.

If the delta is too large for the workflow's review standards, the answer may instead be:

we need a stronger environment strategy or a stricter canonical executor.
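
The tolerance decision can be written down as a small, reviewable function. The sketch below is an assumption about how such a check might look. The default tolerances are placeholders, not values from the module, and a real workflow would take them from wherever its comparison tolerance is declared.

```python
import math


def classify_drift(local_metric, ci_metric, rel_tol=1e-6, abs_tol=1e-9):
    """Classify the gap between two metric values against a declared tolerance.

    rel_tol and abs_tol are placeholder defaults; pass the workflow's own values.
    """
    if local_metric == ci_metric:
        return "identical"
    if math.isclose(local_metric, ci_metric, rel_tol=rel_tol, abs_tol=abs_tol):
        return "within tolerance"
    return "exceeds tolerance"
```

A "within tolerance" result supports the conditionally deterministic reading, while "exceeds tolerance" points toward a stronger environment strategy. Keeping the tolerance as an explicit argument means the threshold itself is visible in review instead of living in someone's head.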

Step 5: Choose the right repair

The repair is not automatically "pin everything harder."

Possible honest moves include:

tightening dependency control with better lockfiles
standardizing more of the runtime through containers
treating CI as the canonical proof executor
documenting the acceptable comparison tolerance more clearly

The correct move depends on the workflow's review and release needs.

What DVC contributed here

DVC did not solve the environment drift directly.

What it did do was make the rest of the story more explicit:

the data identity was not in doubt
the parameter surface was not in doubt
the stage story was not in doubt

That clarity made the environment explanation possible instead of speculative.

The review note you would want

Local and CI runs differed slightly even though the recorded data, parameters, and workflow declaration aligned. Platform evidence showed a runtime version difference, which makes environment drift the strongest current explanation. This does not mean DVC failed; it means the repository has made other state explicit enough for the remaining difference to be diagnosed as runtime-sensitive behavior. The next decision is whether the observed drift fits the workflow's accepted tolerance or whether environment control should be tightened.

That note is much stronger than "CI was weird."

Why this is a mastery example

This one story exercises the whole module:

Core 1: environment was treated as input
Core 2: divergence was read through the determinism spectrum
Core 3: DVC's boundary stayed honest
Core 4: environment strategy became the repair question
Core 5: runtime evidence was used instead of superstition