Software Provenance, Drift, and Rebuild Evidence

Software boundaries are only half the story.

The other half is proving what changed when the software changes.

That matters because a workflow can become non-reproducible even when the file graph still looks correct.

If a helper script, package function, environment file, or container image changes, the repository needs a credible way to explain:

what changed
which outputs are now questionable
what should be rebuilt
what evidence will travel with the rebuilt results

That is the job of software provenance.

Drift does not begin only with input files

Learners often first meet Snakemake through file timestamps and declared dependencies.

That is a useful start, but it is incomplete.

Drift can also come from:

edits to workflow scripts
changes in package code under src/
runtime declaration changes in environment files
container image revisions
tool upgrades that alter behavior without changing rule syntax

This is why software boundaries must stay explicit. You cannot review or rebuild what you cannot name.
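
One way to make these surfaces nameable is to give each one a fingerprint. The sketch below is illustrative only: the surface names, the glob patterns, and the workflow/envs/ location are assumptions, not the capstone's actual layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fingerprint_surfaces(root: Path, patterns: dict[str, str]) -> dict[str, str]:
    """Map each named software surface to a digest over its matching files.

    `patterns` maps a surface name to a glob relative to `root`.
    """
    fingerprints = {}
    for name, pattern in patterns.items():
        digest = hashlib.sha256()
        # Sort so the digest does not depend on filesystem listing order.
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                digest.update(path.relative_to(root).as_posix().encode())
                digest.update(sha256_of(path).encode())
        fingerprints[name] = digest.hexdigest()
    return fingerprints


# Hypothetical layout; adjust the names and globs to the repository at hand.
SURFACES = {
    "workflow_scripts": "workflow/scripts/*.py",
    "package_code": "src/**/*.py",
    "environments": "workflow/envs/*.yaml",
}
```

Calling fingerprint_surfaces(Path("."), SURFACES) before and after an edit gives two dictionaries that can be compared by name, which is often enough to say which surface moved.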

Reproducibility needs change evidence

A strong repository can answer questions like:

did a software change invalidate published outputs?
which rules are affected by a helper-code edit?
what runtime changed between two runs?
can we explain the software surface that produced this artifact?

Those are not luxury questions. They are the practical questions of trust.

One useful review loop

```mermaid
flowchart LR
  change["software change"] --> detect["detect affected rules or outputs"]
  detect --> rebuild["rebuild what is no longer trustworthy"]
  rebuild --> record["record provenance with the new artifact"]
  record --> review["review what changed and why"]
```

This loop is what turns reproducibility from a slogan into a working practice.
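
The detect step of this loop can be sketched as a comparison between two sets of surface fingerprints, followed by a lookup of which rules depend on the changed surfaces. The rule names, surface names, and digest values below are hypothetical placeholders, and the mapping from rules to surfaces is assumed to be maintained by hand.

```python
def changed_surfaces(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Names whose fingerprint differs, including added or removed surfaces."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


def affected_rules(changed: set[str], rule_surfaces: dict[str, set[str]]) -> list[str]:
    """Rules that declare at least one changed surface, in sorted order."""
    return sorted(rule for rule, surfaces in rule_surfaces.items() if surfaces & changed)


# Hypothetical rule names and surface names, for illustration only.
RULE_SURFACES = {
    "summarize": {"package_code", "environments"},
    "plot": {"workflow_scripts"},
    "publish": {"package_code", "workflow_scripts"},
}

# Placeholder digests standing in for real fingerprints.
old = {"package_code": "aaa", "workflow_scripts": "bbb", "environments": "ccc"}
new = {"package_code": "ddd", "workflow_scripts": "bbb", "environments": "ccc"}

to_rebuild = affected_rules(changed_surfaces(old, new), RULE_SURFACES)
print(to_rebuild)  # ['publish', 'summarize']
```

The list is a starting point for the rebuild and review steps, not a replacement for them: a reviewer still decides whether each flagged rule really needs to run again.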

What counts as software evidence

Good software evidence often includes:

the git revision or release identifier of the repository
the environment or container declaration used for execution
tool or interpreter versions that materially affect behavior
a provenance artifact that travels with published results

In the capstone, workflow/scripts/provenance.py illustrates the idea: a workflow step can emit an artifact that records the software context behind the published outputs.

That is much stronger than relying on memory or a comment in a pull request.
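
A minimal sketch of such an artifact follows. It is not the content of the capstone's provenance.py; the function names and the JSON field names are assumptions chosen for illustration, and a real record would probably also capture container or tool versions.

```python
import hashlib
import json
import platform
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_revision(repo_dir: Path) -> str:
    """Return the current commit hash, or "unknown" if git cannot report one."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def build_provenance(repo_dir: Path, env_file: Path) -> dict:
    """Collect the software context that should travel with an output."""
    env_digest = (
        hashlib.sha256(env_file.read_bytes()).hexdigest() if env_file.is_file() else None
    )
    return {
        "git_revision": git_revision(repo_dir),
        "environment_file": env_file.name,
        "environment_sha256": env_digest,
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_provenance(record: dict, destination: Path) -> None:
    """Write the record as stable, human-readable JSON next to the output."""
    destination.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
```

Writing the file next to the published output, rather than into a log directory, is a deliberate choice in this sketch: the evidence then moves with the artifact it describes.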

File freshness is not enough

Weak assumption:

if the inputs are unchanged, the outputs are still trustworthy.

That assumption fails when:

a script's transformation logic changes
a library upgrade alters ordering, formatting, or statistical behavior
a package helper fixes a bug that changes output meaning

The files may look fresh while the meaning is stale.

That is drift.
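
The gap between the two views can be shown in a few lines. In this sketch the file names are hypothetical, and the script digest is assumed to have been stored alongside the output when it was built.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def fresh_by_mtime(input_path: Path, output_path: Path) -> bool:
    """Timestamp view: the output is at least as new as its input."""
    return output_path.stat().st_mtime >= input_path.stat().st_mtime


def code_matches(script_path: Path, recorded_sha256: str) -> bool:
    """Software view: the script is byte-identical to the one recorded at build time."""
    return hashlib.sha256(script_path.read_bytes()).hexdigest() == recorded_sha256


with tempfile.TemporaryDirectory() as tmp:
    data, script, table = (Path(tmp) / n for n in ("data.csv", "report.py", "table.tsv"))
    data.write_text("x\n1\n")
    script.write_text("ROUND_TO = 1\n")
    recorded = hashlib.sha256(script.read_bytes()).hexdigest()  # stored with the output
    table.write_text("mean\n1.0\n")
    os.utime(data, (1_000, 1_000))   # the input is old
    os.utime(table, (2_000, 2_000))  # the output is newer than the input

    script.write_text("ROUND_TO = 2\n")  # the software changes, the input does not

    timestamp_ok = fresh_by_mtime(data, table)
    software_ok = code_matches(script, recorded)
    print(timestamp_ok, software_ok)  # True False
```

The timestamp check passes and the software check fails for the same output, which is exactly the situation the weak assumption cannot see.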

A stronger practice

A stronger practice looks like this:

keep software surfaces explicit enough that a reviewer can name them
treat code and runtime changes as reasons to re-evaluate output trust
generate provenance artifacts for outputs that will be shared or published
use rebuild-oriented review commands when software changes are suspected

In practice, this is where Snakemake invocations such as snakemake --list-changes code, or broader provenance checks, become valuable. They help the team connect software edits to output risk.
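
For teams that want to call this check from a review script, a thin wrapper might look like the sketch below. Only the code option comes from the text above; the input and params choices are assumptions about recent Snakemake releases, and both function names are hypothetical. Check the installed version's help output before relying on any of it.

```python
import shutil
import subprocess


def list_changes_command(kind: str = "code") -> list[str]:
    """Build the Snakemake invocation for one kind of change."""
    # "input" and "params" are assumed to be accepted by recent Snakemake versions.
    if kind not in {"code", "input", "params"}:
        raise ValueError(f"unsupported change kind: {kind!r}")
    return ["snakemake", "--list-changes", kind]


def outputs_with_changes(kind: str = "code", workdir: str = ".") -> list[str]:
    """Return the non-empty lines Snakemake prints, or [] if it is not installed."""
    if shutil.which("snakemake") is None:
        return []
    result = subprocess.run(
        list_changes_command(kind), cwd=workdir, capture_output=True, text=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

An empty result here does not prove that nothing drifted. It may simply mean that Snakemake is unavailable or that the change happened in a surface the workflow metadata does not track.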

A simple example

Imagine this sequence:

src/capstone/reporting.py changes how summary statistics are rounded.
No input files change.
Published tables still exist from the earlier run.

If the team only checks input freshness, those tables may appear valid.

If the team treats software as part of workflow meaning, it asks a better question:

which outputs were produced with the old implementation, and where is the evidence for the new build?

That question is the difference between accidental and intentional reproducibility.
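
The scenario can be reproduced in miniature. The two functions below are hypothetical stand-ins for the reporting helper before and after the edit; the input values are made up and stay the same across both runs.

```python
import hashlib
from statistics import mean

values = [1.0, 2.0, 2.5]  # the unchanged input


def summarize_old(data: list[float]) -> str:
    """Earlier implementation: one decimal place."""
    return f"mean\t{mean(data):.1f}\n"


def summarize_new(data: list[float]) -> str:
    """Edited implementation: two decimal places."""
    return f"mean\t{mean(data):.2f}\n"


def digest(text: str) -> str:
    """SHA-256 of the rendered table, as it might be recorded in provenance."""
    return hashlib.sha256(text.encode()).hexdigest()


published = summarize_old(values)
rebuilt = summarize_new(values)

print(repr(published))  # 'mean\t1.8\n'
print(repr(rebuilt))    # 'mean\t1.83\n'
print(digest(published) == digest(rebuilt))  # False
```

Nothing about the input reveals that the published table is out of date. Only a record of which implementation produced it can do that.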

Common failure modes

Failure mode	What goes wrong	Better repair
provenance only records input files	software drift stays invisible	record software context for important outputs
helper-code edits are treated as “internal only”	stale publications survive longer than they should	review software edits as output-affecting changes
environment updates are merged without rebuild thinking	runs become incomparable	connect runtime changes to rebuild and release review
publication artifacts omit software identity	external readers cannot trace the producing surface	emit provenance next to important deliverables
teams trust memory over recorded evidence	review becomes anecdotal	prefer generated evidence and versioned declarations
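
The first and fourth rows can be turned into a small release check. The sketch below assumes a provenance record is a dictionary and that the three field names shown are the ones a team has agreed to require; both are assumptions, and the sample values are placeholders.

```python
REQUIRED_SOFTWARE_FIELDS = ("git_revision", "environment_sha256", "python_version")


def missing_software_fields(record: dict) -> list[str]:
    """Return required fields that are absent, empty, or recorded as "unknown"."""
    return [
        field for field in REQUIRED_SOFTWARE_FIELDS
        if record.get(field) in (None, "", "unknown")
    ]


# Placeholder records: one tracks inputs only, one also tracks software context.
inputs_only = {"inputs": {"data/raw.csv": "9f2c"}}
with_software = {
    "inputs": {"data/raw.csv": "9f2c"},
    "git_revision": "0a1b2c3",
    "environment_sha256": "77aa",
    "python_version": "3.11.6",
}

print(missing_software_fields(inputs_only))
# ['git_revision', 'environment_sha256', 'python_version']
print(missing_software_fields(with_software))  # []
```

A non-empty result is a reason to hold a publication until the missing evidence is generated, rather than a reason to fill the gaps by hand.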

The explanation a reviewer trusts

Strong explanation:

this output was rebuilt because the reporting helper changed under src/, and the new publication includes updated provenance that records the runtime and repository state.

Weak explanation:

the code changed a bit, but the files looked current.

The first explanation defends trust. The second confuses absence of file churn with absence of semantic drift.

End-of-page checkpoint

Before leaving this page, you should be able to:

explain how software drift differs from input drift
name at least three kinds of software evidence worth keeping
describe why published outputs need provenance that includes software context
explain why unchanged input files do not guarantee trustworthy outputs