Environments, Containers, and Runtime Contracts¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["Software Boundaries Reproducible Rules"]
page["Environments, Containers, and Runtime Contracts"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Software boundaries are not only about where code lives.
They are also about where code runs.
That is why you eventually need a more serious idea than "use a conda environment when packages get annoying."
An environment or container is a runtime contract.
It says:
- what software this step expects
- what versions matter
- what interpreter or binary family is assumed
- what must stay stable for a rerun to mean the same thing
That is a much better mental model than "dependency helper."
A rule is not reproducible if its runtime is vague¶
Two rules can have identical inputs, outputs, and shell commands and still behave differently if:
- one runs with a different Python version
- a CLI tool resolves to a different binary
- a library upgrade changes parsing or formatting behavior
- system packages differ across machines
This is why software boundaries belong in the course. Reproducibility is not only about declaring files. It is also about declaring the runtime that gives those files meaning.
Three common runtime surfaces¶
In the capstone, you can already see three distinct runtime surfaces:
- rule-scoped environments such as
workflow/envs/python.yaml - repository-level environments such as
environment.yaml - container definitions such as
Dockerfile
These are not interchangeable just because they all mention software.
They answer different questions.
One useful comparison¶
| Runtime surface | Best use | Main promise |
|---|---|---|
| rule-scoped environment | a rule or small group of rules needs a declared toolchain | this step runs with the software it was designed for |
| repository environment | contributors need a shared authoring or development baseline | people can work on the repository without guessing setup |
| container image | a larger execution boundary needs portable system-level consistency | this workflow can move across machines with fewer hidden system assumptions |
This comparison matters because people often try to make one of these surfaces solve every problem.
That usually creates confusion.
The contract should sit close to the step it protects¶
When a rule depends on a particular Python stack, a rule-scoped environment keeps the dependency boundary reviewable.
Conceptually:
rule build_provenance:
input:
"publish/v1/results.tsv"
output:
"publish/v1/provenance.json"
conda:
"workflow/envs/python.yaml"
script:
"workflow/scripts/provenance.py"
This teaches something important.
The rule is saying that the file contract and the runtime contract belong together.
That makes review easier:
- the reviewer sees which step needs the environment
- the environment does not pretend to describe the whole repository
- the script can rely on declared software instead of ambient luck
Containers solve a different class of uncertainty¶
A container earns its keep when system-level assumptions matter:
- OS packages
- shell tools
- compiled dependencies
- execution on infrastructure where host configuration is not trustworthy
That is why a Dockerfile is not just a larger environment file.
It is a stronger boundary.
flowchart TD
rule["Rule contract"] --> runtime["Runtime contract"]
runtime --> env["Rule environment"]
runtime --> container["Container image"]
env --> behavior["Observed behavior"]
container --> behavior
The point of this diagram is not to glorify containers. It is to show that both environments and containers shape behavior.
A weak pattern¶
Weak shape:
- the rule assumes
pythonexists - the script imports libraries that are never declared
- the workflow works on one laptop because that laptop happens to be prepared already
This gives a false sense of simplicity. Nothing is explicit.
When the run fails elsewhere, the repository has not become "unlucky." It has revealed that the runtime contract was missing.
A stronger pattern¶
Stronger shape:
- keep authoring dependencies in the repository environment
- keep rule-specific runtime needs close to the rules that require them
- use a container when the machine boundary matters, not only when installation is hard
This makes the software story legible instead of magical.
How to choose without overengineering¶
Ask these questions:
- Is the uncertainty mostly a library or interpreter concern for one step?
- Is the uncertainty about system-level behavior across machines?
- Is this software boundary needed for authors, runners, or both?
If the answer is mostly "one step needs a declared toolchain," start with a rule-scoped environment.
If the answer is "this must behave the same across machine families," a container may be the right boundary.
If the answer is "contributors need a stable place to edit, lint, and test the repo," that is a repository environment concern.
Common failure modes¶
| Failure mode | Why it hurts | Better repair |
|---|---|---|
| one huge environment for every rule | tiny changes force unrelated runtime churn | split runtime contracts by actual ownership |
| rules rely on undeclared host tools | success depends on machine history | declare the tools in an environment or container |
| container used with no reason | review becomes heavier without better guarantees | use containers for real machine-boundary problems |
| environment files copied without explanation | readers memorize files instead of intent | explain which step or boundary the file protects |
| repository environment treated as workflow proof | runtime drift remains hidden at rule level | keep rule execution contracts explicit where needed |
The explanation a reviewer trusts¶
Strong explanation:
this rule uses
workflow/envs/python.yamlbecause the provenance script has a specific Python dependency boundary; the repository-level environment remains for authoring, and the container is reserved for machine-level portability.
Weak explanation:
we added more environment files because reproducibility is important.
The first explanation describes ownership. The second only gestures at virtue.
End-of-page checkpoint¶
Before leaving this page, you should be able to:
- explain why runtime declarations belong to reproducibility, not only setup convenience
- distinguish rule-scoped environments from repository environments
- explain when a container solves a stronger problem than an environment file
- describe why the runtime contract should stay close to the rule it protects