Convergence, Rerun Causes, and Hidden Inputs¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["File Contracts Workflow Graph Truth"]
page["Convergence, Rerun Causes, and Hidden Inputs"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
This page explains why a workflow settles down after a clean run, why some workflows rerun forever, and why others silently miss necessary reruns.
The sentence to keep¶
When a workflow reruns unexpectedly, ask:
what tracked fact changed, or what untracked fact should have changed but the workflow never admitted it?
That question covers both major failure classes:
- too many reruns
- not enough reruns
What convergence means¶
A small truthful workflow should settle down after it has built its requested outputs.
In practice, the simplest convergence check is:
After a clean successful run, the dry-run should effectively say:
That does not mean a workflow will never rerun again. It means:
- the current outputs match the current tracked workflow state
- Snakemake has no reason to schedule more jobs right now
Convergence is one of the most useful signs that a beginner workflow is telling the truth.
Reruns are not automatically bad¶
People often treat reruns as evidence that Snakemake is being difficult.
That is backwards.
Reruns are good when something meaningful changed:
- an input file changed
- a tracked parameter changed
- rule code changed
- an output is incomplete or missing
What feels bad is unexplained reruns or missing reruns.
Those point to a contract problem.
The two big failure modes¶
Most Module 01 rerun problems fall into two groups:
| Failure mode | What it looks like |
|---|---|
| non-convergence | the workflow plans jobs again even though you expected it to be settled |
| silent staleness | the workflow claims everything is up to date even though a real influence changed |
The first is noisy. The second is more dangerous because it looks calm while being wrong.
Why non-convergence happens¶
The most common beginner causes are:
- unstable parameters
- code that changes meaning every run
- outputs left incomplete or poisoned by failure
- environment-dependent behavior hidden from workflow logic
The shared theme is simple:
the workflow does not present a stable story about what makes an output valid.
A minimal non-converging example¶
Consider this rule:
import time
rule all:
input:
"results/nondet.txt"
rule nondet:
output:
"results/nondet.txt"
params:
salt=lambda: time.time()
shell:
r"""
mkdir -p results
echo "{params.salt}" > {output}
"""
The command is not the deep problem here. The problem is that the rule meaning changes every time the Snakefile is evaluated.
If you run:
you should expect the second command to plan nondet again instead of reporting
convergence.
That is not mysterious. The rule's tracked state is unstable.
Why silent staleness happens¶
Now consider the opposite defect:
MODE = os.environ.get("MODE", "default")
rule render:
input:
"data/input.txt"
output:
"results/rendered.txt"
shell:
r"""
./scripts/render.sh --mode "{MODE}" {input} > {output}
"""
If MODE changes the meaning of the output but the workflow treats it as casual hidden
state, you get a different problem:
- the output meaning changed
- the workflow may not explain the change clearly
- another person can get different results without understanding why
Whether the exact rerun behavior changes with the version and setup is less important than the design lesson:
if something meaningfully affects the output, it should be represented in an intentional, inspectable way.
Hidden inputs are still inputs¶
People often think of inputs as only the paths listed under input:.
That is too narrow.
A hidden input can be:
- an environment variable
- a threshold embedded in Python code
- a sidecar file a shell script reads without declaring
- a catalog or manifest discovered implicitly
- a tool behavior change that the workflow never surfaces
If that thing changes output meaning, the workflow has to decide how it will be represented and audited.
Module 01 does not require solving every advanced provenance problem. It does require honesty.
A better beginner habit¶
When an output depends on a value, choose one of these honest approaches:
- move the value into
config.yaml - generate a manifest file and declare it as an input
- pin the behavior in explicit rule code and accept that code changes are part of workflow meaning
What you should not do is let important meaning float in ambient state that nobody reviews.
A simple example with config¶
Instead of this:
rule summarize:
output:
"results/summary.txt"
shell:
r"""
./scripts/summarize.sh --threshold "$THRESHOLD" > {output}
"""
prefer something like this:
configfile: "config/config.yaml"
THRESHOLD = config["threshold"]
rule summarize:
output:
"results/summary.txt"
shell:
r"""
./scripts/summarize.sh --threshold "{THRESHOLD}" > {output}
"""
Now you can point to one visible semantic source of truth.
The improvement is not just technical. It improves clarity:
- a reader can find the meaning
- a reviewer can reason about it
- a rerun can be explained
Incomplete outputs break trust¶
Another common non-convergence cause is partial or incomplete outputs after failure.
Suppose a rule writes a final output path and then crashes:
rule poison:
output:
"results/poison.txt"
shell:
r"""
mkdir -p results
echo "partial" > {output}
exit 1
"""
Now the workflow has a dangerous artifact:
- the final path exists
- the contents are not trustworthy
- later runs have to recover from a state that looks partly complete
This is one reason atomic publication matters so much. We return to it in Core 5, but it is already part of rerun truth.
The evidence loop for rerun questions¶
When rerun behavior feels wrong, use an evidence loop instead of guessing:
The first tells you whether jobs would run.
The summary views help answer:
- which outputs exist
- which rules own them
- which outputs are missing or incomplete
If your explanation for a rerun cannot be connected back to one of those views, it is probably still hand-wavy.
A practical review table¶
| Symptom | Likely class | First review question |
|---|---|---|
| reruns every dry-run | unstable tracked meaning | which param, code path, or output state changes every run |
| no rerun after a meaningful change | hidden input | what influenced the output but is not represented clearly |
| reruns after failure | incomplete or poisoned output | did a final path get published before the work was truly done |
| different behavior on different machines | ambient environment drift | which semantic choice is leaking in from host state |
This table keeps the diagnosis grounded.
A strong explanation sounds like this¶
Weak explanation:
Snakemake keeps rerunning and I do not know why.
Stronger explanation:
The workflow does not converge because the rule's
paramsvalue changes at parse time on every run, so Snakemake keeps seeing a new tracked state forresults/nondet.txt.
Or:
The workflow falsely appears up to date because the threshold comes from an undeclared environment variable, so a meaningful output influence is not represented as config or as another tracked input.
Those explanations teach much more than rerunning the command again.
Failure signatures worth recognizing¶
"The workflow is successful, but snakemake -n still wants to run jobs"¶
That usually means the workflow meaning is unstable or the outputs are not truly complete.
"Changing an important option did not trigger any meaningful rebuild"¶
That usually means the option was a hidden semantic input.
"The result differs between machines, but the Snakefile looks the same"¶
That often means ambient environment or undeclared external state is leaking into the workflow.
"A failed run leaves behind files that make the next run confusing"¶
That is a publication-truth problem as much as a failure-handling problem.
What this page wants you to remember¶
Convergence is not a cosmetic check. It is evidence that the workflow currently tells a stable truth.
If a workflow reruns forever, ask which tracked meaning keeps changing.
If it never reruns when it should, ask which real influence the workflow never admitted.
That habit will save you from a lot of mystery later in the course.