Logs, Benchmarks, Summaries, and Provenance
Observability gets noisy when every artifact is expected to answer every question.
That is not more evidence. It is less clarity.
Snakemake already gives you strong evidence surfaces, but each one has a different job:
- logs explain what one rule said while it ran
- benchmarks explain what one rule cost
- summaries explain workflow state across outputs
- provenance explains who and what produced the run
If you keep those jobs separate, incidents become easier to explain.
The main evidence surfaces
| Surface | Best question it answers | Weak question for this surface |
|---|---|---|
| dry-run with `-n -p` | what Snakemake plans to do and why | how long the tools will take |
| per-rule `log:` files | what happened inside a specific job | whether the whole workflow was efficient |
| per-rule `benchmark:` files | how expensive one job or rule family was | why the DAG became larger |
| `snakemake --summary` | which outputs exist, are pending, or were rebuilt | why a tool failed internally |
| `snakemake --list-changes input code params` | which tracked change class caused reruns | whether the rerun was expensive |
| `publish/v1/provenance.json` | which configuration and runtime identity produced the publish bundle | the exact text of a failing tool command |
You do not need every surface for every review. You need the smallest honest one.
A clean way to think about them
```mermaid
flowchart TD
  question["review question"] --> choose["choose the narrowest evidence surface"]
  choose --> dryrun["dry-run and summaries"]
  choose --> logs["rule logs"]
  choose --> bench["benchmarks"]
  choose --> prov["provenance"]
  dryrun --> decision["decide whether more evidence is needed"]
  logs --> decision
  bench --> decision
  prov --> decision
```
This prevents two common mistakes:
- opening huge logs before confirming the workflow even planned the expected jobs
- quoting benchmark numbers before checking whether the rerun was legitimate
Logs: tell the local execution story
Good rule logs answer questions such as:
- which sample or target failed
- which command or script branch ran
- whether the tool waited on data, retried internally, or exited cleanly
Good logs are specific. They name the rule-local situation.
Weak logs only print generic progress lines or dump unrelated environment detail that no reviewer asked for.
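As a minimal sketch of what "rule-local" can mean in practice, the helper below configures a Python logger that writes to a single log file and stamps every line with the rule and sample. The function name `make_rule_logger` and the line format are illustrative assumptions, not part of Snakemake.

```python
import logging
from pathlib import Path


def make_rule_logger(log_path, rule, sample):
    """Return a logger that writes to one rule-local log file.

    Every line carries the rule and sample, so a reviewer can tell which
    job produced it without opening any other file.
    (Hypothetical helper; the name and line format are assumptions.)
    """
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"{rule}.{sample}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep job lines out of any shared root log
    # Drop handlers from earlier calls so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter(
            f"%(asctime)s rule={rule} sample={sample} %(levelname)s %(message)s"
        )
    )
    logger.addHandler(handler)
    return logger
```

Inside a Snakemake `script:` file, the path would typically come from `snakemake.log[0]` and the sample from the job's wildcards. A call such as `make_rule_logger(path, "trim", "s1").info("started")` then produces a line that names its own rule-local situation.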
Benchmarks: measure the rule, not the mood
Use `benchmark:` when you want a durable answer to:
- which rule family is expensive
- whether a rule changed its runtime profile after an edit
- whether tiny jobs are being overwhelmed by launch overhead
Benchmarks become especially useful when a team keeps arguing from memory:
"I think trimming got slower."
That claim should move quickly toward benchmark evidence, not toward a thread-count edit.
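One way to turn "I think trimming got slower" into evidence is to compare the benchmark files from two runs. The sketch below assumes the tab-separated layout Snakemake writes for `benchmark:` files, with wall-clock seconds in the `s` column. The sample values and the function names are made up for illustration.

```python
import csv
import io
from statistics import mean


def mean_seconds(benchmark_text):
    """Mean wall-clock seconds over all rows of one benchmark file."""
    rows = csv.DictReader(io.StringIO(benchmark_text), delimiter="\t")
    return mean(float(row["s"]) for row in rows)


def runtime_ratio(before_text, after_text):
    """Ratio of mean runtimes: above 1 means slower, below 1 means faster."""
    return mean_seconds(after_text) / mean_seconds(before_text)


# Illustrative contents only; real files carry more columns.
BEFORE = "s\th:m:s\tmax_rss\n11.0\t0:00:11\t120.5\n13.0\t0:00:13\t121.0\n"
AFTER = "s\th:m:s\tmax_rss\n18.0\t0:00:18\t119.8\n"

print(f"{runtime_ratio(BEFORE, AFTER):.2f}x")  # prints "1.50x"
```

A ratio like this is only meaningful once the rerun itself has been confirmed as legitimate, which is why the summary and change-report surfaces come first in a review.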
Summaries: keep workflow state visible
`snakemake --summary` is a workflow-state surface, not a debugging diary.
Use it to answer:
- which outputs already exist
- which ones are planned for rebuild
- whether the workflow state matches what the incident report claims
This is often the fastest way to discover that a "performance problem" is actually a surprise rerun caused by changed inputs or changed code.
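To check an incident claim against workflow state, the summary output can be filtered down to the outputs Snakemake intends to rebuild. The sketch below assumes the summary is tab-separated with `output_file`, `status`, and `plan` columns, and that an untouched output is marked `no update`. Verify both assumptions against the Snakemake version in use. The sample rows are invented.

```python
import csv
import io


def pending_outputs(summary_text):
    """Return (output_file, status) pairs for outputs planned for rebuild.

    Assumes a tab-separated summary with 'output_file', 'status', and
    'plan' columns, where 'no update' marks outputs that stay as they are.
    """
    rows = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return [
        (row["output_file"], row["status"])
        for row in rows
        if row["plan"].strip() != "no update"
    ]


# Illustrative summary text, not captured from a real run.
SUMMARY = (
    "output_file\tdate\trule\tlog-file(s)\tstatus\tplan\n"
    "results/a.txt\t-\ttrim\tlogs/trim/a.log\tok\tno update\n"
    "results/b.txt\t-\ttrim\tlogs/trim/b.log\tupdated input files\tupdate pending\n"
)

for output_file, status in pending_outputs(SUMMARY):
    print(output_file, "->", status)
```

If the list is longer than the incident report expects, the "slow run" is more likely a surprise rerun than a slow tool.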
Change reports: explain why Snakemake wants to rerun
`snakemake --list-changes input code params` helps when reviewers need to know which change class triggered work.
That matters because the next review question changes with the answer:
- input change points toward upstream data movement or discovery
- code change points toward scripts, wrappers, or rule logic
- parameter change points toward config or policy review
Without that distinction, teams often talk about churn without naming its source.
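The mapping above is small enough to encode directly. The sketch below assumes the changed outputs have already been collected per change class into a plain dictionary. How that dictionary is built is left open, and the names `NEXT_REVIEW` and `name_churn_sources` are hypothetical.

```python
# Change class -> review area, taken from the list above.
NEXT_REVIEW = {
    "input": "upstream data movement or discovery",
    "code": "scripts, wrappers, or rule logic",
    "params": "config or policy review",
}


def name_churn_sources(changed):
    """Return one line per change class that actually triggered work.

    `changed` maps a change class to the output files it affected
    (a hypothetical structure, filled in by whoever runs the report).
    """
    return [
        f"{cls}: {len(files)} output(s) -> {NEXT_REVIEW[cls]}"
        for cls, files in changed.items()
        if files
    ]


# Illustrative input only.
for line in name_churn_sources(
    {"input": ["results/a.txt", "results/b.txt"], "code": [], "params": ["results/c.txt"]}
):
    print(line)
```

Classes with no affected outputs are dropped, so the report names only the sources of churn that need a follow-up.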
Provenance: tie the run to its identity
Workflow evidence is incomplete if you cannot answer:
- which config values were material
- which profile or operating context shaped the run
- which environment or tool identity produced the published result
That is the job of provenance.
Provenance is not a replacement for logs or benchmarks. It is the identity surface that lets you compare runs honestly when the repository alone does not explain the difference.
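Comparing runs honestly usually means diffing two provenance records field by field. The sketch below makes no claim about the real schema of `publish/v1/provenance.json`: the keys in the sample documents are hypothetical, and the helper works on any nested JSON object.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} so fields compare easily."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def provenance_diff(old_text, new_text):
    """Return {path: (old, new)} for every field that differs between two runs."""
    old = flatten(json.loads(old_text))
    new = flatten(json.loads(new_text))
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


# Hypothetical provenance documents; the field names are assumptions.
OLD = '{"config": {"min_quality": 20}, "tools": {"trimmer": "1.2"}}'
NEW = '{"config": {"min_quality": 20}, "tools": {"trimmer": "1.3"}}'

print(provenance_diff(OLD, NEW))  # {'tools.trimmer': ('1.2', '1.3')}
```

An empty diff moves suspicion back to inputs or code. A non-empty diff names the context difference that the repository alone would not show.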
A small example
Suppose a reviewer says:
"The report looks different and the workflow took longer."
A good evidence route might be:
- `snakemake --summary` to confirm what rebuilt
- `snakemake --list-changes input code params` to classify the rerun cause
- one matching benchmark file for the slowest rule family
- one matching rule log for the suspicious target
- `publish/v1/provenance.json` if the change still cannot be explained
That route is short, and every step has a reason.
What good observability looks like
Good observability is:
- rule-local when debugging one failure
- workflow-level when explaining changed state
- stable enough that teams can compare runs over time
- cheap enough that nobody removes it as soon as pressure rises
Bad observability usually looks like one of these:
- giant shared logs with no rule ownership
- benchmark files nobody reads or reviews
- summaries collected but never compared to a claim
- provenance missing when context differences are the real suspect
Keep this standard
Before adding a new evidence surface, finish this sentence:
"We need this artifact because it answers this review question better than the artifacts we already have."
If you cannot finish that sentence, the workflow probably needs a clearer reading route more than it needs another file.