Logs, Benchmarks, Summaries, and Provenance

Page Maps

```mermaid
graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Performance Observability Incident Response"]
  page["Logs, Benchmarks, Summaries, and Provenance"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone
```

```mermaid
flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]
```

Observability gets noisy when every artifact is expected to answer every question.

That is not more evidence. It is less clarity.

Snakemake already gives you strong evidence surfaces, but each one has a different job:

logs explain what one rule said while it ran
benchmarks explain what one rule cost
summaries explain workflow state across outputs
provenance explains who and what produced the run

If you keep those jobs separate, incidents become easier to explain.

The main evidence surfaces

| Surface | Best question it answers | Question it answers poorly |
| --- | --- | --- |
| dry-run with `-n -p` | what Snakemake plans to do and why | how long the tools will take |
| per-rule `log:` files | what happened inside a specific job | whether the whole workflow was efficient |
| per-rule `benchmark:` files | how expensive one job or rule family was | why the DAG became larger |
| `snakemake --summary` | which outputs exist, are pending, or were rebuilt | why a tool failed internally |
| `snakemake --list-changes input code params` | which tracked change class caused reruns | whether the rerun was expensive |
| `publish/v1/provenance.json` | which configuration and runtime identity produced the publish bundle | the exact text of a failing tool command |

You do not need every surface for every review. You need the smallest honest one.

A clean way to think about them

```mermaid
flowchart TD
  question["review question"] --> choose["choose the narrowest evidence surface"]
  choose --> dryrun["dry-run and summaries"]
  choose --> logs["rule logs"]
  choose --> bench["benchmarks"]
  choose --> prov["provenance"]
  dryrun --> decision["decide whether more evidence is needed"]
  logs --> decision
  bench --> decision
  prov --> decision
```

This prevents two common mistakes:

opening huge logs before confirming the workflow even planned the expected jobs
quoting benchmark numbers before checking whether the rerun was legitimate
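The "choose the narrowest evidence surface" step can be sketched as a lookup that forces the reviewer to name the question before opening any artifact. This is a minimal Python sketch, not part of Snakemake: the question labels (`planned_work`, `rerun_cause`, and so on), the `Surface` type, and `narrowest_surface` are hypothetical names, and the descriptions are copied from the table of evidence surfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    evidence: str        # command or artifact to consult
    answers: str         # the question this surface answers best
    answers_poorly: str  # the question to take to another surface


# Hypothetical question labels; descriptions copied from the evidence-surface table.
SURFACES: dict[str, Surface] = {
    "planned_work": Surface("snakemake -n -p",
                            "what Snakemake plans to do and why",
                            "how long the tools will take"),
    "job_failure": Surface("per-rule log: file",
                           "what happened inside a specific job",
                           "whether the whole workflow was efficient"),
    "job_cost": Surface("per-rule benchmark: file",
                        "how expensive one job or rule family was",
                        "why the DAG became larger"),
    "workflow_state": Surface("snakemake --summary",
                              "which outputs exist, are pending, or were rebuilt",
                              "why a tool failed internally"),
    "rerun_cause": Surface("snakemake --list-changes",
                           "which tracked change class caused reruns",
                           "whether the rerun was expensive"),
    "run_identity": Surface("publish/v1/provenance.json",
                            "which configuration and runtime identity produced the publish bundle",
                            "the exact text of a failing tool command"),
}


def narrowest_surface(question: str) -> Surface:
    """Return the one surface registered for a question label."""
    try:
        return SURFACES[question]
    except KeyError:
        # An unregistered label usually means the review question is still too vague.
        raise ValueError(f"no evidence surface registered for {question!r}") from None
```

Raising on an unknown label is a deliberate choice: if a question maps to no surface, the reviewer should sharpen the question before collecting more artifacts.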

Logs: tell the local execution story

Good rule logs answer questions such as:

which sample or target failed
which command or script branch ran
whether the tool waited on data, retried internally, or exited cleanly

Good logs are specific: each message identifies the rule, the sample or target, and the step that produced it.

Weak logs only print generic progress lines or dump unrelated environment detail that no reviewer asked for.
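One way to keep a rule log specific is to tag every line with the rule and the wildcard value it belongs to. The helper below is a minimal sketch assuming a Python `script:` rule that declares a `log:` file; `rule_logger` is a hypothetical name, not a Snakemake API.

```python
import logging
from pathlib import Path


def rule_logger(log_path: str, rule: str, wildcard: str) -> logging.Logger:
    """Return a logger that writes only to one job's log file.

    Every line carries the rule name and wildcard value, so the log names
    its own situation even when read out of context.
    """
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"{rule}[{wildcard}]")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep job messages out of any shared root log
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    for old in logger.handlers:  # close earlier handlers to avoid duplicate lines
        old.close()
    logger.handlers = [handler]
    return logger
```

Inside a Snakemake script you would call it once with values from the `snakemake` object, for example `rule_logger(snakemake.log[0], snakemake.rule, snakemake.wildcards.sample)`, assuming the rule has a `sample` wildcard, and then log each branch the script takes.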

Benchmarks: measure the rule, not the mood

Use `benchmark:` when you want a durable answer to:

which rule family is expensive
whether a rule changed its runtime profile after an edit
whether tiny jobs are being overwhelmed by launch overhead

Benchmarks become especially useful when a team keeps arguing from memory:

"I think trimming got slower."

That claim should move quickly toward benchmark evidence, not toward a thread-count edit.
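To turn "I think trimming got slower" into a number, compare two benchmark files for the same rule and target. This sketch assumes the tab-separated layout Snakemake writes for `benchmark:` files, with wall-clock seconds in the `s` column and one row per repetition when the directive uses `repeat()`; `mean_seconds` and `runtime_change` are hypothetical helper names.

```python
import csv
from statistics import mean


def mean_seconds(benchmark_path: str) -> float:
    """Mean wall-clock seconds over every row of one benchmark TSV.

    Several rows appear when the benchmark: directive uses repeat().
    """
    with open(benchmark_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    if not rows:
        raise ValueError(f"{benchmark_path} contains no benchmark rows")
    return mean(float(row["s"]) for row in rows)


def runtime_change(before_path: str, after_path: str) -> float:
    """Relative change in mean runtime: 0.3 means 30% slower after the edit."""
    before = mean_seconds(before_path)
    if before == 0:
        raise ValueError("baseline runtime is zero; compare absolute seconds instead")
    return (mean_seconds(after_path) - before) / before
```

Comparing means rather than single rows keeps one noisy repetition from deciding the argument, which matters most for short jobs where launch overhead dominates.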

Summaries: keep workflow state visible

`snakemake --summary` is a workflow-state surface, not a debugging diary.

Use it to answer:

which outputs already exist
which ones are planned for rebuild
whether the workflow state matches what the incident report claims

This is often the fastest way to discover that a "performance problem" is actually a surprise rerun caused by changed inputs or changed code.
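When an incident report claims a rebuild, the summary table can be filtered for outputs that Snakemake still plans to update. This is a sketch under assumptions: it expects the tab-separated text printed by `snakemake --summary` with `output_file`, `status`, and `plan` columns, and it treats the plan value `no update` as "nothing to do". Check both assumptions against the output of your Snakemake version; `pending_outputs` is a hypothetical helper name.

```python
import csv


def pending_outputs(summary_text: str) -> list[tuple[str, str]]:
    """Return (output_file, status) for rows whose plan is not 'no update'."""
    lines = summary_text.splitlines()
    # Skip anything captured before the table header, such as log messages.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("output_file\t")), None
    )
    if start is None:
        raise ValueError("no --summary header found in the captured text")
    reader = csv.DictReader(lines[start:], delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row["output_file"], row["status"])
        for row in reader
        if row["plan"] != "no update"
    ]
```

Reading columns by header name rather than position keeps the helper working if a Snakemake release adds or drops a column.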

Change reports: explain why Snakemake wants to rerun

`snakemake --list-changes input code params` helps when reviewers need to know which change class triggered work.

That matters because the next review question changes with the answer:

input change points toward upstream data movement or discovery
code change points toward scripts, wrappers, or rule logic
parameter change points toward config or policy review

Without that distinction, teams often talk about churn without naming its source.
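The mapping from change class to review direction is mechanical enough to script. This sketch runs `snakemake --list-changes` once per class (`input`, `code`, `params`), which keeps each result attributable to a single class. It assumes `snakemake` is on `PATH`, that the command runs from the workflow directory, and that it prints one output path per line on standard output. `REVIEW_DIRECTION`, `classify_reruns`, and `review_notes` are hypothetical names; the `run` parameter exists so the logic can be exercised without a real workflow.

```python
import subprocess
from typing import Callable

# Where each tracked change class points the next review question.
REVIEW_DIRECTION = {
    "input": "upstream data movement or discovery",
    "code": "scripts, wrappers, or rule logic",
    "params": "config or policy review",
}


def list_changes(change_class: str) -> str:
    """Run `snakemake --list-changes <class>` for one class and return stdout."""
    result = subprocess.run(
        ["snakemake", "--list-changes", change_class],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def classify_reruns(run: Callable[[str], str] = list_changes) -> dict[str, list[str]]:
    """Map each change class that triggered work to the outputs it affects."""
    report: dict[str, list[str]] = {}
    for change_class in REVIEW_DIRECTION:
        outputs = [line.strip() for line in run(change_class).splitlines() if line.strip()]
        if outputs:
            report[change_class] = outputs
    return report


def review_notes(report: dict[str, list[str]]) -> list[str]:
    """One line per change class, naming where the review should look next."""
    return [
        f"{cls}: {len(outs)} output(s), points toward {REVIEW_DIRECTION[cls]}"
        for cls, outs in report.items()
    ]
```

If your Snakemake version mixes log lines into standard output, filter them before classifying, or the counts will include noise.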

Provenance: tie the run to its identity

Workflow evidence is incomplete if you cannot answer:

which config values were material
which profile or operating context shaped the run
which environment or tool identity produced the published result

That is the job of provenance.

Provenance is not a replacement for logs or benchmarks. It is the identity surface that lets you compare runs honestly when the repository alone does not explain the difference.
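A provenance record pays off when two runs can be compared field by field. The sketch below writes a small identity record and diffs two of them. The field names (`config`, `profile`, `snakemake_version`, `python_version`, `platform`) are illustrative assumptions, not the schema of `publish/v1/provenance.json` in this project, and config values must be JSON-serializable.

```python
import json
import platform
from importlib import metadata
from pathlib import Path


def snakemake_version() -> str:
    """Installed Snakemake version, or 'unknown' if it is not installed here."""
    try:
        return metadata.version("snakemake")
    except metadata.PackageNotFoundError:
        return "unknown"


def build_provenance(config: dict, profile: str) -> dict:
    """Collect identity fields for one run (illustrative field names)."""
    return {
        "config": config,
        "profile": profile,
        "snakemake_version": snakemake_version(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


def write_provenance(path: str, record: dict) -> None:
    """Write the record as stable, sorted JSON so diffs between runs stay readable."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")


def _flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested dicts into dotted keys, e.g. {'config': {'a': 1}} -> {'config.a': 1}."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(_flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def diff_provenance(old: dict, new: dict) -> dict:
    """Return {dotted_field: (old_value, new_value)} for every field that differs."""
    a, b = _flatten(old), _flatten(new)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

Flattening to dotted keys means a diff names the exact config value that moved, such as a hypothetical `config.min_quality`, rather than reporting that "the config changed".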

A small example

Suppose a reviewer says:

The report looks different and the workflow took longer.

A good evidence route might be:

`snakemake --summary` to confirm what was rebuilt
`snakemake --list-changes input code params` to classify the rerun cause
one matching benchmark file for the slowest rule family
one matching rule log for the suspicious target
`publish/v1/provenance.json` if the change still cannot be explained

That route is short, and every step has a reason.
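The route above can be written down as an ordered list in which each step carries its reason, and the walk stops as soon as a step explains the incident. This is a minimal sketch: `Step`, `walk_route`, and the `explained` callback (which stands in for the reviewer's judgment) are hypothetical, and the steps are labels for a person to follow rather than commands the code runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    evidence: str  # command or artifact a reviewer consults
    reason: str    # why the step is on the route


# The route from the example, in the order the example lists it.
ROUTE = [
    Step("snakemake --summary", "confirm what was rebuilt"),
    Step("snakemake --list-changes input code params", "classify the rerun cause"),
    Step("benchmark file for the slowest rule family", "measure what the slowdown cost"),
    Step("rule log for the suspicious target", "read the local execution story"),
    Step("publish/v1/provenance.json", "explain a change the other steps could not"),
]


def walk_route(route: list[Step], explained: Callable[[Step], bool]) -> list[Step]:
    """Consult steps in order; stop at the first step that explains the incident."""
    consulted: list[Step] = []
    for step in route:
        consulted.append(step)
        if explained(step):
            break
    return consulted
```

Recording `step.reason` next to each consulted step leaves a short trail showing why every artifact was opened, and the early stop keeps the route from growing past what the incident needs.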

What good observability looks like

Good observability is:

rule-local when debugging one failure
workflow-level when explaining changed state
stable enough that teams can compare runs over time
cheap enough that nobody removes it as soon as pressure rises

Bad observability usually looks like one of these:

giant shared logs with no rule ownership
benchmark files nobody reads or reviews
summaries collected but never compared to a claim
provenance missing when context differences are the real suspect

Keep this standard

Before adding a new evidence surface, finish this sentence:

We need this artifact because it answers this review question better than the artifacts we already have.

If you cannot finish that sentence, the workflow probably needs a clearer reading route more than it needs another file.