Performance Tuning Without Semantic Drift

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Performance Observability Incident Response"]
  page["Performance Tuning Without Semantic Drift"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

The most dangerous performance improvement is the one that makes the workflow faster by teaching it to hide its own meaning.

That happens more often than teams admit.

The rule for honest tuning

A tuning change is honest only if reviewers can still answer the same semantic questions about the workflow after the change:

which files are declared inputs and outputs
why reruns happen
what counts as trusted published state
which operating context differences are policy rather than workflow meaning
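To make that rule checkable rather than aspirational, a team can record its answers to the four questions before and after a change and diff them. The sketch below is a minimal illustration under that assumption: `SemanticSnapshot` and `semantic_diff` are hypothetical names, not Snakemake APIs, and filling the fields (from dry-run output, rule declarations, publish manifests, and profiles) is left to the team.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SemanticSnapshot:
    """Hypothetical record of the answers to the four semantic questions."""
    declared_io: frozenset      # (rule, role, path) triples, e.g. ("qc", "output", "qc/A.txt")
    rerun_triggers: frozenset   # (rule, reason) pairs explaining why a job would rerun
    published: frozenset        # paths treated as trusted published state
    profile_policy: frozenset   # context differences that are policy, not workflow meaning


def semantic_diff(before: SemanticSnapshot, after: SemanticSnapshot) -> list[str]:
    """Return the names of the questions whose answers changed."""
    return [
        f.name
        for f in fields(SemanticSnapshot)
        if getattr(before, f.name) != getattr(after, f.name)
    ]


before = SemanticSnapshot(
    declared_io=frozenset({("qc", "input", "raw/A.fq"), ("qc", "output", "qc/A.txt")}),
    rerun_triggers=frozenset({("qc", "input changed")}),
    published=frozenset({"publish/summary.tsv"}),
    profile_policy=frozenset({"threads", "executor"}),
)
# A thread-count change lives in the profile, so none of the recorded answers move.
honest = replace(before)
# Dropping a declared input changes both the file contract and why reruns happen.
drifted = replace(
    before,
    declared_io=frozenset({("qc", "output", "qc/A.txt")}),
    rerun_triggers=frozenset(),
)
print(semantic_diff(before, honest))   # []
print(semantic_diff(before, drifted))  # ['declared_io', 'rerun_triggers']
```

An empty diff does not prove the change is honest; it only shows that the recorded answers did not move, which is the part a reviewer can check quickly.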

If the speedup depends on making one of those harder to see, the workflow did not become better. It became easier to misread.

Honest tuning moves

These can be valid when the evidence supports them:

grouping genuinely tiny jobs whose outputs still preserve the same file contract
reducing redundant helper work that does not change declared inputs or outputs
adjusting threads, resources, or executor settings when the workflow meaning stays fixed
moving heavy temporary traffic to a better storage surface when promotion boundaries stay clear
simplifying an expensive summary or report step while keeping the public contract intact

The important phrase is "when the evidence supports them."

Dishonest tuning moves

These should trigger strong review resistance:

removing or weakening a dependency so Snakemake reruns less often
using hidden side effects or undeclared caches to avoid declared work
deleting logs, benchmarks, or provenance because they are inconvenient under pressure
changing publish paths or sample-selection meaning and calling it optimization
increasing retries or latency waits to hide a deterministic failure

Fast wrong workflows are still wrong workflows.

Four questions before you approve a tuning change

Which cost class are we actually addressing?
Which evidence says this change targets that cost class?
Which workflow truth stays exactly the same after the edit?
Which artifact will prove that the change worked honestly?

If a proposal cannot answer all four, it is not ready.
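The four questions can be enforced as a required review record. In this sketch `TuningProposal` is a hypothetical structure, not part of any tool; a proposal with any blank answer is reported as not ready.

```python
from dataclasses import dataclass


@dataclass
class TuningProposal:
    """Hypothetical review record; each field holds the proposer's written answer."""
    cost_class: str = ""        # which cost class is addressed
    evidence: str = ""          # which evidence ties the change to that cost class
    unchanged_truth: str = ""   # which workflow truth stays exactly the same
    proof_artifact: str = ""    # which artifact will prove the change was honest

    def missing_answers(self) -> list[str]:
        # vars() lists fields in declaration order for a regular dataclass.
        return [name for name, value in vars(self).items() if not value.strip()]

    def is_ready(self) -> bool:
        return not self.missing_answers()


proposal = TuningProposal(
    cost_class="scheduler overhead",
    evidence="per-job benchmarks show launch time dominating runtime",
)
print(proposal.is_ready())         # False
print(proposal.missing_answers())  # ['unchanged_truth', 'proof_artifact']
```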

Example: grouping tiny jobs

Suppose a workflow launches one short QC job per sample, and executor overhead dominates.

Grouping work can be honest if:

the grouped job still produces the same declared sample outputs
the logs still let reviewers trace each failure to the sample that caused it
benchmarks still show the cost where it now lives
rerun logic still follows the same inputs and outputs

Grouping becomes dishonest when it collapses output identity, hides which sample failed, or turns one rule into a private wrapper around undeclared work.
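Output identity and log locality can be checked mechanically from per-sample declarations. A minimal sketch, assuming a team can list each sample's declared outputs before and after grouping and the log path each sample writes afterwards (the function name, dictionaries, and paths are all hypothetical). Where benchmarks now live is left to the reviewer.

```python
def grouping_findings(samples, outputs_before, outputs_after, logs_after):
    """Report grouping changes that break output identity or log locality.

    outputs_before / outputs_after: hypothetical dict of sample -> set of declared output paths
    logs_after: hypothetical dict of sample -> log path written after grouping
    """
    findings = []
    for sample in samples:
        # Each sample must keep exactly the declared outputs it had before grouping.
        if outputs_before.get(sample) != outputs_after.get(sample):
            findings.append(f"{sample}: declared outputs changed")
        # Reviewers need a log they can tie to this sample, not only a group-level log.
        if not logs_after.get(sample):
            findings.append(f"{sample}: no per-sample log after grouping")
    # Identity collapses when two samples now claim the same output path.
    owner = {}
    for sample in samples:
        for path in sorted(outputs_after.get(sample, ())):
            if path in owner and owner[path] != sample:
                findings.append(f"{path}: shared by {owner[path]} and {sample}")
            owner.setdefault(path, sample)
    return findings


samples = ["A", "B"]
before = {"A": {"qc/A.txt"}, "B": {"qc/B.txt"}}
logs = {"A": "logs/qc/A.log", "B": "logs/qc/B.log"}
honest = grouping_findings(samples, before, before, logs)
collapsed = grouping_findings(
    samples, before, {"A": {"qc/all.txt"}, "B": {"qc/all.txt"}}, {}
)
print(honest)  # []
for finding in collapsed:
    print(finding)
```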

Example: tuning storage

Moving temporary files or scratch traffic to a better storage surface can be valuable.

It stays honest when:

declared outputs remain on the trusted contract surface
reviewers do not have to inspect scratch to know whether the run succeeded
promotion into publish/ or other stable outputs stays explicit

It becomes dishonest when temporary execution state quietly becomes the de facto interface.
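The first and third conditions can be checked from path layout alone. A minimal sketch, assuming hypothetical `scratch/` and `publish/` roots, a `contract_outputs` set of the outputs reviewers rely on, and a `promoted` set listing the published paths written by an explicit promotion rule. `PurePosixPath.is_relative_to` (Python 3.9+) is used so that a sibling directory such as `scratchy/` is not mistaken for scratch.

```python
from pathlib import PurePosixPath


def storage_findings(contract_outputs, scratch_root="scratch", publish_root="publish",
                     promoted=frozenset()):
    """Flag contract outputs stranded on scratch and published files with no explicit promotion."""
    scratch = PurePosixPath(scratch_root)
    publish = PurePosixPath(publish_root)
    findings = []
    for out in sorted(contract_outputs):
        path = PurePosixPath(out)
        if path.is_relative_to(scratch):
            # Reviewers should never need to inspect scratch to judge success.
            findings.append(f"{out}: contract output lives on scratch")
        elif path.is_relative_to(publish) and out not in promoted:
            findings.append(f"{out}: reaches publish/ without an explicit promotion step")
    return findings


outs = {"results/A.bam", "scratch/A.sorted.bam", "publish/summary.tsv"}
for finding in storage_findings(outs):
    print(finding)
# publish/summary.tsv: reaches publish/ without an explicit promotion step
# scratch/A.sorted.bam: contract output lives on scratch
print(storage_findings(outs - {"scratch/A.sorted.bam"}, promoted={"publish/summary.tsv"}))  # []
```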

Example: simplifying evidence

Sometimes logs or reports really are too noisy.

Cleaning them up can be an improvement, but only if the revised evidence still answers the review question it used to answer. "Shorter" is not automatically "clearer."

Ask:

which question did the old evidence answer poorly?
which question will the new evidence answer better?
what important diagnostic detail might be lost?
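Those three questions amount to a set comparison over the review questions each version of the evidence can answer. A minimal sketch; `evidence_tradeoff` is a hypothetical helper and the question strings are illustrative, since listing them honestly is the real work.

```python
def evidence_tradeoff(old_answers, new_answers):
    """Compare the review questions that the old and new evidence can answer."""
    return {
        "gained": sorted(set(new_answers) - set(old_answers)),
        "lost": sorted(set(old_answers) - set(new_answers)),
    }


old = {"which sample failed", "which command ran", "how long each step took"}
new = {"which sample failed", "how long each step took", "which input triggered the rerun"}
tradeoff = evidence_tradeoff(old, new)
print(tradeoff)
# {'gained': ['which input triggered the rerun'], 'lost': ['which command ran']}
```

A non-empty `lost` list does not block the change by itself, but it names the diagnostic detail a reviewer must consciously accept giving up.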

A useful review frame

Use this table in review when an optimization looks attractive:

Review question	Strong answer
what got faster	names the cost class and the affected rule family or boundary
why it got faster	points to a concrete structural or operational reason
what stayed the same	names the unchanged workflow contract
how we know it stayed the same	points to a dry-run, summary, benchmark, verification, or publish artifact

This slows the conversation down in a good way.

A small before-and-after note

Weak note:

Reduced runtime by changing rule behavior and simplifying execution.

Strong note:

Reduced scheduler overhead by grouping 400 sub-second cleanup jobs into one reviewed aggregation rule. The declared published outputs are unchanged, the per-sample result files remain separate, and benchmarking moved from the leaf jobs to the aggregation rule. Dry-run and publish verification still match the previous contract.

One of these is reviewable. The other is not.
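Publish verification, one of the artifacts the strong note cites, can be as simple as hashing the published tree from a baseline run and from the tuned run and comparing the two. A minimal sketch, assuming both runs write to separate directories laid out like publish/ (the directory names and helper functions are hypothetical) and that published files are byte-reproducible; outputs that embed timestamps would need a format-aware comparison instead.

```python
import hashlib
import tempfile
from pathlib import Path


def publish_manifest(root) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_drift(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Describe how the published surface changed between two runs."""
    return {
        "missing": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


with tempfile.TemporaryDirectory() as tmp:
    baseline, tuned = Path(tmp, "baseline"), Path(tmp, "tuned")
    for run in (baseline, tuned):
        (run / "samples").mkdir(parents=True)
        (run / "samples" / "A.tsv").write_text("gene\tcount\nx\t3\n")
    # The tuned run publishes one extra file, which is drift even if it looks harmless.
    (tuned / "summary.tsv").write_text("extra\n")
    drift = manifest_drift(publish_manifest(baseline), publish_manifest(tuned))
print(drift)  # {'missing': [], 'added': ['summary.tsv'], 'changed': []}
```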

When tuning should stop and become design review

Escalate from tuning to design review when the proposal changes:

target selection
file contracts
publish paths
sample discovery rules
the boundary between workflow meaning and profile policy

Those are not merely performance details. They are workflow meaning.
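Escalation can be made explicit by routing on the kinds of change a proposal declares. A minimal sketch; `review_track` is a hypothetical helper, and the aspect labels mirror the list above as tags a team would attach to a change, not fields from any tool.

```python
# Aspects this section classifies as workflow meaning rather than performance detail.
DESIGN_REVIEW_TRIGGERS = frozenset({
    "target selection",
    "file contracts",
    "publish paths",
    "sample discovery rules",
    "meaning-policy boundary",
})


def review_track(changed_aspects) -> str:
    """Route a proposal: any overlap with workflow meaning escalates it."""
    hits = sorted(DESIGN_REVIEW_TRIGGERS & set(changed_aspects))
    return f"design review ({', '.join(hits)})" if hits else "tuning review"


print(review_track({"threads", "executor settings"}))  # tuning review
print(review_track({"threads", "publish paths"}))      # design review (publish paths)
```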

Keep this standard

Do not merge a performance change until the review records both sides:

the speed claim
the semantic non-change claim

If only the first claim is written down, the second claim will be assumed, and that is how drift gets normalized.