Profiles, Defaults, and Workflow Meaning

Module 03 begins with the boundary that most production confusion grows out of:

a profile may change how the workflow runs, but it must not change what the workflow means.

If that sentence is fuzzy, every later operational decision becomes harder to review.

Profiles are for operating context

Profiles are where a repository records run policy such as:

whether incomplete work should rerun
whether shell commands should be printed
how long to wait for filesystem latency
which executor or scheduler-facing defaults apply

Those settings matter. They still do not define the analytical meaning of the workflow.

The workflow meaning should still come from:

declared inputs
config that changes the intended outputs
rule code and helper code
published contract surfaces

That is the boundary this page is defending.

Why this split matters in practice

When profiles and workflow meaning blur together, several bad things happen at once:

a local convenience file quietly changes what gets built
CI and local runs produce different intended results without anyone naming the reason
a profile review sounds operational even though it is hiding a semantic change
later maintainers cannot tell whether a diff needs scientific review or only runtime review

Strong production workflows make that distinction reviewable before the incident happens.

A healthy profile example

From the capstone, a local profile contains settings like:

rerun-incomplete: true
printshellcmds: true
show-failed-logs: true
latency-wait: 30

Those are strong policy examples because they answer operational questions:

how much run detail should be shown
how patient should the run be with filesystem lag
what should happen after an incomplete output is detected

None of those settings says which samples exist or what the publish bundle is supposed to mean.

A weak profile example

This would be a bad profile boundary:

samples:
  - sampleA
  - sampleB
publish_version: v2

Why it is bad:

`samples` changes the target set
`publish_version` changes a public contract surface
both settings alter workflow meaning, not just runtime context

Those values belong in workflow config or in explicit repository code, not in a machine-facing profile.
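The contrast between the healthy and weak profiles can be turned into a small mechanical check. The following is a minimal sketch, not part of the capstone: it assumes the profile has already been parsed into a dictionary (for example with a YAML loader), and both the `OPERATIONAL_KEYS` allowlist and the `find_unlisted_keys` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical allowlist of keys considered purely operational.
# This is an assumption for illustration; adjust it to your own profiles.
OPERATIONAL_KEYS = {
    "rerun-incomplete",
    "printshellcmds",
    "show-failed-logs",
    "latency-wait",
}


def find_unlisted_keys(profile: dict) -> list[str]:
    """Return profile keys that are not on the operational allowlist.

    An unlisted key is not automatically wrong, but it deserves a look:
    it may be carrying workflow meaning instead of run policy.
    """
    return sorted(key for key in profile if key not in OPERATIONAL_KEYS)


# The healthy profile from this page, as a parsed dictionary.
healthy = {
    "rerun-incomplete": True,
    "printshellcmds": True,
    "show-failed-logs": True,
    "latency-wait": 30,
}

# The weak profile from this page.
weak = {"samples": ["sampleA", "sampleB"], "publish_version": "v2"}

print(find_unlisted_keys(healthy))  # []
print(find_unlisted_keys(weak))     # ['publish_version', 'samples']
```

A check like this does not replace human review. It only makes sure that a new kind of key in a profile gets noticed rather than merged silently.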

Defaults are not a loophole

Learners sometimes understand the profile rule and then break it through defaults inside the workflow:

hidden environment-variable fallbacks
local-path shortcuts that only one maintainer knows
runtime branches based on hostnames or current shells

Those are not cleaner than bad profiles. They are just harder to review.

Healthy defaults should be:

visible in the Snakefile or config
stable across contexts unless deliberately overridden
harmless when the operating profile changes
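The difference between a hidden default and a visible one can be sketched in a few lines of Python, the language Snakefiles are written in. This is an illustration under stated assumptions, not the capstone's code: the `PUBLISH_VERSION` environment variable, the `DEFAULTS` table, its values, and both helper functions are hypothetical.

```python
import os

# Hypothetical explicit defaults, kept in the repository next to the workflow
# so that a reviewer can see them in a diff.
DEFAULTS = {"publish_version": "v1"}


def hidden_default() -> str:
    """Anti-pattern: the value silently depends on the launching shell."""
    return os.environ.get("PUBLISH_VERSION", "v1")


def explicit_setting(config: dict, key: str) -> str:
    """Preferred: take the value from config, else from a visible default."""
    return config.get(key, DEFAULTS[key])


# The explicit version gives the same answer in every operating context
# unless the workflow config deliberately overrides it.
print(explicit_setting({}, "publish_version"))                         # v1
print(explicit_setting({"publish_version": "v2"}, "publish_version"))  # v2
```

The explicit helper never reads the environment, so switching between operating profiles cannot change its result.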

One useful review table

| Question | If yes, it belongs closer to... |
| --- | --- |
| Does this change which outputs the workflow is supposed to produce? | workflow config or code |
| Does this change scheduling, visibility, or failure handling only? | profile policy |
| Would a downstream contract need to change if this changed? | workflow meaning |
| Could local, CI, and SLURM all vary this safely? | profile policy |

This table is not perfect, but checking a change against it is still a strong habit for human review.

The capstone pattern worth copying

The capstone keeps three profile directories:

profiles/local/
profiles/ci/
profiles/slurm/

That design teaches an important lesson:

the workflow meaning is one thing
the operating contexts are several
the repository is allowed to represent that difference openly

Profiles are not a hack here. They are the formal place where context variation lives.

A good audit question

When a profile changes, do not ask only:

does the workflow still run?

Ask:

would this diff require semantic review if it were moved into config or rule code?

If the answer is yes, the boundary is probably wrong already.
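One way to make that audit question routine is to sort the keys changed by a profile diff into two review buckets. The sketch below assumes both versions of the profile are already parsed into dictionaries. The `OPERATIONAL_KEYS` allowlist, the `review_profile_diff` helper, and the bucket names are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical allowlist of purely operational keys (an assumption).
OPERATIONAL_KEYS = {
    "rerun-incomplete",
    "printshellcmds",
    "show-failed-logs",
    "latency-wait",
}

# Sentinel so that "key absent" differs from "key present with value None".
_MISSING = object()


def review_profile_diff(old: dict, new: dict) -> dict[str, list[str]]:
    """Sort the keys that changed between two profile versions.

    Keys on the allowlist need runtime review only; anything else is
    flagged for semantic review because it may alter workflow meaning.
    """
    changed = sorted(
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    )
    return {
        "runtime_review": [k for k in changed if k in OPERATIONAL_KEYS],
        "semantic_review": [k for k in changed if k not in OPERATIONAL_KEYS],
    }


old = {"printshellcmds": True, "latency-wait": 30}
new = {"printshellcmds": True, "latency-wait": 60, "samples": ["sampleA"]}

print(review_profile_diff(old, new))
# {'runtime_review': ['latency-wait'], 'semantic_review': ['samples']}
```

A non-empty `semantic_review` list is the signal that the boundary is probably wrong already, and that the value should move into workflow config or rule code.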

Common failure modes

| Failure mode | What it looks like | Better repair |
| --- | --- | --- |
| sample or reference data appears in a profile | local and CI differ in target meaning | move semantic inputs into config |
| hidden shell defaults choose a workflow branch | the run depends on who launched it | promote the choice into explicit config or code |
| profile names are vague | nobody can explain why two profiles exist | name them by operating context, such as `local`, `ci`, or `slurm` |
| profile diffs are reviewed casually | semantic drift sneaks in under operational language | review profile changes against the policy-versus-meaning boundary explicitly |
| command-line flags are the only real policy record | the repository cannot explain how it was normally run | version the stable flags in profiles |

The explanation a reviewer trusts

Strong explanation:

profiles/local and profiles/ci differ in operational settings like log visibility and latency handling, but the sample set, publish version, and rule logic stay in workflow config and code, so changing profiles alters context without altering meaning.

Weak explanation:

profiles are where we put the stuff that is easier not to hardcode.

The first explanation gives a boundary. The second gives a convenience excuse.

End-of-page checkpoint

Before leaving this page, you should be able to:

name three settings that belong in a profile
name two settings that do not belong in a profile
explain why local, CI, and SLURM profiles can differ safely
describe one review question that catches policy leaking into workflow meaning