Module 03: Production Operations and Policy Boundaries¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["Production Operations Policy Boundaries"]
page["Module 03: Production Operations and Policy Boundaries"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Modules 01 and 02 teach workflow truth and disciplined dynamic behavior. Module 03 asks the next practical question:
how do you run that workflow under real pressure without letting operational convenience change its meaning?
This module is about making production operation explicit. Profiles, retries, staging, logs, and confirmation routes are useful only when they stay on the policy side of the line instead of leaking into workflow semantics.
What this module is for¶
By the end of Module 03, you should be able to explain five things in plain language:
- what belongs in a profile and what belongs in workflow meaning
- which failures may be retried and which should stop the run immediately
- how incomplete outputs, logs, and failure evidence keep recovery honest
- how staging and shared-filesystem assumptions affect operation without rewriting semantics
- what proof route shows that a production workflow still deserves trust
Study route¶
flowchart TD
start["Overview"] --> core1["Profiles, Defaults, and Workflow Meaning"]
core1 --> core2["Failure Policy, Retries, and Incomplete Outputs"]
core2 --> core3["Staging, Shared Filesystems, and Data Locality"]
core3 --> core4["Proof Routes, Selftests, and Clean-Room Confirmation"]
core4 --> core5["Operational Governance and Policy Review"]
core5 --> example["Worked Example: Hardening a Workflow for Production Use"]
example --> practice["Exercises"]
practice --> answers["Exercise Answers"]
answers --> glossary["Glossary"]
Read the module in that order the first time. When you return later, jump straight to the page that matches the operational pressure in front of you.
The ten files in this module¶
- Overview (
index.md) - Profiles, Defaults, and Workflow Meaning
- Failure Policy, Retries, and Incomplete Outputs
- Staging, Shared Filesystems, and Data Locality
- Proof Routes, Selftests, and Clean-Room Confirmation
- Operational Governance and Policy Review
- Worked Example: Hardening a Workflow for Production Use
- Exercises
- Exercise Answers
- Glossary
How to use the file set¶
| If you need to... | Start here |
|---|---|
| separate profile settings from workflow meaning | Profiles, Defaults, and Workflow Meaning |
| decide when to retry, rerun, or fail fast | Failure Policy, Retries, and Incomplete Outputs |
| reason about latency, scratch space, and shared-storage assumptions | Staging, Shared Filesystems, and Data Locality |
| choose the smallest honest production proof route | Proof Routes, Selftests, and Clean-Room Confirmation |
| review policy drift and operational ownership over time | Operational Governance and Policy Review |
| see the whole module as one repaired production workflow | Worked Example: Hardening a Workflow for Production Use |
| test your own understanding | Exercises |
| compare your reasoning against a reference answer | Exercise Answers |
| stabilize the module vocabulary | Glossary |
The running question¶
Carry this question through every page:
if the execution context changes tomorrow, what exact boundary proves the workflow meaning did not?
Good Module 03 answers usually mention one or more of these:
- a profile setting that is clearly operational
- a failure policy that keeps poison outputs from being trusted
- a staging or latency assumption that is declared instead of implied
- a selftest or confirmation route that compares runs honestly
- a governance rule that keeps policy drift reviewable
The running example¶
This module keeps returning to one practical workflow shape:
- the workflow can run locally or in CI through different profiles
- partial failures leave logs and rerunnable evidence instead of ambiguous state
- a clean-room route proves the repository from the outside
- a policy review can explain which settings changed execution context and which would have changed workflow meaning
That is the smallest production story worth teaching.
Commands to keep close¶
These commands form the evidence loop for Module 03:
snakemake --profile profiles/local -n
snakemake --profile profiles/ci -n
snakemake --lint
make profile-audit
make confirm
They answer different questions:
- what the workflow plans under one local policy surface
- what changes under a second policy surface
- whether the workflow already shows contract problems
- how the repository packages profile differences for review
- whether the strongest built-in confirmation path still passes
Learning outcomes¶
By the end of this module, you should be able to:
- keep profile policy separate from workflow semantics
- design retry and incomplete-output handling without hiding real failures
- explain how staging and filesystem latency affect operational trust
- choose a proportionate proof route for workflow operation
- review policy changes as durable repository decisions instead of one-off shell habits
Exit standard¶
Do not move on until all of these are true:
- you can explain one profile change that is safe and one that would be semantic drift
- you can describe one failure that should be retried and one that should fail fast
- you can say where a partial output becomes safe to trust or must be rerun
- you can name one proof route stronger than dry-run and one reason it matters
- you can explain how another maintainer should review a policy change later
When those become ordinary, Module 03 has done its job.