Module 06: Publishing and Downstream Contracts¶
Module Position¶
```mermaid
flowchart TD
family["Reproducible Research"] --> program["Deep Dive Snakemake"]
program --> module["Module 06: Publishing and Downstream Contracts"]
module --> lessons["Lesson pages and worked examples"]
module --> checkpoints["Exercises and closing criteria"]
module --> capstone["Related capstone evidence"]
```
```mermaid
flowchart TD
purpose["Start with the module purpose and main questions"] --> lesson_map["Use the lesson map to choose reading order"]
lesson_map --> study["Read the lessons and examples with one review question in mind"]
study --> proof["Test the idea with exercises and capstone checkpoints"]
proof --> close["Move on only when the closing criteria feel concrete"]
```
Read the first diagram as a placement map: this page sits between the course promise, the lesson pages listed below, and the capstone surfaces that pressure-test the module. Read the second diagram as the study route for this page. Together they point you toward the Lesson map, Exercises, and Closing criteria; they are not decoration.
A workflow run is not automatically a trustworthy result. The internal execution state that helps the workflow operate is often not the same thing as the published surface that another human, notebook, or downstream pipeline is allowed to depend on.
This module is about designing that boundary on purpose: what remains internal, what is published, how published outputs are versioned, and how manifests, reports, and checksums support trust without turning every run into a pile of unreviewable noise.
Capstone exists here as corroboration. The local publishing exercises should already make the internal-versus-public split clear before you inspect `publish/v1/` and the file API in the reference workflow.
Before You Begin¶
This module works best after Modules 01-05, especially the parts on file contracts, provenance, and safe rule boundaries.
Use this module if you need to learn how to:
- separate internal workflow state from stable published deliverables
- version a publish boundary so downstream users know what they are trusting
- decide which reports and manifests are proof artifacts and which are part of the public contract
Proof loop for this module: study the lessons, test the ideas with the exercises, then corroborate against the capstone.
Capstone corroboration:
- inspect `capstone/publish/v1/`
- inspect the Publish Review Guide
- inspect `capstone/workflow/rules/publish.smk`
- inspect the Capstone Walkthrough
At a Glance¶
| Focus | Learner question | Capstone timing |
|---|---|---|
| publish boundaries | "Which files are internal workflow state, and which files are safe for others to trust?" | inspect publish/v1/ only after the internal-versus-public split is explicit |
| versioned contracts | "How do downstream users know what changed and what remained stable?" | compare publish structure and the file API together |
| proof artifacts | "Which manifests and reports defend the published bundle without becoming the contract themselves?" | use the capstone when you are ready to review evidence surfaces intentionally |
1) Table of Contents¶
- Table of Contents
- Learning Outcomes
- How to Use This Module
- Core 1 — Internal Results Versus Published Results
- Core 2 — Versioned Publish Boundaries
- Core 3 — Manifests, Checksums, and File APIs
- Core 4 — Reports and Human-Readable Proof
- Core 5 — Reviewing a Published Contract for Drift
- Capstone Sidebar
- Exercises
- Closing Criteria
2) Learning Outcomes¶
By the end of this module, you can:
- distinguish internal workflow outputs from stable downstream deliverables
- define a versioned publish boundary that survives repository growth
- use manifests and checksums to make a published tree auditable
- decide when a report is part of the contract versus supporting evidence
- review a workflow’s publish surface for accidental coupling or drift
3) How to Use This Module¶
Build a local lab with two output layers. Require your lab to produce:
- internal per-sample results under `results/`
- a stable published bundle under `publish/v1/`
- a manifest that inventories the published files
- one human-readable report that explains the run
The teaching goal is to make the learner able to answer, file by file, which outputs are safe for downstream consumers to trust.
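The two-layer split can be sketched in plain Python. This is a minimal illustration, not the capstone's actual publish logic: the directory names and the `CANONICAL` list are assumptions chosen for the example.

```python
import shutil
from pathlib import Path

# Hypothetical contract: only these files cross the publish boundary.
CANONICAL = ["summary.tsv", "report.html"]

def publish(results_dir: str, publish_dir: str) -> list[str]:
    """Copy only the canonical outputs from the internal results tree
    into the versioned publish tree; everything else stays internal."""
    src, dst = Path(results_dir), Path(publish_dir)
    dst.mkdir(parents=True, exist_ok=True)
    published = []
    for name in CANONICAL:
        shutil.copy2(src / name, dst / name)
        published.append(name)
    # Intermediates, logs, and scratch files in results/ are deliberately
    # never copied: they are internal state, not part of the contract.
    return published
```

The point of the sketch is the allowlist: the publish step names what crosses the boundary, rather than copying whatever happens to be in `results/`.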
4) Core 1 — Internal Results Versus Published Results¶
Not every output belongs in the public contract.
Internal state often includes:
- intermediates needed for later jobs
- per-rule logs and benchmarks
- scratch outputs that support aggregation
- diagnostic summaries used for review
Published state should include only what downstream readers are allowed to rely on:
- stable paths
- stable formats
- clear versioning
- enough supporting evidence to validate what was published
If a downstream notebook depends on `results/` because the publish boundary is vague, the workflow is teaching the wrong contract.
5) Core 2 — Versioned Publish Boundaries¶
Versioned publication is not ceremony. It is how you tell a consumer whether a file path or format is still meant to be stable.
Good publish-boundary questions:
- what are the canonical outputs for this workflow version?
- which paths may change without breaking consumers?
- what should a reviewer inspect before approving a new publish version?
Common discipline:
- keep the stable surface in a path such as `publish/v1/`
- document it in a file API or contract note
- change the version only when the downstream contract really changes
Bad discipline:
- publishing directly from `results/`
- letting report names drift every run
- changing manifest structure without treating it as a contract change
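The "change the version only when the contract really changes" rule can be made mechanical. A minimal sketch, assuming the contract is just a set of stable paths (real contracts also cover formats and semantics):

```python
def contract_changes(old_paths: set[str], new_paths: set[str]) -> dict:
    """Classify a publish-surface diff. Removals or renames break existing
    consumers and warrant a new publish version; pure additions usually
    extend the contract without breaking it."""
    removed = sorted(old_paths - new_paths)
    added = sorted(new_paths - old_paths)
    return {
        "removed": removed,                 # breaking: consumers may read these
        "added": added,                     # usually non-breaking
        "needs_new_version": bool(removed),
    }
```

A reviewer approving a new publish version would look at `removed` first: an empty list means `publish/v1/` can stay; a non-empty list is the signal to cut `v2/`.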
6) Core 3 — Manifests, Checksums, and File APIs¶
A publish directory is much more trustworthy when it can explain itself.
Three complementary surfaces help:
| Surface | Purpose |
|---|---|
| manifest | lists what was published |
| checksums | prove file identity and detect tampering or accidental drift |
| file API document | tells a human what each artifact means and how stable it is |
You do not need to dump every internal detail into the manifest. You do need enough information to answer:
- what files belong in the bundle
- how a consumer can validate them
- which artifact is authoritative for each question
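The manifest and checksum surfaces can be produced together. A sketch assuming a JSON manifest named `MANIFEST.json` (the name and schema are illustrative, not a convention from the capstone):

```python
import hashlib
import json
from pathlib import Path

def write_manifest(publish_dir: str, manifest_name: str = "MANIFEST.json") -> dict:
    """Inventory every published file with its size and sha256 digest."""
    root = Path(publish_dir)
    entries = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            rel = str(path.relative_to(root))
            entries[rel] = {
                "bytes": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
    manifest = {"files": entries}
    (root / manifest_name).write_text(json.dumps(manifest, indent=2))
    return manifest

def verify_manifest(publish_dir: str, manifest_name: str = "MANIFEST.json") -> list[str]:
    """Return a list of drift findings; an empty list means the tree matches."""
    root = Path(publish_dir)
    manifest = json.loads((root / manifest_name).read_text())
    problems = []
    for rel, meta in manifest["files"].items():
        path = root / rel
        if not path.exists():
            problems.append(f"missing: {rel}")
        elif hashlib.sha256(path.read_bytes()).hexdigest() != meta["sha256"]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Note the manifest answers "what belongs in the bundle" and the checksums answer "how a consumer validates it"; the file API document remains the place that explains what each artifact means.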
7) Core 4 — Reports and Human-Readable Proof¶
Published workflows often need two kinds of artifacts:
- machine-consumable contract artifacts
- human-readable proof artifacts
Examples:
- `summary.tsv` or `summary.json` for downstream programmatic use
- `report/index.html` for human interpretation
- provenance or manifest JSON for validation and review
The mistake is not having both. The mistake is failing to say which role each one plays.
A report should help answer “what happened?” without being the only authoritative source of machine-readable truth.
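One way to keep the two roles distinct without letting them diverge is to render both from the same run record. A sketch with illustrative file names (`summary.json`, `report.txt`), not ones prescribed by the module:

```python
import json
from pathlib import Path

def emit_run_artifacts(run: dict, out_dir: str) -> None:
    """Render one machine-readable summary and one human-readable report
    from the same run record, so the two surfaces cannot silently diverge."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Contract artifact: stable keys, meant for downstream code.
    (out / "summary.json").write_text(json.dumps(run, sort_keys=True))
    # Proof artifact: a prose rendering of the same facts, never the
    # authoritative source of machine-readable truth.
    report = "\n".join(
        ["Run report"] + [f"{key}: {value}" for key, value in sorted(run.items())]
    )
    (out / "report.txt").write_text(report)
```

Because both files come from one record, a reviewer can say plainly: `summary.json` is the contract, the report is the explanation.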
8) Core 5 — Reviewing a Published Contract for Drift¶
Review the publish surface the same way you review code:
- did the set of published files change?
- did any stable path move or disappear?
- did the file API still describe reality?
- are reports and manifests still aligned?
- did new internal artifacts leak into the public contract accidentally?
Drift is often introduced by convenience:
- “just publish this extra JSON too”
- “let’s read from `results/` for now”
- “the report has the information, so we do not need to update the manifest”
Those shortcuts turn future reviews into archaeology.
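The review questions above can be asked mechanically against the declared file API. A minimal sketch, assuming the API is a set of relative paths (real file API documents carry meaning and stability notes too):

```python
from pathlib import Path

def review_publish_surface(publish_dir: str, declared: set[str]) -> dict:
    """Compare the actual publish tree against the paths the file API
    declares. 'leaked' files are internal artifacts that slipped into the
    public contract; 'missing' files are stable paths that moved or vanished."""
    root = Path(publish_dir)
    actual = {str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()}
    return {
        "leaked": sorted(actual - declared),
        "missing": sorted(declared - actual),
    }
```

Running a check like this in CI turns "did new internal artifacts leak in?" from archaeology into a failing test.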
9) Capstone Sidebar¶
Use the capstone to inspect:
- `publish/v1/` as the stable public surface
- `FILE_API.md` as the human contract for those files
- `workflow/rules/publish.smk` as the logic that draws the boundary
- `TOUR.md` and the workflow tour bundle as human-readable proof surfaces
10) Exercises¶
- Split one workflow’s outputs into internal results and a clean published bundle.
- Add a manifest and explain which published artifacts are authoritative for downstream code.
- Create one human-readable report and one machine-readable summary without making them interchangeable.
- Simulate a publish-boundary change, then decide whether it requires a new publish version.
11) Closing Criteria¶
You pass this module only if you can demonstrate:
- a clear distinction between internal outputs and published outputs
- a versioned publish boundary with stable paths
- a manifest or checksum surface that validates the published tree
- a documented contract that a downstream consumer could follow without guessing
Directory glossary¶
Use the Glossary when you want the recurring language in this module kept stable while you move between lessons, exercises, and capstone checkpoints.