Module 06: Publishing and Downstream Contracts¶

Module Position¶

flowchart TD
  family["Reproducible Research"] --> program["Deep Dive Snakemake"]
  program --> module["Module 06: Publishing and Downstream Contracts"]
  module --> lessons["Lesson pages and worked examples"]
  module --> checkpoints["Exercises and closing criteria"]
  module --> capstone["Related capstone evidence"]

flowchart TD
  purpose["Start with the module purpose and main questions"] --> lesson_map["Use the lesson map to choose reading order"]
  lesson_map --> study["Read the lessons and examples with one review question in mind"]
  study --> proof["Test the idea with exercises and capstone checkpoints"]
  proof --> close["Move on only when the closing criteria feel concrete"]

Read the first diagram as a placement map: this page sits between the course promise, the lesson pages listed below, and the capstone surfaces that pressure-test the module. Read the second diagram as the study route for this page, so the diagrams point you toward the Lesson map, Exercises, and Closing criteria instead of acting like decoration.

A workflow run is not automatically a trustworthy result. The internal execution state that helps the workflow operate is often not the same thing as the published surface that another human, notebook, or downstream pipeline is allowed to depend on.

This module is about designing that boundary on purpose: what remains internal, what is published, how published outputs are versioned, and how manifests, reports, and checksums support trust without turning every run into a pile of unreviewable noise.

Capstone exists here as corroboration. The local publishing exercises should already make the internal-versus-public split clear before you inspect publish/v1/ and the file API in the reference workflow.

Before You Begin¶

This module works best after Modules 01-05, especially the parts on file contracts, provenance, and safe rule boundaries.

Use this module if you need to learn how to:

separate internal workflow state from stable published deliverables
version a publish boundary so downstream users know what they are trusting
decide which reports and manifests are proof artifacts and which are part of the public contract

Proof loop for this module:

snakemake -n
snakemake --summary
snakemake publish/v1/manifest.json

Capstone corroboration:

inspect capstone/publish/v1/
inspect Publish Review Guide
inspect capstone/workflow/rules/publish.smk
inspect Capstone Walkthrough

At a Glance¶

Focus	Learner question	Capstone timing
publish boundaries	"Which files are internal workflow state, and which files are safe for others to trust?"	inspect `publish/v1/` only after the internal-versus-public split is explicit
versioned contracts	"How do downstream users know what changed and what remained stable?"	compare publish structure and the file API together
proof artifacts	"Which manifests and reports defend the published bundle without becoming the contract themselves?"	use the capstone when you are ready to review evidence surfaces intentionally

1) Table of Contents¶

Table of Contents
Learning Outcomes
How to Use This Module
Core 1 — Internal Results Versus Published Results
Core 2 — Versioned Publish Boundaries
Core 3 — Manifests, Checksums, and File APIs
Core 4 — Reports and Human-Readable Proof
Core 5 — Reviewing a Published Contract for Drift
Capstone Sidebar
Exercises
Closing Criteria

2) Learning Outcomes¶

By the end of this module, you can:

distinguish internal workflow outputs from stable downstream deliverables
define a versioned publish boundary that survives repository growth
use manifests and checksums to make a published tree auditable
decide when a report is part of the contract versus supporting evidence
review a workflow’s publish surface for accidental coupling or drift

Back to top

3) How to Use This Module¶

Build a local lab with two output layers:

lab/
  workflow/
    Snakefile
  results/
  publish/
    v1/
  docs/

Require your lab to produce:

internal per-sample results under results/
a stable published bundle under publish/v1/
a manifest that inventories the published files
one human-readable report that explains the run

The teaching goal is to make the learner able to answer, file by file, which outputs are safe for downstream consumers to trust.

Back to top

4) Core 1 — Internal Results Versus Published Results¶

Not every output belongs in the public contract.

Internal state often includes:

intermediates needed for later jobs
per-rule logs and benchmarks
scratch outputs that support aggregation
diagnostic summaries used for review

Published state should include only what downstream readers are allowed to rely on:

stable paths
stable formats
clear versioning
enough supporting evidence to validate what was published

If a downstream notebook depends on results/ because the publish boundary is vague, the workflow is teaching the wrong contract.

Back to top

5) Core 2 — Versioned Publish Boundaries¶

Versioned publication is not ceremony. It is how you tell a consumer whether a file path or format is still meant to be stable.

Good publish-boundary questions:

what are the canonical outputs for this workflow version?
which paths may change without breaking consumers?
what should a reviewer inspect before approving a new publish version?

Common discipline:

keep the stable surface in a path such as publish/v1/
document it in a file API or contract note
change the version only when the downstream contract really changes

Bad discipline:

publishing directly from results/
letting report names drift every run
changing manifest structure without treating it as a contract change

Back to top

6) Core 3 — Manifests, Checksums, and File APIs¶

A publish directory is much more trustworthy when it can explain itself.

Three complementary surfaces help:

Surface	Purpose
manifest	lists what was published
checksums	prove file identity and detect tampering or accidental drift
file API document	tells a human what each artifact means and how stable it is

You do not need to dump every internal detail into the manifest. You do need enough information to answer:

what files belong in the bundle
how a consumer can validate them
which artifact is authoritative for each question

Back to top

7) Core 4 — Reports and Human-Readable Proof¶

Published workflows often need two kinds of artifacts:

machine-consumable contract artifacts
human-readable proof artifacts

Examples:

summary.tsv or summary.json for downstream programmatic use
report/index.html for human interpretation
provenance or manifest JSON for validation and review

The mistake is not having both. The mistake is failing to say which role each one plays.

A report should help answer “what happened?” without being the only authoritative source of machine-readable truth.

Back to top

8) Core 5 — Reviewing a Published Contract for Drift¶

Review the publish surface the same way you review code:

did the set of published files change?
did any stable path move or disappear?
did the file API still describe reality?
are reports and manifests still aligned?
did new internal artifacts leak into the public contract accidentally?

Drift is often introduced by convenience:

“just publish this extra JSON too”
“let’s read from results/ for now”
“the report has the information, so we do not need to update the manifest”

Those shortcuts turn future reviews into archaeology.

Back to top

Use the capstone to inspect:

publish/v1/ as the stable public surface
FILE_API.md as the human contract for those files
workflow/rules/publish.smk as the logic that draws the boundary
TOUR.md and the workflow tour bundle as human-readable proof surfaces

Back to top

10) Exercises¶

Split one workflow’s outputs into internal results and a clean published bundle.
Add a manifest and explain which published artifacts are authoritative for downstream code.
Create one human-readable report and one machine-readable summary without making them interchangeable.
Simulate a publish-boundary change, then decide whether it requires a new publish version.

Back to top

11) Closing Criteria¶

You pass this module only if you can demonstrate:

a clear distinction between internal outputs and published outputs
a versioned publish boundary with stable paths
a manifest or checksum surface that validates the published tree
a documented contract that a downstream consumer could follow without guessing

Back to top

Directory glossary¶

Use Glossary when you want the recurring language in this module kept stable while you move between lessons, exercises, and capstone checkpoints.

Module 06: Publishing and Downstream Contracts¶

Module Position¶

Before You Begin¶

At a Glance¶

1) Table of Contents¶

2) Learning Outcomes¶

3) How to Use This Module¶

4) Core 1 — Internal Results Versus Published Results¶

5) Core 2 — Versioned Publish Boundaries¶

6) Core 3 — Manifests, Checksums, and File APIs¶

7) Core 4 — Reports and Human-Readable Proof¶

8) Core 5 — Reviewing a Published Contract for Drift¶

9) Capstone Sidebar¶

10) Exercises¶

11) Closing Criteria¶

Directory glossary¶