Internal Results Versus Public Contracts¶

Page Maps¶

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Publishing Downstream Contracts"]
  page["Internal Results Versus Public Contracts"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

The first publishing decision is the most important one:

which outputs belong to the workflow, and which outputs belong to the downstream world?

If that line is vague, the workflow becomes harder to review and downstream users are forced to guess which files are safe to trust.

This page is about making that line explicit.

A workflow usually produces more files than it should publish¶

A healthy workflow often creates many useful files:

per-step logs
benchmarks
scratch intermediates
per-sample result files
validation artifacts
final summaries and reports

Those files do not all make the same promise.

Some exist to help the workflow operate. Some exist to help maintainers review a run. Some exist so another person or tool can rely on the results later.

Publishing begins when you decide which category a file belongs to.

Internal state is still important, but it is not the public contract¶

Internal outputs are often necessary for:

chaining rules together
debugging failures
inspecting detailed sample-level behavior
supporting later aggregation

In the capstone, much of results/ serves that purpose. It contains per-sample files that help the workflow do its job, but it is not the main downstream promise.

That distinction matters because downstream consumers should not be forced to reverse engineer the workflow layout to use the published results.

Published outputs are an explicit promise¶

A published output says something stronger than "this file exists."

It says:

this path is intentionally exposed
this file meaning is stable enough for consumers to rely on
this artifact belongs to a reviewable contract

That is why the capstone promotes a smaller surface under publish/v1/.

Files such as:

summary.json
summary.tsv
report/index.html
manifest.json
provenance.json
discovered_samples.json

are not just another directory. They are the deliberate public boundary.

One simple contrast¶

flowchart LR
  results["results/"] --> internal["workflow-owned internal state"]
  publish["publish/v1/"] --> public["downstream-facing contract"]
  internal --> review["supports workflow execution and review"]
  public --> trust["supports downstream trust and reuse"]

This diagram matters because many publishing mistakes begin when both directories are treated as if they made the same promise.

They do not.

A weak boundary¶

Weak shape:

notebooks read directly from results/
published files are chosen ad hoc after the run
a path becomes “public” only because someone found it useful once

This feels flexible at first. It creates long-term fragility:

internal layout changes break downstream use
reviewers cannot tell which files are meant to stay stable
publication discipline arrives only after confusion has already started

A stronger boundary¶

Stronger shape:

keep workflow-operational state under internal directories such as results/
promote only the intended downstream artifacts into a versioned publish boundary
treat every file inside that publish boundary as something a reviewer must be able to defend

That design gives both maintainers and consumers a clear answer to the same question:

which files are safe to depend on?

A practical test¶

Ask these questions about any output:

Is this file mainly for the workflow to continue operating?
Is this file mainly for maintainers to debug or inspect a run?
Is this file intentionally exposed for downstream use?

If the answer is mostly the first or second, it probably does not belong in the publish contract.

If the answer is the third, it needs stronger naming, stability, and review.

Common failure modes¶

Failure mode	What goes wrong	Better repair
downstream notebooks read from `results/`	internal layout becomes a hidden public API	publish a smaller stable bundle and steer consumers there
publish directory copies too many internal files	contract grows without intention	publish only the outputs a consumer is meant to trust
reports are the only visible outputs	machine-readable contract becomes unclear	pair human-oriented reports with explicit machine-readable artifacts
logs or benchmarks leak into the public surface	consumers inherit maintenance noise	keep operational evidence separate from the downstream contract
no one can explain why a file is public	review becomes guesswork	require a clear downstream-use reason for promoted files

The explanation a reviewer trusts¶

Strong explanation:

these per-sample files remain under results/ because they support workflow execution and detailed review, while publish/v1/ contains the smaller bundle downstream users are allowed to trust as the stable contract.

Weak explanation:

we copied the important files into publish/ after the run finished.

The strong explanation defines ownership and intention. The weak explanation describes a folder move without a contract.

End-of-page checkpoint¶

Before leaving this page, you should be able to:

explain why results/ and publish/v1/ make different promises
give one example of an internal output that should not become public
give one example of a file that belongs in the published contract
explain why downstream consumers should not need to understand the whole workflow tree