Modules, Reuse, and Explicit Interfaces

Once rule-family splits are working, the next scaling question appears:

Is this still one workflow graph with clearer organization, or is one part becoming a reusable workflow boundary of its own?

That is where modules enter.

A module is not just a smaller file

A workflow module should exist for a stronger reason than convenience.

It should answer a real boundary question:

- can this workflow bundle be named as one thing?
- does it have explicit inputs and outputs?
- can another reader understand the main graph without reading every implementation detail first?

If those answers are weak, the repository probably needs clearer includes, not a module.

What a healthy module buys you

A healthy module can help when:

- a sub-workflow has a stable contract
- the same bundle is reused in more than one context
- the top-level workflow stays easier to read after the boundary is introduced

This is why the capstone keeps reusable bundles under workflow/modules/ for rule groups such as QC or screen logic.

The module is useful because it has a named interface, not because the word sounds architectural.

What a module must make explicit

When a boundary becomes a real module, the reader should be able to answer:

- what does this module consume?
- what does it produce?
- which paths or parameters belong to its interface?
- what remains private implementation detail?

If the answers live only in the author's head, the module boundary is too weak.
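One way to get those answers out of the author's head is to write them down as data. The following is a minimal sketch in plain Python, not a Snakemake feature: the `ModuleInterface` class, its field names, and every path and config key in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleInterface:
    """Hypothetical written record of what a workflow module exposes."""

    name: str
    consumes: tuple[str, ...]      # path patterns the module reads
    produces: tuple[str, ...]      # path patterns other code may depend on
    params: tuple[str, ...] = ()   # config keys that belong to the contract
    private: tuple[str, ...] = ()  # internal staging paths, free to change

    def unanswered(self) -> list[str]:
        """Return the interface questions that still lack an explicit answer."""
        gaps = []
        if not self.consumes:
            gaps.append("what does this module consume")
        if not self.produces:
            gaps.append("what does it produce")
        overlap = set(self.produces) & set(self.private)
        if overlap:
            gaps.append(f"public and private overlap: {sorted(overlap)}")
        return gaps


# Assumed example values for a QC bundle; none of these come from the capstone.
qc = ModuleInterface(
    name="qc",
    consumes=("data/raw/{sample}.fastq.gz",),
    produces=("results/qc/{sample}.json",),
    params=("qc.min_quality",),
    private=("results/qc/tmp/{sample}.trimmed.fastq.gz",),
)
print(qc.unanswered())  # an empty list means every question has a written answer
```

A record like this could live next to the module as documentation or feed a small review check. The point is only that each of the four questions has a place where its answer is written.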

One healthy mental model

flowchart LR
  main["top-level workflow"] --> iface["named module interface"]
  iface --> module["workflow/modules/..."]
  module --> outputs["declared outputs"]

This matters because the main graph should still be explainable:

- the top-level workflow names the boundary
- the module owns its internal implementation
- the interface is what another rule family is allowed to depend on

That is a stronger contract than "the code happens to live elsewhere."
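The phrase "allowed to depend on" can be made checkable by comparing the paths a downstream rule family reads against the module's declared outputs. This is a rough sketch under stated assumptions: the function name, the `results/qc/` output root, and the file names are all hypothetical, and the check is ordinary Python rather than anything Snakemake provides.

```python
def undeclared_dependencies(
    downstream_inputs: list[str],
    module_root: str,
    declared_outputs: set[str],
) -> list[str]:
    """Return inputs that reach into the module's tree but bypass its interface."""
    return sorted(
        path
        for path in downstream_inputs
        if path.startswith(module_root) and path not in declared_outputs
    )


# Hypothetical paths: a report rule reads one declared output and one internal file.
report_inputs = [
    "results/qc/summary.tsv",
    "results/qc/tmp/trimmed_counts.txt",
    "config/samples.tsv",
]
leaks = undeclared_dependencies(
    report_inputs,
    module_root="results/qc/",
    declared_outputs={"results/qc/summary.tsv"},
)
print(leaks)  # paths that depend on private implementation detail
```

Any path this returns is a dependency on the module's internals, which is exactly the coupling the named interface is meant to prevent.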

A weak module smell

Weak module shape:

- the top-level workflow imports a module
- the module reads hidden globals or path conventions from many places
- the reader still needs to inspect private files before understanding the main graph

This does not create a reusable boundary. It only moves confusion.
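Hidden globals are one part of this smell that a script can surface. The sketch below scans a module's source text for top-level `config[...]` keys and reports any that the interface does not name. The helper name, the rule, and the config keys are hypothetical, and the regular expression is deliberately narrow: it only sees literal subscripts, so `config.get(...)` calls or keys built at runtime would go unnoticed.

```python
import re

# Matches the first key of a subscript access such as config["qc"] or config['qc'].
CONFIG_KEY = re.compile(r"""config\[\s*["']([^"']+)["']\s*\]""")


def undeclared_config_keys(module_source: str, declared: set[str]) -> set[str]:
    """Return top-level config keys the module reads but its interface never names."""
    used = set(CONFIG_KEY.findall(module_source))
    return used - declared


# Hypothetical module text: it reads one declared key and one hidden one.
source = '''
rule qc_summary:
    input:
        "results/qc/{sample}.json"
    output:
        "results/qc/{sample}.summary.tsv"
    params:
        min_quality=config["qc"]["min_quality"],
        scratch=config["scratch_dir"],
    script:
        "scripts/summarize_qc.py"
'''

print(undeclared_config_keys(source, declared={"qc"}))  # keys that leak past the interface
```

In this example the module quietly depends on `scratch_dir`, a key the call site never mentions. Either the key joins the interface or the module stops reading it.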

A stronger module shape

Stronger module shape:

- the top-level workflow names what the module is for
- the interface paths or parameters are visible at the call site
- the module produces a stable artifact family that other code can reason about

That is the kind of boundary worth introducing.

Modules and file APIs belong together

Once you have a true module boundary, file-interface thinking becomes important fast.

The reader should know:

- which module outputs are safe for the rest of the workflow to depend on
- which outputs are internal staging or helper state
- whether a change to those outputs is a local refactor or an interface break

That is why this page leads directly into file APIs and schema validation.
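The refactor-versus-break distinction can be sketched at the level of paths. The example assumes the module's public outputs are written down as a set; the function name and all file names are hypothetical. It only detects public paths that disappear. A change to the columns or schema inside a file that keeps its name would need content-level validation instead.

```python
def classify_change(before: set[str], after: set[str], public: set[str]) -> str:
    """Label an output change as an interface break or a local refactor."""
    removed = before - after
    broken = removed & public
    if broken:
        return f"interface break: {sorted(broken)}"
    return "local refactor"


# Hypothetical output sets for a QC module.
before = {"results/qc/summary.tsv", "results/qc/tmp/raw_counts.txt"}
public = {"results/qc/summary.tsv"}

# Renaming a staging file leaves the public artifact untouched.
staging_renamed = {"results/qc/summary.tsv", "results/qc/tmp/counts.parquet"}
print(classify_change(before, staging_renamed, public))

# Renaming the public artifact removes a path other code depends on.
public_renamed = {"results/qc/summary.csv", "results/qc/tmp/raw_counts.txt"}
print(classify_change(before, public_renamed, public))
```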

Common failure modes

| Failure mode | What it looks like | Better repair |
| --- | --- | --- |
| a module exists only because a file was long | the interface is vague and reuse is imaginary | keep the split at rule-family level instead |
| hidden globals leak into module behavior | the call site does not explain the contract | make inputs and outputs explicit at the boundary |
| the main graph becomes harder to read after modularization | readers open implementation files too early | simplify the top-level orchestration surface |
| reusable outputs are undocumented | interface drift is hard to detect | pair modules with file-API thinking and validation |
| modules become junk drawers | the boundary exists, but no one can name its job | narrow the module to one stable concern |

The explanation a reviewer trusts

Strong explanation:

this boundary became a real module because the QC bundle has a stable interface, the main workflow can name its inputs and outputs clearly, and the top-level graph is easier to explain after the module is introduced.

Weak explanation:

we moved it into workflow/modules/ because it looked more reusable there.

The first explanation gives an interface reason. The second gives a folder preference.

End-of-page checkpoint

Before leaving this page, you should be able to:

- explain one case where a rule family should remain an include rather than a module
- name the minimum interface questions a module must answer
- describe one sign that a module boundary is too vague to trust
- explain why modules and file APIs naturally belong in the same discussion