Modules, Reuse, and Explicit Interfaces
Once rule-family splits are working, the next scaling question appears:
is this still one workflow graph with clearer organization, or is one part becoming a reusable workflow boundary of its own?
That is where modules enter.
A module is not just a smaller file
A workflow module should exist for a stronger reason than convenience.
It should answer a real boundary question:
- can this workflow bundle be named as one thing?
- does it have explicit inputs and outputs?
- can another reader understand the main graph without reading every implementation detail first?
If those answers are weak, the repository probably needs clearer includes, not a module.
What a healthy module buys you
A healthy module can help when:
- a sub-workflow has a stable contract
- the same bundle is reused in more than one context
- the top-level workflow stays easier to read after the boundary is introduced
This is why the capstone keeps reusable bundles under workflow/modules/ for rule groups
such as QC or screen logic.
The module is useful because it has a named interface, not because the word sounds architectural.
What a module must make explicit
When a boundary becomes a real module, the reader should be able to answer:
- what does this module consume?
- what does it produce?
- which paths or parameters belong to its interface?
- what remains private implementation detail?
If the answers live only in the author's head, the module boundary is too weak.
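One way to get those answers out of the author's head is to write them down as data. The sketch below is illustrative only: `ModuleInterface`, its field names, and the example paths are hypothetical and are not part of Snakemake or of the capstone. It shows an interface record that can report which of the questions above it leaves unanswered.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModuleInterface:
    """Hypothetical record of what a workflow module promises its callers."""

    name: str
    consumes: frozenset[str]  # input paths or patterns the module reads
    produces: frozenset[str]  # outputs other rule families may depend on
    params: frozenset[str]    # config keys that are part of the contract
    private: frozenset[str] = field(default_factory=frozenset)  # internal staging

    def unanswered(self) -> list[str]:
        """Return the interface questions this declaration leaves open."""
        gaps = []
        if not self.consumes:
            gaps.append("what does this module consume")
        if not self.produces:
            gaps.append("what does it produce")
        overlap = self.produces & self.private
        if overlap:
            gaps.append(f"public and private overlap: {sorted(overlap)}")
        return gaps


# Example declaration for a QC bundle; every path here is an assumption.
qc = ModuleInterface(
    name="qc",
    consumes=frozenset({"data/raw/{sample}.fastq.gz"}),
    produces=frozenset({"results/qc/{sample}.json"}),
    params=frozenset({"min_quality"}),
    private=frozenset({"results/qc/tmp/{sample}.trimmed.fastq.gz"}),
)
```

A declaration with empty `consumes` and `produces` sets would report both questions as open, which is the "too weak" case described above.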
One healthy mental model
flowchart LR
main["top-level workflow"] --> iface["named module interface"]
iface --> module["workflow/modules/..."]
module --> outputs["declared outputs"]
This matters because the main graph should still be explainable:
- the top-level workflow names the boundary
- the module owns its internal implementation
- the interface is what another rule family is allowed to depend on
That is a stronger contract than "the code happens to live elsewhere."
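The rule that other rule families may depend only on the interface can be checked mechanically. The following is a minimal sketch, assuming the module writes everything under a single directory; the function name, the `module_root` argument, and the example paths are hypothetical.

```python
def undeclared_dependencies(downstream_inputs, declared_outputs, module_root):
    """Return downstream inputs that reach into the module without using its interface.

    An input counts as a violation when it lives under the module's output
    directory but is not one of the outputs the module declares as public.
    """
    declared = set(declared_outputs)
    root = module_root.rstrip("/") + "/"
    return sorted(
        path
        for path in downstream_inputs
        if path.startswith(root) and path not in declared
    )


violations = undeclared_dependencies(
    downstream_inputs=[
        "results/qc/A.json",                  # declared output: allowed
        "results/qc/tmp/A.trimmed.fastq.gz",  # private staging file: not allowed
        "data/raw/A.fastq.gz",                # outside the module: ignored
    ],
    declared_outputs=["results/qc/A.json"],
    module_root="results/qc",
)
```

Here `violations` holds only the staging file, which is the dependency a reviewer would want to question.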
A weak module smell
Weak module shape:
- the top-level workflow imports a module
- the module reads hidden globals or path conventions from many places
- the reader still needs to inspect private files before understanding the main graph
This does not create a reusable boundary. It only moves confusion.
A stronger module shape
Stronger module shape:
- the top-level workflow names what the module is for
- the interface paths or parameters are visible at the call site
- the module produces a stable artifact family that other code can reason about
That is the kind of boundary worth introducing.
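The contrast between the weak and stronger shapes can be shown with plain helper functions of the kind a Snakefile might call. This is a sketch under assumptions: the function names, the `outdir` parameter, and the paths are hypothetical.

```python
# Weak shape: the output location is a hidden module-level convention.
# A reader at the call site cannot see where the reports will land.
_HIDDEN_OUTDIR = "results/qc"


def qc_reports_weak(samples):
    return [f"{_HIDDEN_OUTDIR}/{sample}.json" for sample in samples]


# Stronger shape: the interface path is a keyword-only argument,
# so every call site has to state it explicitly.
def qc_reports(samples, *, outdir):
    """Return the QC report paths the module promises for these samples."""
    return [f"{outdir}/{sample}.json" for sample in samples]


reports = qc_reports(["A", "B"], outdir="results/qc")
```

Both functions return the same paths today. The difference is that only the second one lets a reader see the contract without opening the module.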
Modules and file APIs belong together
Once you have a true module boundary, file-interface thinking becomes important fast.
The reader should know:
- which module outputs are safe for the rest of the workflow to depend on
- which outputs are internal staging or helper state
- whether a change to those outputs is a local refactor or an interface break
That is why this page leads directly into file APIs and schema validation.
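The refactor-versus-break question can be made concrete by comparing the public output set before and after a change. The sketch below is one possible classification, not an established convention; the function name and labels are hypothetical.

```python
def classify_change(public_before, public_after):
    """Classify a change to a module by its effect on the public outputs.

    Returns a (label, affected_paths) tuple. Changes that touch only
    private staging files leave both sets equal and count as a refactor.
    """
    before, after = set(public_before), set(public_after)
    removed = before - after
    added = after - before
    if removed:
        # Something callers may rely on has disappeared or been renamed.
        return "interface break", sorted(removed)
    if added:
        return "interface extension", sorted(added)
    return "local refactor", []


# Renaming a private staging file does not change the public set.
same = classify_change(
    {"results/qc/{sample}.json"},
    {"results/qc/{sample}.json"},
)

# Dropping a declared output is a break for anything downstream.
broken = classify_change(
    {"results/qc/{sample}.json", "results/qc/summary.tsv"},
    {"results/qc/{sample}.json"},
)
```

A check like this only works when the public set is written down somewhere, which is the reason to document which outputs are safe to depend on.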
Common failure modes
| Failure mode | What it looks like | Better repair |
|---|---|---|
| a module exists only because a file was long | the interface is vague and reuse is imaginary | keep the split at rule-family level instead |
| hidden globals leak into module behavior | the call site does not explain the contract | make inputs and outputs explicit at the boundary |
| the main graph becomes harder to read after modularization | readers open implementation files too early | simplify the top-level orchestration surface |
| reusable outputs are undocumented | interface drift is hard to detect | pair modules with file-API thinking and validation |
| modules become junk drawers | the boundary exists, but no one can name its job | narrow the module to one stable concern |
The explanation a reviewer trusts
Strong explanation:
this boundary became a real module because the QC bundle has a stable interface, the main workflow can name its inputs and outputs clearly, and the top-level graph is easier to explain after the module is introduced.
Weak explanation:
we moved it into `workflow/modules/` because it looked more reusable there.
The first explanation gives an interface reason. The second gives a folder preference.
End-of-page checkpoint
Before leaving this page, you should be able to:
- explain one case where a rule family should remain an include rather than a module
- name the minimum interface questions a module must answer
- describe one sign that a module boundary is too vague to trust
- explain why modules and file APIs naturally belong in the same discussion