Helpers, Scripts, Packages, and Coupling Control¶
Architecture gets weaker when implementation code becomes easier to find than the workflow itself.
That usually happens slowly:
- helper modules absorb more logic
- scripts start reading undeclared files
- reusable code reaches into workflow state casually
At first, the repository looks more “organized.” Later, nobody can tell where workflow meaning ends and implementation begins.
This page is about stopping that drift.
Different code homes should imply different ownership¶
A healthy repository uses different homes for different reasons:
- workflow/scripts/ for workflow-adjacent implementation
- src/capstone/ for reusable package code
- workflow/rules/ for file contracts and orchestration
These are not status markers. They are ownership markers.
workflow/scripts/ stays close to the workflow on purpose¶
Code in workflow/scripts/ is a good fit when:
- one workflow step owns the behavior
- the code is tightly coupled to rule-local inputs and outputs
- the script is still easier to review when kept near the orchestration layer
The capstone's workflow/scripts/provenance.py is a useful example of code that belongs
close to one workflow step.
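As an illustration, a step-local script can take everything it needs from the rule that invokes it. The sketch below is hypothetical and is not the capstone's actual provenance.py; the function names and the JSON layout are assumptions. It relies on the `snakemake` object that Snakemake makes available to Python files run through a rule's `script:` directive.

```python
import hashlib
import json
from pathlib import Path


def build_record(input_paths):
    """Describe only the files the rule declared; nothing is discovered on disk."""
    return {
        "inputs": [
            {
                "path": str(path),
                "sha256": hashlib.sha256(Path(path).read_bytes()).hexdigest(),
            }
            for path in input_paths
        ]
    }


def main(input_paths, output_path):
    """Write the provenance record to the single declared output."""
    record = build_record(input_paths)
    Path(output_path).write_text(json.dumps(record, indent=2, sort_keys=True))


# Snakemake injects a `snakemake` object when the file runs via `script:`.
# Outside a workflow run this guard is false, so the module stays importable.
if "snakemake" in globals():
    main(list(snakemake.input), snakemake.output[0])
```

Because the script touches only `snakemake.input` and `snakemake.output`, a reviewer can read the rule and know every file the step reads or writes.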
src/ is for reusable implementation, not hidden workflow logic¶
Code in src/capstone/ is a better fit when:
- logic is reused across several steps
- the code deserves direct imports and tests
- the module is part of the software layer rather than just workflow glue
Moving code into src/capstone/ becomes dangerous only when package code starts changing workflow meaning through undeclared assumptions.
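A minimal sketch of package code that stays reusable might look like the following. The function name `summarize_counts`, the row layout, and the idea of it living somewhere under src/capstone/ are all assumptions made for illustration. The point is only that the caller supplies the data and the threshold.

```python
from typing import Any, Iterable, Mapping


def summarize_counts(rows: Iterable[Mapping[str, Any]], min_count: int) -> dict:
    """Total the counts per sample, keeping samples at or above min_count.

    The caller supplies the rows and the threshold; this function opens no
    files and reads no workflow config.
    """
    totals: dict = {}
    for row in rows:
        totals[row["sample"]] = totals.get(row["sample"], 0) + row["count"]
    return {sample: total for sample, total in totals.items() if total >= min_count}


rows = [
    {"sample": "a", "count": 3},
    {"sample": "a", "count": 4},
    {"sample": "b", "count": 1},
]
print(summarize_counts(rows, min_count=5))  # {'a': 7}
```

A function shaped like this can be imported and tested directly, and any rule or script that calls it has to state where the rows and the threshold come from.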
Coupling becomes architectural when it is hidden¶
Hidden coupling often looks like this:
- helper code reads files the rule never declared
- shared code assumes config keys that are not validated clearly
- import-time side effects alter behavior in ways the rule layer does not reveal
These are not only code smells. They are architecture smells because they weaken the visible repository boundaries.
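One way to keep config assumptions visible is to check them once, at a boundary the rule layer can point to, and hand helpers only the validated values. The sketch below is an assumption-laden illustration: `REQUIRED_KEYS`, `validate_config`, and the key names `samples` and `min_count` are hypothetical. In a real workflow, Snakemake's schema-based `snakemake.utils.validate` may be a better fit.

```python
# Hypothetical keys and types; a real workflow would declare its own.
REQUIRED_KEYS = {"samples": list, "min_count": int}


def validate_config(config: dict) -> dict:
    """Return only the declared keys, failing loudly if one is missing or mistyped."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing config key: {key!r}")
        elif not isinstance(config[key], expected_type):
            problems.append(
                f"config key {key!r} should be {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    if problems:
        raise ValueError("; ".join(problems))
    # Helpers receive this narrowed dict, so they cannot lean on undeclared keys.
    return {key: config[key] for key in REQUIRED_KEYS}
```

Calling this as an explicit function, rather than running it at import time, also avoids the import-time side effects listed above.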
One useful contrast¶
flowchart LR
rules["workflow/rules/"] --> scripts["workflow/scripts/"]
rules --> package["src/capstone/"]
scripts --> step["step-local implementation"]
package --> reuse["reusable software"]
hidden["hidden file or config dependency"] --> drift["architecture drift"]
The point is not to maximize the number of layers. The point is to keep the visible rule graph more informative than the hidden helper internals.
A weak helper posture¶
Weak shape:
- generic helpers modules grow without a domain boundary
- reusable code silently reaches into workflow state
- readers must inspect imports to understand workflow meaning
This is how a repository turns into a private framework.
A stronger helper posture¶
Stronger shape:
- the rule layer still owns file contracts and orchestration
- step-local code stays near the step
- reusable code stays reusable by accepting explicit inputs and parameters
- config and path assumptions remain declared in visible repository surfaces
Now code reuse supports the workflow story instead of replacing it.
A practical test¶
Ask these questions:
- Could I explain this workflow change without opening five helper modules first?
- Does this helper require undeclared files or config to function?
- Is this code reusable because it has a clean interface, or only because it is hard to find?
If the first answer is no or the second is yes, the repository may already be over-coupled.
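The second question can be partly mechanized. The sketch below is a rough heuristic and not a complete audit: it uses Python's `ast` module to flag file-like calls in helper source that receive a hard-coded string path. The set of call names in `FILE_CALLS` and the sample helper are assumptions for illustration.

```python
import ast

# Call names treated as file access; this list is an assumption, not exhaustive.
FILE_CALLS = {"open", "Path", "glob", "read_csv"}


def literal_path_calls(source: str) -> list:
    """Return (line, callee, literal) for file-like calls given a string literal.

    A hard-coded path inside a helper is a hint, not proof, that the helper
    depends on a file the rule layer never declared.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        first = node.args[0]
        if (
            name in FILE_CALLS
            and isinstance(first, ast.Constant)
            and isinstance(first.value, str)
        ):
            findings.append((node.lineno, name, first.value))
    return sorted(findings)


sample = '''
import pandas as pd
from pathlib import Path

def load(table_path):
    extra = open("config/thresholds.txt").read()
    return pd.read_csv(table_path), extra, Path("results/cache.tsv")
'''
for finding in literal_path_calls(sample):
    print(finding)
```

On the sample helper this reports the `open` and `Path` calls with literal paths but not `pd.read_csv(table_path)`, since that path arrives as an explicit argument.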
Common failure modes¶
| Failure mode | What it causes | Better repair |
|---|---|---|
| giant generic helper modules | ownership disappears | split helpers by domain or by step-local ownership |
| package code reads workflow state implicitly | rule contracts stop telling the full story | pass inputs and parameters explicitly from the rule layer |
| step-local code promoted too early | reusable layer becomes ceremonial and noisy | keep code close to the step until reuse is real |
| helper imports become the only way to understand behavior | workflow review starts below the rule layer | keep orchestration and declared inputs visible in rule files |
| scripts and package code blur together | readers cannot tell what is local versus reusable | distinguish step-local implementation from reusable software clearly |
The explanation a reviewer trusts¶
Strong explanation:
the rule still owns the file contract, step-local logic stays in workflow/scripts/, reusable logic stays in src/, and helper code only accepts declared inputs and params instead of reaching into hidden workflow state.
Weak explanation:
we moved the complex code into helpers so the workflow looks cleaner.
The strong explanation protects boundaries. The weak explanation only relocates complexity.
End-of-page checkpoint¶
Before leaving this page, you should be able to:
- explain the architectural difference between workflow/scripts/ and src/
- describe why hidden helper dependencies are architecture problems
- explain why the rule layer should remain easier to inspect than helper internals
- name one sign that a repository is becoming a private framework