Helper Code, Packages, and Reusable Boundaries¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["Software Boundaries Reproducible Rules"]
page["Helper Code, Packages, and Reusable Boundaries"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Once you accept that some logic should move out of rules, the next question appears immediately:
where should that code live?
That question matters because not all software in a workflow repository has the same job.
Some code is close to one workflow step. Some code is reusable application logic. Some code is a third-party wrapper with its own contract.
If those categories blur together, the repository becomes harder to explain and harder to maintain.
Three different homes mean three different kinds of ownership¶
The most useful split in a Snakemake repository is often this:
workflow/scripts/for step-adjacent implementationsrc/for reusable package code- wrappers for externally maintained command surfaces
These are not style preferences. They express different ownership boundaries.
workflow/scripts/ is close to the rule on purpose¶
Code under workflow/scripts/ is a good fit when:
- one rule owns the behavior
- the code depends on the
snakemakeobject or on step-local file contracts - reuse outside that step is limited
The capstone's workflow/scripts/provenance.py is a useful example. Its job is tightly
connected to one publication-oriented step. That makes script-level ownership reasonable.
src/ is for software that deserves a normal program shape¶
Code under src/ is a better fit when:
- several rules may need the same logic
- the code deserves direct imports and tests
- the logic is part of the project's domain model, not only workflow glue
The capstone already signals this by placing reusable package code under src/capstone/.
That layout tells future readers something important:
- this code is part of the repository's software surface
- it should be importable, testable, and understandable outside a Snakemake run
Wrappers solve a different problem again¶
A wrapper is useful when:
- an external tool already has a stable invocation pattern
- repeating the same command plumbing across rules would add noise
- the wrapper's contract is understood and acceptable
But a wrapper is not a place to hide understanding.
If the team cannot explain:
- what the wrapper runs
- what files the rule truly depends on
- what software boundary the wrapper assumes
then the wrapper has reduced clarity rather than improved it.
One comparison that helps¶
| Code home | Typical owner | Best use |
|---|---|---|
workflow/scripts/ |
the workflow step | meaningful step-local implementation |
src/ package code |
the repository's software layer | reusable logic, tests, direct imports |
| wrapper | a stable external command surface | standard invocation without repeating plumbing |
This table gives you a practical way to reason about placement without memorizing rules blindly.
A concrete split¶
flowchart LR
rule["Rule"] --> script["workflow/scripts/step_logic.py"]
rule --> package["src/project_name/..."]
rule --> wrapper["wrapper or external tool"]
script --> local["step-local behavior"]
package --> reusable["reusable software"]
wrapper --> external["external command contract"]
The point is not that every rule must point to all three. The point is that each route means something different.
Weak placement¶
Weak shape:
- every non-trivial script gets dropped into
workflow/scripts/ - package-worthy code never graduates into
src/ - wrappers are adopted without reviewing the external behavior they bring in
This often feels quick in the short term. Two months later, nobody can tell which code is step-local and which code is part of the reusable software layer.
Strong placement¶
Strong shape:
- keep step-bound code close to the step
- promote real reusable logic into a package
- use wrappers when they reduce repeated command ceremony without hiding meaning
That gives the repository a readable shape.
A practical test¶
Ask these questions about a piece of code:
- Would I want to import and test this outside a Snakemake run?
- Is this code tightly coupled to one step's file contract?
- Am I relying on an external wrapper that the team can actually review?
If the answer to the first question is yes, src/ is often the right home.
If the answer to the second question is yes, workflow/scripts/ may be enough.
If the third question is no, a wrapper may be the wrong abstraction.
Common failure modes¶
| Failure mode | What it causes | Better repair |
|---|---|---|
| reusable logic trapped in one script | duplication and weak tests | promote the logic into package code under src/ |
| package code reaches into workflow internals casually | ownership becomes muddy | keep file contracts and step wiring in the rule layer |
| wrappers used as black boxes | debugging becomes indirect | review the wrapper contract and document why it is acceptable |
every helper stored under scripts/ forever |
repository shape stops signaling meaning | separate step-local code from reusable software |
| step-local code promoted too early | project structure becomes ceremonial | keep code near the step until reuse or testing need is real |
The explanation a reviewer trusts¶
Strong explanation:
this logic stays in
workflow/scripts/because it is owned by one publication step, but the shared parsing helpers live undersrc/capstone/because they are reusable software that deserve direct tests.
Weak explanation:
we put the important code in
src/and the rest in scripts.
The strong explanation is about ownership and reuse. The weak one is about status.
End-of-page checkpoint¶
Before leaving this page, you should be able to:
- explain when
workflow/scripts/is the right home for code - explain when code should move into
src/ - describe what a wrapper should clarify rather than hide
- explain why repository layout is part of clarity and reviewability