Entrypoints, Repository Layers, and Visible Assembly¶
The first architecture question in a Snakemake repository is simple:
where should a reviewer look first to understand the workflow?
If that question does not have a clear answer, the repository is already asking too much from memory and too little from structure.
This page is about making the entrypoint and layer responsibilities visible.
The top-level entrypoint should announce the workflow shape¶
In the capstone, the top-level Snakefile does several important things:
- loads and validates config
- sets stable defaults
- assembles the workflow from named rule files
- defines the entrypoint target
That is a strong job for the entrypoint.
It tells a reviewer:
- where workflow assembly starts
- which files contribute rules
- which boundaries are visible at the top level
The Snakefile should feel like a routing and assembly surface, not a hiding place.
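One way to make "announcing the workflow shape" concrete is to ask what a reader can recover from the entrypoint text alone. The sketch below is illustrative only: the Snakefile content, the rule file names, and the `describe_entrypoint` helper are hypothetical and are not taken from the capstone. It assumes `configfile:` and `include:` are written with plain string literals; computed paths and `module` statements are not handled.

```python
import re

# Hypothetical entrypoint text. File names are illustrative assumptions,
# not the capstone's actual rule files.
EXAMPLE_SNAKEFILE = '''\
configfile: "config/config.yaml"

include: "workflow/rules/preprocess.smk"
include: "workflow/rules/summarize.smk"

rule all:
    input:
        "results/summary.tsv"
'''

# These patterns only match directives written with a literal string path.
CONFIGFILE_RE = re.compile(r'^[ \t]*configfile:[ \t]*["\']([^"\']+)["\']', re.MULTILINE)
INCLUDE_RE = re.compile(r'^[ \t]*include:[ \t]*["\']([^"\']+)["\']', re.MULTILINE)
RULE_RE = re.compile(r'^rule[ \t]+(\w+)[ \t]*:', re.MULTILINE)


def describe_entrypoint(snakefile_text):
    """Summarize what the entrypoint text reveals without opening other files."""
    return {
        "configfiles": CONFIGFILE_RE.findall(snakefile_text),
        "includes": INCLUDE_RE.findall(snakefile_text),
        "rules": RULE_RE.findall(snakefile_text),
    }


if __name__ == "__main__":
    for key, values in describe_entrypoint(EXAMPLE_SNAKEFILE).items():
        print(key, values)
```

If the summary names the config source, the contributing rule files, and the target rule, the entrypoint is doing the routing job described above.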
A repository becomes clearer when layers have jobs¶
The capstone architecture already points toward a healthy split:
- Snakefile for visible assembly
- workflow/rules/ for rule families and declared file contracts
- workflow/modules/ for reusable workflow bundles
- workflow/scripts/ for workflow-adjacent implementation
- src/capstone/ for reusable package code
- profiles/ for execution policy
- config/ for input configuration and schema boundaries
These folders are useful only when their ownership stays legible.
Layering is not about folder aesthetics¶
A repository layer matters when it answers a review question quickly.
For example:
- “where is the workflow assembled?” points to Snakefile
- “where does this rule family live?” points to workflow/rules/
- “where is reusable implementation code?” points to src/capstone/
- “where are path promises documented?” points to the file API and contract docs
If a folder exists but does not answer any boundary question, it may be decorative rather than architectural.
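The idea that every layer should have a job can be turned into a small check. The sketch below is a minimal illustration, not tooling from the capstone: the layer-to-job mapping comes from the list earlier on this page, while the helper names `missing_layers` and `unowned_top_level_dirs` are hypothetical. A top-level directory with no entry is not necessarily wrong; it is only a prompt to ask which boundary question it answers.

```python
from pathlib import Path

# Layer -> job, taken from the capstone layer list on this page.
LAYER_JOBS = {
    "Snakefile": "visible assembly",
    "workflow/rules": "rule families and declared file contracts",
    "workflow/modules": "reusable workflow bundles",
    "workflow/scripts": "workflow-adjacent implementation",
    "src/capstone": "reusable package code",
    "profiles": "execution policy",
    "config": "input configuration and schema boundaries",
}


def missing_layers(repo_root):
    """Return the expected layers that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LAYER_JOBS if not (root / rel).exists()]


def unowned_top_level_dirs(repo_root):
    """Return top-level directories that no layer entry accounts for."""
    root = Path(repo_root)
    owned = {rel.split("/")[0] for rel in LAYER_JOBS}
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name not in owned and not p.name.startswith(".")
    )


if __name__ == "__main__":
    print("missing:", missing_layers("."))
    print("unowned:", unowned_top_level_dirs("."))
```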
One useful architecture map¶
flowchart TD
snakefile["Snakefile"] --> rules["workflow/rules/"]
snakefile --> modules["workflow/modules/"]
rules --> scripts["workflow/scripts/"]
rules --> contracts["workflow/contracts/FILE_API.md"]
scripts --> package["src/capstone/"]
snakefile --> config["config/ and schema"]
snakefile --> profiles["profiles/ as policy"]
This diagram matters because it shows the repository as a set of named boundaries rather than a pile of adjacent directories.
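Read as data, the map above is a set of allowed dependency directions between layers. The sketch below encodes those edges and reports any observed dependency that falls outside them. The short layer keys and the `boundary_violations` helper are assumptions for illustration; how the observed pairs are collected (for example from imports or `include:` lines) is left open.

```python
# Allowed edges, transcribed from the architecture map.
# Keys and values are shorthand labels for the diagram nodes.
ALLOWED_EDGES = {
    "Snakefile": {"workflow/rules", "workflow/modules", "config", "profiles"},
    "workflow/rules": {"workflow/scripts", "workflow/contracts"},
    "workflow/scripts": {"src/capstone"},
}


def boundary_violations(observed_edges):
    """Return the (source, target) pairs that the architecture map does not allow."""
    return [
        (source, target)
        for source, target in observed_edges
        if target not in ALLOWED_EDGES.get(source, set())
    ]


if __name__ == "__main__":
    # Hypothetical observations: the second pair points "upward" from scripts to rules.
    observed = [
        ("Snakefile", "workflow/rules"),
        ("workflow/scripts", "workflow/rules"),
    ]
    print(boundary_violations(observed))
```

A dependency that appears in the result is not automatically a bug, but it is one the map does not explain, which makes it worth a review question.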
A weak entrypoint¶
Weak shape:
- the Snakefile contains large implementation blocks
- rule inclusion happens with no clear naming or ownership reason
- important defaults are scattered across helper files most reviewers never read
This makes the repository harder to enter and easier to misunderstand.
A stronger entrypoint¶
Stronger shape:
- keep the top-level file focused on assembly, defaults, and visible targets
- use named includes or modules that match coherent rule families
- keep implementation code and helper logic in their own owned layers
Now a reviewer can explain the repository in the same order they discover it.
A practical test¶
Ask these questions:
- Can a new reviewer locate the workflow assembly point quickly?
- Can they tell which folders are for orchestration, implementation, policy, and contracts?
- Does the entrypoint reveal more than it hides?
If those answers depend on prior oral explanation, the architecture is already weaker than it should be.
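Part of this test can be approximated mechanically. The sketch below is a rough heuristic under stated assumptions, not a definitive measure: it treats any rule defined directly in the entrypoint, other than the target rule, as a sign that implementation may be leaking into the assembly surface. The `entrypoint_signals` helper and the default target name `all` are assumptions, and the example text is hypothetical.

```python
import re

RULE_RE = re.compile(r"^rule[ \t]+(\w+)[ \t]*:", re.MULTILINE)
INCLUDE_RE = re.compile(r"^include:", re.MULTILINE)


def entrypoint_signals(snakefile_text, target_rule="all"):
    """Collect simple size and shape signals from the entrypoint text."""
    inline_rules = [
        name for name in RULE_RE.findall(snakefile_text) if name != target_rule
    ]
    return {
        "line_count": len(snakefile_text.splitlines()),
        "include_count": len(INCLUDE_RE.findall(snakefile_text)),
        "inline_rules": inline_rules,
    }


# Hypothetical weak entrypoint: no includes, one rule implemented inline.
WEAK_EXAMPLE = '''\
rule all:
    input:
        "results/summary.tsv"

rule summarize:
    input:
        "data/raw.tsv"
    output:
        "results/summary.tsv"
    shell:
        "cut -f1 {input} | sort | uniq -c > {output}"
'''

if __name__ == "__main__":
    print(entrypoint_signals(WEAK_EXAMPLE))
```

The numbers do not replace reading the file. A few inline rules can be reasonable in a small workflow; the signal is most useful when the counts grow over time.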
Common failure modes¶
| Failure mode | What goes wrong | Better repair |
|---|---|---|
| Snakefile becomes a giant monolith | assembly and implementation blur together | keep orchestration visible and move owned logic outward |
| folders exist with no clear responsibility | readers browse without a mental map | assign each layer one reviewable job |
| config, policy, and workflow logic mix together | architecture questions become semantic questions | separate meaning, policy, and configuration surfaces clearly |
| helper code becomes easier to find than rules | the visible DAG stops being the first story | keep rule assembly easier to inspect than helper internals |
| entrypoint only works for insiders | onboarding depends on oral tradition | make the top-level file teach the repository layout directly |
The explanation a reviewer trusts¶
Strong explanation:
the Snakefile assembles the workflow and points to the owned rule families, while the repository layers separate orchestration, implementation, contracts, configuration, and policy so a reviewer can inspect one boundary at a time.
Weak explanation:
the repository is organized into folders, and the important parts are spread around.
The strong explanation describes ownership. The weak one describes geography.
End-of-page checkpoint¶
Before leaving this page, you should be able to:
- explain what the top-level Snakefile should own
- describe why repository layers need named responsibilities
- explain how visible assembly helps code review and onboarding
- identify one sign that a repository entrypoint is hiding too much