Entrypoints, Repository Layers, and Visible Assembly

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Workflow Architecture File Apis"]
  page["Entrypoints, Repository Layers, and Visible Assembly"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

The first architecture question in a Snakemake repository is simple:

where should a reviewer look first to understand the workflow?

If that question does not have a clear answer, the repository is already asking too much of the reviewer's memory and too little of its own structure.

This page is about making the entrypoint and layer responsibilities visible.

The top-level entrypoint should announce the workflow shape

In the capstone, the top-level Snakefile does several important things:

loads and validates config
sets stable defaults
assembles the workflow from named rule files
defines the entrypoint target

That is a strong job for the entrypoint.

It tells a reviewer:

where workflow assembly starts
which files contribute rules
which boundaries are visible at the top level

The Snakefile should feel like a routing and assembly surface, not a hiding place.
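
As a rough illustration of what "announcing the workflow shape" can mean, the sketch below holds a hypothetical top-level Snakefile as a string and extracts the assembly surface a reviewer would read first: the config file, the included rule files, and the top-level rule names. The file names (`config/config.yaml`, `config/schema.yaml`, `workflow/rules/preprocess.smk`, `workflow/rules/summarize.smk`), the `results_dir` key, and the helper `assembly_surface` are assumptions made for illustration, not the capstone's actual contents. The parsing is a simple line-based regex, not Snakemake's own parser.

```python
import re

# Hypothetical top-level Snakefile; every path and key below is illustrative.
EXAMPLE_SNAKEFILE = '''\
from snakemake.utils import validate

configfile: "config/config.yaml"
validate(config, schema="config/schema.yaml")

# Stable defaults live next to the assembly, not in helper files.
config.setdefault("results_dir", "results")

include: "workflow/rules/preprocess.smk"
include: "workflow/rules/summarize.smk"

rule all:
    input:
        config["results_dir"] + "/summary.tsv"
'''

# Match only unindented, double-quoted directives and rule headers.
DIRECTIVE = re.compile(r'^(configfile|include):\s*"([^"]+)"\s*$')
RULE = re.compile(r'^rule\s+(\w+):\s*$')


def assembly_surface(snakefile_text):
    """Return config files, includes, and top-level rule names in file order."""
    surface = {"configfile": [], "include": [], "rules": []}
    for line in snakefile_text.splitlines():
        directive = DIRECTIVE.match(line)
        if directive:
            surface[directive.group(1)].append(directive.group(2))
            continue
        rule = RULE.match(line)
        if rule:
            surface["rules"].append(rule.group(1))
    return surface


surface = assembly_surface(EXAMPLE_SNAKEFILE)
for kind, names in surface.items():
    print(f"{kind}: {names}")
```

If the printed summary reads like a table of contents for the workflow, the entrypoint is doing its routing job.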

A repository becomes clearer when layers have jobs

The capstone architecture already points toward a healthy split:

Snakefile for visible assembly
workflow/rules/ for rule families and declared file contracts
workflow/modules/ for reusable workflow bundles
workflow/scripts/ for workflow-adjacent implementation
src/capstone/ for reusable package code
profiles/ for execution policy
config/ for input configuration and schema boundaries

These folders are useful only when their ownership stays legible.
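
One way to keep ownership legible is to write the layer map down as data and compare a checkout against it. The sketch below does that for the layers listed above. The responsibility strings paraphrase this page, and the helper `missing_layers` is a hypothetical name, not part of Snakemake or the capstone.

```python
import tempfile
from pathlib import Path

# Layer paths as listed on this page; the descriptions paraphrase its wording.
LAYERS = {
    "Snakefile": "visible assembly",
    "workflow/rules": "rule families and declared file contracts",
    "workflow/modules": "reusable workflow bundles",
    "workflow/scripts": "workflow-adjacent implementation",
    "src/capstone": "reusable package code",
    "profiles": "execution policy",
    "config": "input configuration and schema boundaries",
}


def missing_layers(repo_root):
    """Return the layer paths from LAYERS that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in LAYERS if not (root / path).exists()]


# Demonstration on a throwaway directory that has only two of the layers.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Snakefile").write_text("# assembly goes here\n")
    (Path(tmp) / "workflow" / "rules").mkdir(parents=True)
    absent = missing_layers(tmp)

for path in absent:
    print(f"missing: {path} ({LAYERS[path]})")
```

A check like this only shows that the folders exist. Whether each one keeps to its job still needs review.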

Layering is not about folder aesthetics

A repository layer matters when it answers a review question quickly.

For example:

“where is the workflow assembled?” points to Snakefile
“where does this rule family live?” points to workflow/rules/
“where is reusable implementation code?” points to src/capstone/
“where are path promises documented?” points to workflow/contracts/FILE_API.md and the related contract docs

If a folder exists but does not answer any boundary question, it may be decorative rather than architectural.

One useful architecture map

flowchart TD
  snakefile["Snakefile"] --> rules["workflow/rules/"]
  snakefile --> modules["workflow/modules/"]
  rules --> scripts["workflow/scripts/"]
  rules --> contracts["workflow/contracts/FILE_API.md"]
  scripts --> package["src/capstone/"]
  snakefile --> config["config/ and schema"]
  snakefile --> profiles["profiles/ as policy"]

This diagram matters because it shows the repository as a set of named boundaries rather than a pile of adjacent directories.
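
The arrows in the map can also be read as an allow-list of which layer may reach directly into which. A minimal sketch under that reading follows. The function name `is_allowed` and the idea of checking the arrows mechanically are assumptions for illustration, and nothing on this page says the capstone enforces them this way.

```python
# Edges copied from the architecture map: source layer -> layers it may reference.
# Node labels are shortened to their paths ("config/ and schema" becomes "config/").
ALLOWED = {
    "Snakefile": {"workflow/rules/", "workflow/modules/", "config/", "profiles/"},
    "workflow/rules/": {"workflow/scripts/", "workflow/contracts/FILE_API.md"},
    "workflow/scripts/": {"src/capstone/"},
}


def is_allowed(source, target):
    """True when the map has a direct arrow from source to target."""
    return target in ALLOWED.get(source, set())


print(is_allowed("workflow/rules/", "workflow/scripts/"))  # True
print(is_allowed("src/capstone/", "workflow/rules/"))      # False
```

The second call is the interesting one: package code reaching back into rule files would reverse an arrow, which is the kind of boundary leak the map makes easy to name.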

A weak entrypoint

Weak shape:

the Snakefile contains large implementation blocks
rule inclusion happens with no clear naming or ownership reason
important defaults are scattered across helper files most reviewers never read

This makes the repository harder to enter and easier to misunderstand.
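
A rough way to notice the first symptom is to count how much of the Snakefile is inline implementation rather than assembly. The heuristic below counts `include:` lines against inline `run:` blocks and `def` statements. The function name `entrypoint_signals`, the sample file, and the choice of these three signals are assumptions for illustration, and the counts are a prompt for reading the file, not a verdict.

```python
import re

# Hypothetical weak entrypoint: helper code and a rule body inline, no includes.
WEAK_SNAKEFILE = '''\
def normalize(path):
    return path.lower()

rule clean:
    input: "raw/data.csv"
    output: "results/clean.csv"
    run:
        with open(input[0]) as src, open(output[0], "w") as dst:
            dst.write(src.read().strip())
'''


def entrypoint_signals(snakefile_text):
    """Count include lines, inline run: blocks, and function definitions."""
    counts = {"include": 0, "run_block": 0, "def": 0}
    for line in snakefile_text.splitlines():
        stripped = line.strip()
        if re.match(r"include:\s", stripped):
            counts["include"] += 1
        elif stripped == "run:":
            counts["run_block"] += 1
        elif re.match(r"def\s+\w+\(", stripped):
            counts["def"] += 1
    return counts


signals = entrypoint_signals(WEAK_SNAKEFILE)
print(signals)
```

Zero includes alongside inline `run:` blocks and helper functions suggests the file is carrying implementation that an owned layer could hold instead.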

A stronger entrypoint

Stronger shape:

keep the top-level file focused on assembly, defaults, and visible targets
use named includes or modules that match coherent rule families
keep implementation code and helper logic in their own owned layers

Now a reviewer can explain the repository in the same order they discover it.

A practical test

Ask these questions:

Can a new reviewer locate the workflow assembly point quickly?
Can they tell which folders are for orchestration, implementation, policy, and contracts?
Does the entrypoint reveal more than it hides?

If those answers depend on prior oral explanation, the architecture is already weaker than it should be.

Common failure modes

Failure mode	What goes wrong	Better repair
`Snakefile` becomes a giant monolith	assembly and implementation blur together	keep orchestration visible and move owned logic outward
folders exist with no clear responsibility	readers browse without a mental map	assign each layer one reviewable job
config, policy, and workflow logic mix together	a question about where something lives turns into a question about what it means	separate meaning, policy, and configuration surfaces clearly
helper code becomes easier to find than rules	the visible DAG stops being the first story	keep rule assembly easier to inspect than helper internals
entrypoint only works for insiders	onboarding depends on oral tradition	make the top-level file teach the repository layout directly

The explanation a reviewer trusts

Strong explanation:

the Snakefile assembles the workflow and points to the owned rule families, while the repository layers separate orchestration, implementation, contracts, configuration, and policy so a reviewer can inspect one boundary at a time.

Weak explanation:

the repository is organized into folders, and the important parts are spread around.

The strong explanation describes ownership. The weak one describes geography.

End-of-page checkpoint

Before leaving this page, you should be able to:

explain what the top-level Snakefile should own
describe why repository layers need named responsibilities
explain how visible assembly helps code review and onboarding
identify one sign that a repository entrypoint is hiding too much