Skip to content

Exercise Answers

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Workflow Architecture File Apis"]
  page["Exercise Answers"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone
flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

These answers are model explanations, not the only acceptable wording.

What matters is whether the reasoning makes repository ownership and review surfaces clearer.

Answer 1: Read the repository entrypoint

What belongs in the entrypoint:

  • config loading and validation
  • visible workflow assembly
  • stable defaults
  • the default target or public target surface

What should probably move elsewhere:

  • the 150-line run: block

Why this is an architecture issue:

The entrypoint should announce the workflow shape. Once it starts hiding large implementation logic, the repository entry surface becomes less useful for review and onboarding.

Answer 2: Judge a rule split

Why the split is weak:

  • the filenames do not communicate ownership or workflow concern
  • the split may reduce line count without improving architectural meaning

What kind of boundary would be stronger:

  • rule families grouped by coherent workflow concern, such as preprocessing, summarization, publishing, or another clearly named domain

What a reviewer should infer from names:

  • what the file owns
  • which rules belong there
  • how it relates to the visible workflow story

The key problem is not the number of files. It is the lack of an ownership signal.

Answer 3: Review a file-API gap

What architecture problem this creates:

  • path promises are real, but they are undocumented
  • downstream and workflow-facing boundaries blur together

Risks:

  • refactors break notebooks and tests invisibly
  • maintainers cannot tell whether a path rename is a contract change
  • consumers depend on internal layout by accident

What to add first:

  • a file API or contract document that distinguishes workflow-facing paths from public publish-facing paths

The goal is to make path promises reviewable instead of implied.

Answer 4: Diagnose hidden coupling

A strong review comment would say:

This is not only a local code-quality problem. The helper changes repository behavior through undeclared files, unvalidated config, and import-time side effects, which means the visible rule and config surfaces no longer tell the full truth. That weakens the architecture because reviewers must inspect hidden layers to understand workflow meaning.

Why:

  • architecture depends on visible boundaries staying trustworthy
  • hidden dependencies move meaning away from the declared repository surfaces

Answer 5: Decide whether to refactor

Architecture signals:

  • onboarding questions keep repeating
  • ownership boundaries are no longer obvious
  • path contracts and code-placement rules are not visible enough

What boundary to inspect first:

  • the entrypoint and the contract docs

Why:

  • if reviewers cannot find the assembly point or the path promises, the rest of the architecture will feel arbitrary too

What kind of refactor is justified:

  • clarify the top-level assembly
  • rename or regroup rule families around ownership
  • strengthen file-API or contract docs
  • make the split between workflow/scripts/ and src/ easier to explain

The refactor should improve reviewability, not only folder appearance.

Self-check

If your answers consistently explain:

  • who owns orchestration
  • where path promises live
  • where reusable code belongs
  • what makes a refactor honest rather than cosmetic

then you are using the module correctly.