Exercise Answers
Page Maps
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["Software Boundaries Reproducible Rules"]
page["Exercise Answers"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
These answers are written as model explanations, not as the only acceptable wording.
The standard to aim for is clear ownership reasoning.
Answer 1: Decide what stays in the rule
What should stay in the rule:
- declared inputs and outputs
- any simple parameter wiring that helps a reviewer understand file meaning
- the visible execution boundary for the step
What should move into workflow/scripts/:
- the non-trivial transformation logic owned by this step
- code that makes the rule hard to read but is still tightly coupled to this step
What should move into src/:
- parsing logic that another reporting step will also need
- reusable helpers that deserve direct tests outside Snakemake
Why:
The rule should remain the place where the file contract is obvious. Step-local implementation belongs in a script. Reusable logic belongs in the repository's package surface.
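The split described above can be sketched in Python. This is a minimal illustration, not the module's actual code: the function names `parse_counts` and `summarize`, the file names in the docstring, and the `name,count` CSV format are all hypothetical. The helper stands in for reusable logic under src/, and the step-local function stands in for a script under workflow/scripts/.

```python
"""Sketch: step-local script logic calling a reusable helper.

Hypothetical layout (names are illustrative, not from the module):
  src/<package>/parsing.py       -> parse_counts()  (reusable, unit-tested)
  workflow/scripts/summarize.py  -> summarize()     (owned by one rule)
"""
import csv
from pathlib import Path


def parse_counts(text: str) -> dict:
    """Reusable helper (would live under src/): parse 'name,count' CSV text."""
    rows = csv.reader(text.splitlines())
    # Skip empty rows so a trailing blank line does not break parsing.
    return {row[0]: int(row[1]) for row in rows if row}


def summarize(input_path: Path, output_path: Path) -> None:
    """Step-local logic (would live in workflow/scripts/): write the total."""
    counts = parse_counts(input_path.read_text())
    output_path.write_text(f"total\t{sum(counts.values())}\n")


# Inside a Snakemake `script:` step, the injected `snakemake` object carries
# the paths declared by the rule, so the script itself names no files:
#     summarize(Path(snakemake.input[0]), Path(snakemake.output[0]))
```

Because `parse_counts` takes plain text and returns a dictionary, it can be tested directly without running Snakemake, while the file paths stay visible only at the rule boundary.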
Answer 2: Choose the right runtime boundary
Rule-scoped environment:
- declares the software needed by one step or a small cluster of closely related steps
- keeps the runtime boundary close to the rule that depends on it
Repository-level environment.yaml:
- gives contributors and workflow runners a shared baseline for working with the project
- supports authoring and whole-repository execution setup
Container definition:
- provides a stronger machine-level boundary when host differences are risky
- helps when OS tools, compiled dependencies, or infrastructure portability matter
Why they are not interchangeable:
They all describe software, but they protect different scopes. A repository environment is too broad to explain every step. A rule-scoped environment is too narrow to replace a full machine boundary. A container solves portability problems an environment file alone cannot.
Answer 3: Diagnose a hidden dependency
Why it is a software-boundary problem:
- the visible rule contract is smaller than the real behavior
- a reviewer cannot trust the declared execution story
Risks:
- the file graph is misleading
- rebuild behavior may ignore a meaningful input
- publication review becomes weaker because software behavior is partially hidden
Repair:
- declare config/report-style.yaml in the rule if it materially affects the output
- keep the script as implementation, not as a place to invent undeclared dependencies
The core principle is simple: meaningful inputs should stay visible at the rule boundary.
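One way to make this check concrete during review is to compare the inputs a rule declares against the files its script is known to read. The sketch below assumes the list of observed reads has already been gathered by some other means (for example, by reading the script); `undeclared_reads` and the path `results/summary.tsv` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath


def undeclared_reads(declared_inputs, observed_reads):
    """Return paths the step reads that the rule never declared, sorted."""
    declared = {PurePosixPath(p) for p in declared_inputs}
    observed = {PurePosixPath(p) for p in observed_reads}
    return sorted(str(p) for p in observed - declared)


# Hypothetical rule: it declares one input, but the script also reads a
# style file that shapes the output.
missing = undeclared_reads(
    declared_inputs=["results/summary.tsv"],
    observed_reads=["results/summary.tsv", "config/report-style.yaml"],
)
print(missing)  # ['config/report-style.yaml']
```

An empty result means the declared contract covers everything the step reads. A non-empty result lists candidates to declare in the rule.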
Answer 4: Review a wrapper adoption decision
A strong review comment would delay adoption for now:
I would not merge this wrapper change yet. The shorter rule is not enough benefit unless we can explain the wrapper's tool version assumptions, runtime requirements, and visible file contract. Right now the wrapper shortens the rule but also reduces review confidence. If we can document what it runs and why that contract is acceptable, then the wrapper may become a net improvement.
Why this is the right reasoning:
- the question is whether the wrapper increases clarity, not whether it shortens the rule
- if it hides behavior the team cannot review, it weakens the software boundary
Answer 5: Plan a rebuild after software drift
Why outputs may no longer be trustworthy:
- helper code under src/capstone/ can change output meaning even with unchanged inputs
- a runtime change in workflow/envs/python.yaml can alter execution behavior
Evidence to request before approval:
- confirmation that affected outputs were rebuilt
- provenance showing the new repository revision and runtime context
- a review explanation connecting the software change to the rebuilt artifacts
What provenance should accompany the rebuilt outputs:
- repository revision or release identifier
- relevant runtime declaration or software versions
- execution timestamp
- any workflow configuration that materially affects meaning
The main lesson is that unchanged input datasets do not cancel software drift.
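The provenance items listed above could be captured in a small record written next to the rebuilt outputs. The following is a sketch under stated assumptions, not the capstone's actual mechanism: the function names, the record keys, and the example configuration entry are hypothetical, and the Python version stands in for a fuller list of software versions.

```python
import json
import platform
import subprocess
from datetime import datetime, timezone


def git_revision() -> str:
    """Return the current commit hash, or 'unknown' outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_provenance(runtime_file: str, config: dict) -> dict:
    """Collect the four provenance items for one rebuild (keys are assumed)."""
    return {
        "revision": git_revision(),
        "runtime_declaration": runtime_file,
        "python_version": platform.python_version(),
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "config": config,
    }


# The config entry is a placeholder for whatever settings affect meaning.
record = build_provenance(
    "workflow/envs/python.yaml", {"example_key": "example_value"}
)
print(json.dumps(record, indent=2, sort_keys=True))
```

A reviewer can then compare the recorded revision and runtime declaration against the change that prompted the rebuild.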
Self-check
If your answers consistently explain:
- who owns the file contract
- where the implementation belongs
- which runtime boundary matters
- what evidence supports rebuilt trust
then you are using the module correctly.