Exercise Answers

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Software Boundaries Reproducible Rules"]
  page["Exercise Answers"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

These answers are written as model explanations, not as the only acceptable wording.

The standard to aim for is clear ownership reasoning.

Answer 1: Decide what stays in the rule

What should stay in the rule:

  • declared inputs and outputs
  • any simple parameter wiring that helps a reviewer understand file meaning
  • the visible execution boundary for the step

What should move into workflow/scripts/:

  • the non-trivial transformation logic owned by this step
  • code that makes the rule hard to read but is still tightly coupled to this step

What should move into src/:

  • parsing logic that another reporting step will also need
  • reusable helpers that deserve direct tests outside Snakemake

Why:

The rule should remain the place where the file contract is obvious. Step-local implementation belongs in a script. Reusable logic belongs in the repository's package surface.
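The split can be sketched in Python as follows. This is a minimal illustration, not the capstone's actual code: the rule name, the file paths, and the functions `parse_counts` and `summarize` are hypothetical, and the step is assumed to summarize a tab-separated counts file.

```python
# Hypothetical rule in the Snakefile, shown as a comment for context.
# It keeps only the file contract and the execution boundary:
#
#   rule summarize_counts:
#       input:  "results/counts.tsv"
#       output: "results/summary.json"
#       script: "workflow/scripts/summarize_counts.py"

import json
from pathlib import Path


# --- would live under src/ (reusable, testable without Snakemake) ---
def parse_counts(text: str) -> dict:
    """Parse 'name<TAB>count' lines into a dict, skipping blank lines."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t")
        counts[name] = int(value)
    return counts


# --- would live in workflow/scripts/ (logic owned by this one step) ---
def summarize(input_path: str, output_path: str) -> None:
    """Read the declared input and write the declared output, nothing else."""
    counts = parse_counts(Path(input_path).read_text())
    summary = {"n_samples": len(counts), "total": sum(counts.values())}
    Path(output_path).write_text(json.dumps(summary, sort_keys=True))


# When Snakemake runs a file through the `script:` directive it injects a
# `snakemake` object, so the script's entry point would be roughly:
#   summarize(snakemake.input[0], snakemake.output[0])
```

Because `parse_counts` takes plain text and returns a plain dict, a second reporting step can import it, and a unit test can exercise it without running the workflow.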

Answer 2: Choose the right runtime boundary

Rule-scoped environment:

  • declares the software needed by one step or a small cluster of closely related steps
  • keeps the runtime boundary close to the rule that depends on it

Repository-level environment.yaml:

  • gives contributors and workflow runners a shared baseline for working with the project
  • supports authoring and whole-repository execution setup

Container definition:

  • provides a stronger machine-level boundary when host differences are risky
  • helps when OS tools, compiled dependencies, or infrastructure portability matter

Why they are not interchangeable:

They all describe software, but they protect different scopes. A repository environment is too broad to explain every step. A rule-scoped environment is too narrow to replace a full machine boundary. A container solves portability problems an environment file alone cannot.

Answer 3: Diagnose a hidden dependency

Why it is a software-boundary problem:

  • the visible rule contract is smaller than the real behavior
  • a reviewer cannot trust the declared execution story

Risks:

  • the file graph is misleading
  • rebuild behavior may ignore a meaningful input
  • publication review becomes weaker because software behavior is partially hidden

Repair:

  • declare config/report-style.yaml in the rule if it materially affects the output
  • keep the script as implementation, not as a place to invent undeclared dependencies

The core principle is simple: meaningful inputs should stay visible at the rule boundary.
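The rebuild risk can be illustrated with a toy model of timestamp-based staleness. This is a sketch under simplifying assumptions, not Snakemake's scheduler: the real rerun logic considers more than modification times, and the function `is_stale`, the path `results/summary.json`, and the numeric timestamps are hypothetical.

```python
def is_stale(output_mtime: float, declared_input_mtimes: dict) -> bool:
    """Toy check: an output is stale if any declared input is newer than it."""
    return any(m > output_mtime for m in declared_input_mtimes.values())


# Modification times as plain numbers (larger means more recent).
output_mtime = 100.0
mtimes = {
    "results/summary.json": 90.0,
    "config/report-style.yaml": 150.0,  # edited after the report was built
}

# Before the repair: the script reads the style file, but the rule does not
# declare it, so the staleness check never sees it.
declared_before = {"results/summary.json": mtimes["results/summary.json"]}

# After the repair: the style file is a declared input.
declared_after = dict(mtimes)

stale_before = is_stale(output_mtime, declared_before)  # change goes unnoticed
stale_after = is_stale(output_mtime, declared_after)    # change triggers a rebuild
```

In this model the style edit is invisible until the file is declared, which is the sense in which the file graph misleads a reviewer.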

Answer 4: Review a wrapper adoption decision

A strong review comment would delay adoption for now:

I would not merge this wrapper change yet. A shorter rule is not enough of a benefit unless we can explain the wrapper's tool version assumptions, runtime requirements, and visible file contract. Right now the wrapper shortens the rule but also reduces review confidence. If we can document what it runs and why that contract is acceptable, the wrapper may become a net improvement.

Why this is the right reasoning:

  • the question is whether the wrapper increases clarity, not whether it shortens the rule
  • if it hides behavior the team cannot review, it weakens the software boundary

Answer 5: Plan a rebuild after software drift

Why outputs may no longer be trustworthy:

  • helper code under src/capstone/ can change output meaning even with unchanged inputs
  • a runtime change in workflow/envs/python.yaml can alter execution behavior

Evidence to request before approval:

  • confirmation that affected outputs were rebuilt
  • provenance showing the new repository revision and runtime context
  • a review explanation connecting the software change to the rebuilt artifacts

What provenance should accompany the rebuilt outputs:

  • repository revision or release identifier
  • relevant runtime declaration or software versions
  • execution timestamp
  • any workflow configuration that materially affects meaning

The main lesson is that unchanged input datasets do not cancel software drift.
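One way to capture the provenance items listed above is a small record written next to the rebuilt outputs. The following is a minimal sketch, not a prescribed format: the field names and the functions `file_sha256`, `build_provenance`, and `write_provenance` are hypothetical, and the repository revision is assumed to be supplied by the caller (for example, from the output of `git rev-parse HEAD`).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: str) -> str:
    """Hash a file so a later change to it is detectable."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_provenance(revision: str, env_file: str, config: dict) -> dict:
    """Collect revision, runtime declaration, timestamp, and configuration."""
    return {
        "repository_revision": revision,
        "runtime_declaration": {
            "path": env_file,
            "sha256": file_sha256(env_file),
        },
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "config": config,
    }


def write_provenance(record: dict, path: str) -> None:
    """Write the record as stable, human-reviewable JSON."""
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

Hashing the runtime declaration, such as workflow/envs/python.yaml, lets a reviewer confirm that the rebuilt outputs were produced under the environment file currently in the repository rather than an earlier one.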

Self-check

If your answers consistently explain:

  • who owns the file contract
  • where the implementation belongs
  • which runtime boundary matters
  • what evidence supports rebuilt trust

then you are using the module correctly.