Exercises

Page Maps

Location: Reproducible Research > Deep Dive Snakemake > Software Boundaries Reproducible Rules > Exercises (applies in: Capstone evidence)

Reading path:

  • Orient on the page map
  • Read the main claim and examples
  • Inspect the related code, proof, or capstone surface
  • Run or review the verification path
  • Apply the idea back to the module and capstone

Use these exercises to practice the judgments behind the module, not only the vocabulary.

The strongest answers will name ownership, runtime, and rebuild implications clearly.

Exercise 1: Decide what stays in the rule

You inherit a rule whose run: block is 90 lines long. The block:

  • reads one declared input file
  • performs several data transformations
  • writes one declared output file
  • contains parsing logic that would also help another reporting step later

Write a short redesign note that explains:

  • what should stay in the rule
  • what should move into workflow/scripts/
  • what should move into src/

Your answer should justify each placement decision.
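
As a starting point for the redesign note, here is a minimal sketch of one possible split, not a model answer. The reusable parsing sits in importable functions (the kind of code that might live under src/), and a thin entry point (the kind of code that might live under workflow/scripts/) only connects paths to that logic. The names parse_rows, summarize, and run_step are hypothetical, and the tab-separated layout with a group column is an assumption made for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


# --- candidate for src/: reusable logic with no knowledge of workflow paths ---
def parse_rows(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(text.splitlines(), delimiter="\t"))


def summarize(rows):
    """Count rows per value of the (assumed) 'group' column."""
    counts = {}
    for row in rows:
        counts[row["group"]] = counts.get(row["group"], 0) + 1
    return counts


# --- candidate for workflow/scripts/: thin glue between paths and logic ---
def run_step(input_path, output_path):
    """Read one input file and write one output file."""
    rows = parse_rows(Path(input_path).read_text())
    Path(output_path).write_text(json.dumps(summarize(rows), sort_keys=True))


# Demonstration with temporary files standing in for the declared paths.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.tsv"
    dst = Path(tmp) / "summary.json"
    src.write_text("id\tgroup\n1\ta\n2\tb\n3\ta\n")
    run_step(src, dst)
    result = json.loads(dst.read_text())
```

In this sketch the rule itself would keep only the declared input, the declared output, and the call into the script, so a later reporting step could import parse_rows without copying it.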

Exercise 2: Choose the right runtime boundary

A step uses a Python script that imports two libraries not guaranteed to exist on a fresh machine. The team also wants contributors to have a stable way to run Snakemake and edit the repository. Later, the workflow may need to move onto stricter infrastructure where host packages cannot be trusted.

Explain the role of:

  • a rule-scoped environment
  • the repository-level environment.yaml
  • a container definition

Your answer should make clear why these three surfaces are related but not interchangeable.
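
To make the "fresh machine" risk concrete, the sketch below checks which module names the current interpreter can resolve. It only illustrates the problem that the three runtime surfaces address and does not replace any of them. The function name missing_modules and the made-up module name are hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'json' ships with Python; the second name is deliberately made up.
report = missing_modules(["json", "module_that_should_not_exist_xyz"])
```

A check like this describes only the interpreter it runs in, which is one reason a declared environment or container carries more weight than an ad hoc probe.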

Exercise 3: Diagnose a hidden dependency

A rule declares:

  • input: results/sample.tsv
  • output: publish/v1/report.json

The script launched by the rule also reads config/report-style.yaml, but that file is not declared anywhere in the rule.

Explain:

  • why this is a software-boundary problem rather than a minor implementation detail
  • what risks it creates for review and rebuild behavior
  • how you would repair the design
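
The rebuild risk can be sketched with a simple content fingerprint over the files a step declares. This is a stand-in for change detection in general, not a description of how Snakemake decides what to rerun. The function name fingerprint and the file contents are hypothetical; only the two file names come from the exercise.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of the given files in sorted path order."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in paths):
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "sample.tsv"
    style = Path(tmp) / "report-style.yaml"
    data.write_text("id\tvalue\n1\t10\n")
    style.write_text("theme: light\n")

    declared_only = [data]               # what the rule declares today
    declared_with_style = [data, style]  # the repaired declaration

    before_hidden = fingerprint(declared_only)
    before_visible = fingerprint(declared_with_style)

    style.write_text("theme: dark\n")  # the style file changes

    hidden_change_detected = fingerprint(declared_only) != before_hidden
    visible_change_detected = fingerprint(declared_with_style) != before_visible
```

With the style file undeclared, the change leaves the fingerprint untouched, so nothing tied to declared inputs signals that the report is stale. Declaring the file makes the same change visible.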

Exercise 4: Review a wrapper adoption decision

Your team wants to replace a visible shell command with a wrapper. The wrapper shortens the rule, but nobody on the team can yet explain:

  • which external tool version it assumes
  • which extra runtime requirements it introduces
  • whether it hides meaningful file relationships

Write a review comment that argues for either delaying or accepting the wrapper adoption.

Your answer should focus on clarity and ownership rather than style preference.

Exercise 5: Plan a rebuild after software drift

After publication outputs are generated, the team changes:

  • a helper function under src/capstone/
  • workflow/envs/python.yaml
  • nothing in the declared input datasets

Describe:

  • why the published outputs may no longer be trustworthy
  • what evidence you would want before approving the outputs again
  • what provenance information should accompany the rebuilt publication artifacts
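
One possible shape for the provenance record is sketched below: a manifest that stores a content hash for each code file, environment file, and input file, plus a comparison that names the sections that drifted. The function names sha256_of, build_manifest, and changed_sections are hypothetical, and the file contents are placeholders. A real record would likely carry more, such as tool versions and timestamps.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(code_files, env_files, input_files):
    """Record a content hash for every file that shaped the outputs."""
    return {
        "code": {str(p): sha256_of(p) for p in code_files},
        "environments": {str(p): sha256_of(p) for p in env_files},
        "inputs": {str(p): sha256_of(p) for p in input_files},
    }


def changed_sections(old, new):
    """List the manifest sections whose recorded hashes differ."""
    return sorted(key for key in new if old.get(key) != new[key])


with tempfile.TemporaryDirectory() as tmp:
    helper = Path(tmp) / "helper.py"
    env = Path(tmp) / "python.yaml"
    dataset = Path(tmp) / "sample.tsv"
    helper.write_text("def scale(x):\n    return x * 2\n")
    env.write_text("dependencies:\n  - python=3.11\n")
    dataset.write_text("id\tvalue\n1\t10\n")

    before = build_manifest([helper], [env], [dataset])

    # Software drift: the helper and the environment change, the data does not.
    helper.write_text("def scale(x):\n    return x * 3\n")
    env.write_text("dependencies:\n  - python=3.12\n")

    after = build_manifest([helper], [env], [dataset])
    changed = changed_sections(before, after)
```

Here the inputs section is unchanged while code and environments differ, which is exactly the case where unchanged datasets say nothing about whether the published outputs still match the software that produced them.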

Mastery check

You have a strong grasp of this module if you can answer all five exercises while keeping four ideas visible:

  • the rule owns the file contract
  • software placement expresses ownership
  • runtime declarations shape workflow meaning
  • provenance is part of publication trust