
Exercises

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Dynamic Dags Discovery Integrity"]
  page["Exercises"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Use these after reading the five core lessons and the worked example. The goal is to make your reasoning visible, not to prove that you can write the cleverest Snakefile.

Each answer should show three things:

  • the workflow fact you are defending
  • the evidence route you would use
  • the repair or design choice that follows

Exercise 1: Turn ambient discovery into a declared contract

Start from a tiny workflow that scans data/raw/ directly at parse time.

Repair it so the workflow has one named discovery surface and one durable discovered-set artifact.
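Below is a minimal sketch of the repaired shape, written in plain Python that a Snakefile could import. It assumes raw inputs are named `<sample>.fastq.gz`. The helper names `discover_samples` and `write_discovered_set` and the artifact path `results/discovery/samples.json` are hypothetical. In the weak shape, an inline directory scan appears wherever a sample list is needed. Here every sample name comes from one function, and the resulting set is persisted.

```python
import json
from pathlib import Path

# Hypothetical naming convention: each raw input is "<sample>.fastq.gz".
RAW_SUFFIX = ".fastq.gz"


def discover_samples(raw_dir):
    """The one named discovery surface: all sample names are derived here."""
    names = [
        p.name[: -len(RAW_SUFFIX)]
        for p in Path(raw_dir).iterdir()
        if p.is_file() and p.name.endswith(RAW_SUFFIX)
    ]
    # Sort so the discovered set does not depend on directory listing order.
    return sorted(names)


def write_discovered_set(samples, out_path):
    """Persist the discovered set as a durable artifact a reviewer can read."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps({"samples": samples}, indent=2) + "\n")
```

In a Snakefile, the parse-time sample list could come from `discover_samples("data/raw")`. A rule whose output is `results/discovery/samples.json` could then call `write_discovered_set(...)` from a `run:` block, so the set that was actually used is recorded next to the results.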

What to hand in:

  • the original discovery shape and why it was weak
  • the repaired discovery rule or helper
  • the path of the discovered-set artifact
  • one sentence explaining why another reviewer could trust the new design more

Exercise 2: Prevent one accidental fanout explosion

Design a small example where expand() creates more targets than the real domain calls for.

Then repair it so the target list comes from one validated record structure instead of two independent lists.
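Here is a plain-Python sketch of the fanout bug and one repair. By default, Snakemake's `expand()` combines wildcard lists as a cartesian product. `cartesian_targets` reproduces that behaviour with `itertools.product`, so the example runs without Snakemake installed. The sample and lane names, the path pattern, and the helper names are all hypothetical.

```python
from itertools import product

PATTERN = "results/{sample}/{lane}.bam"

# Buggy shape: two independent lists with no record of which pairs exist.
SAMPLES = ["s1", "s2"]
LANES = ["L001", "L002"]

# Repaired shape: one record per real (sample, lane) pair (hypothetical data).
RECORDS = [
    {"sample": "s1", "lane": "L001"},
    {"sample": "s1", "lane": "L002"},
    {"sample": "s2", "lane": "L001"},
]


def cartesian_targets(pattern, samples, lanes):
    """Mimics expand()'s default combinator: every sample paired with every lane."""
    return [pattern.format(sample=s, lane=lane) for s, lane in product(samples, lanes)]


def validate_records(records):
    """Reject records with missing or extra fields, and duplicate pairs."""
    seen = set()
    for record in records:
        if set(record) != {"sample", "lane"}:
            raise ValueError(f"unexpected fields in record: {record}")
        key = (record["sample"], record["lane"])
        if key in seen:
            raise ValueError(f"duplicate record: {key}")
        seen.add(key)
    return records


def record_targets(pattern, records):
    """One target per validated record, so no pairing is invented."""
    return [pattern.format(**record) for record in validate_records(records)]


if __name__ == "__main__":
    invented = set(cartesian_targets(PATTERN, SAMPLES, LANES)) - set(
        record_targets(PATTERN, RECORDS)
    )
    print(sorted(invented))  # → ['results/s2/L002.bam']
```

`expand()` also accepts `zip` as its second argument. That still relies on two lists staying aligned by position, whereas a record structure states each pairing explicitly. Before the repair, a dry run (`snakemake -n`) would either schedule a job for `results/s2/L002.bam` or fail on its missing input. After the repair, that target no longer appears.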

What to hand in:

  • the buggy expansion
  • the unintended targets it creates
  • the repaired target-list design
  • one command, such as snakemake -n, that would make the difference obvious

Exercise 3: Justify one checkpoint and reject one fake one

Write two short design sketches:

  • one workflow shape where a checkpoint is justified
  • one workflow shape where a checkpoint would only hide weak modeling
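The two cases can be contrasted in plain Python. In a real Snakefile, the justified case would have an aggregation input function that first calls `checkpoints.split.get(**wildcards)`, so Snakemake re-evaluates the DAG once the checkpoint has finished. The sketch covers only the listing step that follows, and it uses `pathlib` in place of Snakemake's `glob_wildcards`. The checkpoint name `split`, the paths, and the sample-sheet layout are assumptions.

```python
import csv
from pathlib import Path


# Justified checkpoint: the number of chunks is only known after a
# (hypothetical) "split" step has read the data, so the chunk list must be
# derived from that step's output directory once it has run.
def chunk_targets(split_dir, pattern="results/processed/{chunk}.txt"):
    """List one downstream target per chunk file the split step produced."""
    chunks = sorted(p.stem for p in Path(split_dir).glob("*.txt"))
    return [pattern.format(chunk=c) for c in chunks]


# Unjustified checkpoint: the sample list already exists in a sample sheet
# before the run starts, so it can simply be read at parse time.
def samples_from_sheet(sheet_path):
    """Read sample names from a tab-separated sheet with a 'sample' column."""
    with open(sheet_path, newline="") as fh:
        return [row["sample"] for row in csv.DictReader(fh, delimiter="\t")]
```

The question that separates the two functions is whether the answer exists before the run starts. If it does, a checkpoint only postpones a fact the workflow could have declared up front.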

What to hand in:

  • the discovery question each sketch is trying to answer
  • the output artifact of the justified checkpoint
  • the cleaner alternative for the unjustified checkpoint
  • one sentence explaining the difference between the two cases

Exercise 4: Design the integrity trail for a dynamic run

Assume a workflow discovers samples and publishes a versioned bundle.

Describe the minimum artifact set that would let a downstream reviewer answer:

  • which samples were discovered
  • which files are part of the public boundary
  • what run identity produced that boundary
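To make the manifest artifact concrete, the sketch below inventories a publish directory with SHA-256 checksums and stamps the result with a run identity. The caller supplies `run_id`; deriving it (for example from a commit hash plus a config checksum) is an assumption outside this sketch, and the helper names are also assumptions. The discovery artifact and a separate provenance record would sit alongside this file.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(publish_dir, run_id):
    """Inventory every file under the publish boundary, tied to one run identity.

    Write the result outside publish_dir so the manifest does not list itself.
    """
    publish_dir = Path(publish_dir)
    files = sorted(p for p in publish_dir.rglob("*") if p.is_file())
    return {
        "run_id": run_id,
        "files": [
            {
                "path": p.relative_to(publish_dir).as_posix(),
                "sha256": sha256_of(p),
                "bytes": p.stat().st_size,
            }
            for p in files
        ],
    }


def write_manifest(manifest, out_path):
    """Serialize the manifest with sorted keys so reruns diff cleanly."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

A reviewer can recompute each checksum independently and compare it against the manifest. Because the manifest carries the run identity, any mismatch is attributable to a specific run.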

What to hand in:

  • one discovery artifact path
  • one provenance artifact path
  • one manifest or inventory artifact path
  • one short explanation of what each artifact proves

Exercise 5: Improve performance without weakening truth

Start from a workflow shape that is operationally clumsy, for example:

  • too many tiny per-fragment jobs
  • too many almost-identical environment files

Repair it without hiding required artifacts or changing the meaning of the publish boundary.
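For the tiny-jobs smell, one possible repair is to process fragments in batches and check that the batching neither drops nor duplicates anything. The batch size, the `batchNNN` naming, and the helper names are assumptions. Snakemake's `group` directive is another way to cut scheduling overhead without changing what the rules produce.

```python
def make_batches(fragments, batch_size):
    """Group fragments into fixed-size batches with stable, sortable names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(fragments)
    return {
        f"batch{start // batch_size:03d}": ordered[start : start + batch_size]
        for start in range(0, len(ordered), batch_size)
    }


def check_coverage(fragments, batches):
    """Fail loudly if batching dropped or duplicated any fragment."""
    members = [f for group in batches.values() for f in group]
    if sorted(members) != sorted(fragments):
        raise ValueError("batches do not cover each fragment exactly once")
```

In this design, each batch becomes one job that can still write every per-fragment output it is responsible for. The repair changes how many jobs are scheduled, not which artifacts exist.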

What to hand in:

  • the original smell
  • the repaired environment or job-boundary design
  • one sentence explaining why the repair lowers overhead
  • one sentence explaining why workflow truth stayed intact

Mastery standard for this exercise set

Across all five answers, Module 02 wants the same habits:

  • you name the artifact or boundary that carries the truth
  • you explain dynamic behavior without using mystical language
  • you show which command or file would let another person verify the claim

If your answer says only "Snakemake will handle it," keep going.