Exercises

Page Maps

graph LR
  family["Reproducible Research"]
  program["Deep Dive Snakemake"]
  section["Dynamic Dags Discovery Integrity"]
  page["Exercises"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

Use these after reading the five core lessons and the worked example. The goal is to make your reasoning visible, not to prove that you can write the cleverest Snakefile.

Each answer should show three things:

the workflow fact you are defending
the evidence route you would use
the repair or design choice that follows

Exercise 1: Turn ambient discovery into a declared contract

Start from a tiny workflow that scans data/raw/ directly at parse time.

Repair it so the workflow has one named discovery surface and one durable discovered-set artifact.

What to hand in:

the original discovery shape and why it was weak
the repaired discovery rule or helper
the path of the discovered-set artifact
one sentence explaining why another reviewer could trust the new design more
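
One possible shape for the repaired helper is sketched below. It is only an illustration under stated assumptions: the function names, the `.fastq.gz` suffix, and the JSON layout are hypothetical and not part of the course material. The idea is that a single function owns the directory scan and writes a sorted, durable record that every later step reads instead of rescanning data/raw/.

```python
import json
from pathlib import Path


def discover_samples(raw_dir: Path, suffix: str = ".fastq.gz") -> list[str]:
    """Return the sorted sample names for files in raw_dir ending with suffix.

    The suffix is an assumption for this sketch; use whatever your inputs need.
    """
    names = [
        p.name[: -len(suffix)]
        for p in raw_dir.iterdir()
        if p.is_file() and p.name.endswith(suffix)
    ]
    # Sorting keeps the artifact byte-stable across filesystems and runs.
    return sorted(names)


def write_discovered_set(raw_dir: Path, out_path: Path) -> list[str]:
    """Scan raw_dir once and persist the result as a reviewable JSON artifact."""
    samples = discover_samples(raw_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"source": raw_dir.as_posix(), "samples": samples}
    out_path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
    return samples
```

In a workflow, one rule could call `write_discovered_set(Path("data/raw"), Path("results/discovery/samples.json"))` (the output path is a made-up example), and a reviewer could then diff that file between runs rather than trusting whatever the parse-time scan happened to see.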

Exercise 2: Prevent one accidental fanout explosion

Design a small example where expand() creates more targets than the real domain calls for.

Then repair it so the target list comes from one validated record structure instead of two independent lists.

What to hand in:

the buggy expansion
the unintended targets it creates
the repaired target-list design
one command, such as snakemake -n, that would make the difference obvious
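
The difference between the two designs can be sketched in plain Python. This is a stand-in, not Snakemake itself: `itertools.product` mimics the full cross product that expand() builds by default when given two independent lists, and the sample names, unit names, and path pattern are invented for the example.

```python
from itertools import product

# Two independent lists: nothing says which unit belongs to which sample.
samples = ["A", "B"]
units = ["lane1", "lane2"]

# One record structure: each entry is a combination that really exists.
records = [
    {"sample": "A", "unit": "lane1"},
    {"sample": "B", "unit": "lane2"},
]

PATTERN = "results/{sample}/{unit}.bam"  # hypothetical target pattern


def cross_product_targets(samples, units):
    """Buggy shape: every sample is paired with every unit."""
    return [PATTERN.format(sample=s, unit=u) for s, u in product(samples, units)]


def validate_records(records):
    """Reject records with missing keys or duplicate (sample, unit) pairs."""
    seen = set()
    for record in records:
        missing = {"sample", "unit"} - record.keys()
        if missing:
            raise ValueError(f"record {record!r} is missing {sorted(missing)}")
        key = (record["sample"], record["unit"])
        if key in seen:
            raise ValueError(f"duplicate record {key!r}")
        seen.add(key)
    return records


def record_targets(records):
    """Repaired shape: one target per validated record, nothing more."""
    return [PATTERN.format(**record) for record in validate_records(records)]


# Targets the buggy shape requests that the real domain never contained.
unintended = sorted(set(cross_product_targets(samples, units)) - set(record_targets(records)))
```

Here the cross product asks for four targets while the records justify two, so `unintended` holds `results/A/lane2.bam` and `results/B/lane1.bam`. A dry run of each version would show the same gap in the planned job list.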

Exercise 3: Justify one checkpoint and reject one fake one

Write two short design sketches:

one workflow shape where a checkpoint is justified
one workflow shape where a checkpoint would only hide weak modeling

What to hand in:

the discovery question each sketch is trying to answer
the output artifact of the justified checkpoint
the cleaner alternative for the unjustified checkpoint
one sentence explaining the difference between the two cases

Exercise 4: Design the integrity trail for a dynamic run

Assume a workflow discovers samples and publishes a versioned bundle.

Describe the minimum artifact set that would let a downstream reviewer answer:

which samples were discovered
which files are part of the public boundary
what run identity produced that boundary

What to hand in:

one discovery artifact path
one provenance artifact path
one manifest or inventory artifact path
one short explanation of what each artifact proves
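
As one way to picture the manifest or inventory artifact, the sketch below hashes every file under a publish directory and ties the list to a run identifier. The function names, the `run_id` value, and the field names are assumptions made for this example; the course may expect a different layout.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(publish_dir: Path, run_id: str) -> dict:
    """List every file in the public boundary with its size and checksum."""
    files = sorted(p for p in publish_dir.rglob("*") if p.is_file())
    return {
        "run_id": run_id,  # hypothetical identity, e.g. a commit hash plus a date
        "files": [
            {
                "path": p.relative_to(publish_dir).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


def write_manifest(publish_dir: Path, run_id: str, out_path: Path) -> None:
    """Persist the manifest; keep out_path outside publish_dir so it never lists itself."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    manifest = build_manifest(publish_dir, run_id)
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Read together with a discovered-set file and a provenance record, a manifest like this lets a reviewer check that the published files are exactly the ones the run produced, without rerunning the workflow.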

Exercise 5: Improve performance without weakening truth

Start from a workflow shape that is operationally clumsy, for example:

too many tiny per-fragment jobs
too many almost-identical environment files

Repair it without hiding required artifacts or changing the meaning of the publish boundary.

What to hand in:

the original smell
the repaired environment or job-boundary design
one sentence explaining why the repair lowers overhead
one sentence explaining why workflow truth stayed intact
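
For the "too many tiny per-fragment jobs" smell, one candidate repair is to group fragments into deterministic batches while keeping an explicit record of which fragment went where. The sketch below is a minimal illustration; the function name, the batch identifier format, and the fragment names are hypothetical.

```python
def batch_fragments(fragments: list[str], batch_size: int) -> dict[str, list[str]]:
    """Group fragment names into fixed-size batches with stable identifiers.

    Sorting first means the same fragment set always yields the same batches,
    so the mapping can be written to disk and reviewed like any other artifact.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(fragments)
    return {
        f"batch_{start // batch_size:04d}": ordered[start : start + batch_size]
        for start in range(0, len(ordered), batch_size)
    }
```

With ten fragments and a batch size of four, this yields three batches instead of ten jobs, which lowers scheduling overhead. Because every fragment still appears exactly once in the mapping, the set of required artifacts and the publish boundary stay the same; only the job boundary moves.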

Mastery standard for this exercise set

Across all five answers, Module 02 wants the same habits:

you name the artifact or boundary that carries the truth
you explain dynamic behavior without using mystical language
you show which command or file would let another person verify the claim

If your answer says only "Snakemake will handle it," it is not finished yet: name the artifact, boundary, or command that does the handling.