Exercise Answers¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
section["Dynamic Dags Discovery Integrity"]
page["Exercise Answers"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Use this page after you have written your own answers. The point is comparison, not copying.
The strongest Module 02 answers usually do four things:
- they name the discovery or integrity boundary directly
- they explain why the current shape is weak or strong
- they point to one evidence route
- they describe the repair in terms of reviewable artifacts, not workflow folklore
Answer 1: Turn ambient discovery into a declared contract¶
A strong answer sounds like this:
the weak version scanned
data/raw/at parse time and immediately expanded targets from that list, so the workflow had no durable record of what it discovered. The repair gives discovery one owned output such asresults/discovered_samples.json, and downstream rules fan out from that artifact. Another reviewer can now inspect the exact discovered set instead of trusting a transient directory scan.
Why this is strong:
- it names the old weakness clearly
- it adds one owned artifact instead of vague "better organization"
- it explains why review got easier
Answer 2: Prevent one accidental fanout explosion¶
A strong answer makes the multiplication visible.
Example answer shape:
- buggy design:
expand("results/{sample}/{panel}/report.json", sample=SAMPLES, panel=PANELS)- unintended effect:
- every sample is paired with every panel, even when the data model is one panel per sample
- repair:
- build targets from validated records such as
{"sample": "sampleA", "panel": "dna"} - evidence:
snakemake -nshows four planned jobs before the repair and two after it
Why this is strong:
- it explains the wrong relationship, not just the wrong count
- it keeps one data source of truth for paired values
Answer 3: Justify one checkpoint and reject one fake one¶
A strong answer contrasts the two designs directly.
Example:
- justified checkpoint:
- the workflow cannot know which samples exist until it inspects
data/raw/ - the checkpoint writes
results/discovery/samples.json - downstream targets read that file
- unjustified checkpoint:
- the sample list already lives in config
- using a checkpoint would only hide a target list that was already known
- the cleaner alternative is to expand directly from validated config data
Why this is strong:
- it shows that checkpoints are for genuine new facts
- it rejects the temptation to use checkpoints as a dramatic wrapper for weak modeling
Answer 4: Design the integrity trail for a dynamic run¶
A strong answer gives three explicit paths and one explicit role for each:
- discovery artifact:
results/discovered_samples.json- proves which sample set drove the run
- provenance artifact:
publish/v1/provenance.json- proves the resolved run identity and materialized config
- manifest artifact:
publish/v1/manifest.json- proves which files belong to the public boundary
The strongest answers also mention a published copy of the discovery artifact, such as
publish/v1/discovered_samples.json, because downstream review often needs the same fact.
Answer 5: Improve performance without weakening truth¶
A strong answer sounds like this:
the original workflow created one environment file per nearly identical Python rule and one trivial job per tiny fragment, so setup overhead dominated useful work. The repair reuses one shared Python environment for the related rules and batches work at the sample boundary. This lowers environment solve and launch cost without hiding required artifacts or changing the meaning of the publish boundary.
Why this is strong:
- it names the overhead source
- it repairs the boundary instead of deleting evidence
- it explicitly says why workflow truth stayed intact
What all five answers should have in common¶
The best Module 02 answers usually:
- explain dynamic behavior without mystical language
- identify one artifact or boundary that carries the truth
- point to a concrete proof route such as
snakemake -n, a discovered-set file, or a publish manifest - describe repairs as stronger contracts, not just shorter code
If your answers do those four things, you are learning the module in the right direction.