Exercises¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Reproducibility Failures Real Teams"]
page["Exercises"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Use these exercises to practice failure diagnosis, not just DVC vocabulary.
The strongest answers will describe what is missing from the workflow story before jumping to tools.
Exercise 1: Separate repeatability from reproducibility¶
A teammate says:
I reran the notebook on my laptop and the metric matches last week's result, so the project is reproducible.
Write a short response that explains:
- what this claim does show
- what it does not show yet
- which stronger question a team should ask next
Exercise 2: Find the hidden state¶
A repository contains:
train.pyparams.yamldata/train.csvREADME.mdresults/metrics.csv
During discussion you learn that:
train.csvwas cleaned once by hand- one threshold is sometimes passed via CLI instead of
params.yaml - the original author uses a private Conda environment nobody else has recreated
List at least five inputs or assumptions that should now be treated as part of the real workflow story.
Exercise 3: Name Git's real boundary¶
A team says:
we do not need anything beyond Git because the code, configs, and README are all versioned.
Write a short review note that explains:
- what Git is preserving well
- what important parts of the result story may still lack a clear owner
- why that gap matters for reproducibility
Exercise 4: Draw the DVC boundary honestly¶
Suppose someone on your team says:
DVC will solve our reproducibility problem once we start tracking the data.
Explain:
- what DVC is likely to help with
- what DVC still does not settle by itself
- why this distinction matters before adopting the tool
Exercise 5: Write your first workflow inventory¶
Choose one of your own workflows and write a five-part inventory:
- source inputs
- control inputs
- execution assumptions
- trusted outputs
- weak points and missing evidence
The goal is not to fix the workflow yet. The goal is to describe it honestly enough that a future DVC design would have something real to improve.
Mastery check¶
You have a strong grasp of this module if your answers consistently keep four ideas visible:
- rerunning locally is weaker than team-grade reproducibility
- hidden inputs and assumptions are part of the workflow whether or not they are recorded
- Git's boundary should be respected rather than mythologized
- DVC helps with explicit state and recovery, not every question a workflow team can ask