Exercises¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive DVC"]
section["Truthful Pipelines Declared Dependencies"]
page["Exercises"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
Use these exercises to practice pipeline judgment, not only DVC vocabulary.
The strongest answers will explain the declared graph, the hidden influence risk, and the
evidence you would expect from dvc.lock.
Exercise 1: Read a stage contract¶
You see this stage:
stages:
prepare:
cmd: python -m incident_escalation_capstone.prepare
deps:
- data/raw/service_incidents.csv
outs:
- data/prepared/incidents.parquet
Write a short explanation of:
- what the stage promises
- what changes should make it stale
- what you still need to inspect before trusting the declaration
Exercise 2: Place each influence¶
An evaluation command reads:
models/escalation-model.jsondata/prepared/incidents.parquetdata/reference/escalation_policy.csv- a threshold value from
params.yaml - a temporary log file written during the run
Decide which items belong in deps, which belong in params, which belong in outs, and
which should probably stay outside the output contract.
Explain your reasoning.
Exercise 3: Predict reruns¶
Use this graph:
flowchart LR
prepare["prepare"] --> prepared["prepared incidents"]
prepared --> fit["fit"]
fit --> model["model"]
model --> evaluate["evaluate"]
prepared --> evaluate
Predict what should rerun when:
- the raw incidents file changes and
prepareproduces a new prepared output fit.model_familychangesevaluate.thresholdchanges- an unrelated README changes
Explain each prediction in terms of declared dependencies and parameters.
Exercise 4: Diagnose stale output risk¶
You review this report:
I changed
data/reference/escalation_policy.csv, randvc repro, and evaluation did not rerun. But the evaluation command uses that file.
Write a review response that explains:
- the likely graph problem
- why this is more dangerous than an extra rerun
- the concrete declaration repair
- what lock evidence should show after the repair and rerun
Exercise 5: Refactor a mixed stage¶
A stage currently fits a model and evaluates it in one command:
stages:
train_and_evaluate:
cmd: python -m incident_escalation_capstone.train_and_evaluate
deps:
- data/prepared/incidents.parquet
params:
- fit.model_family
- evaluate.threshold
outs:
- models/escalation-model.json
- reports/evaluation.json
Propose either:
- a split into clearer stages, if you think the model is a meaningful intermediate
- a justification for keeping it together, if you think the combined boundary is more honest
Your answer should explain how the refactor affects rerun behavior and review clarity.
Mastery check¶
You have a strong grasp of this module if your answers consistently keep five ideas visible:
- a stage contract is a reviewable promise, not only a command
- file reads and control values belong in different declaration fields
dvc reproonly reacts to declared influence- stale outputs are a correctness risk, while false reruns are usually visible waste
- graph refactoring is safe when it preserves the provenance story