Module 00: Orientation and Study Practice¶
Module Position¶
```mermaid
flowchart TD
  family["Reproducible Research"] --> program["Deep Dive DVC"]
  program --> module["Module 00: Orientation and Study Practice"]
  module --> lessons["Lesson pages and worked examples"]
  module --> checkpoints["Exercises and closing criteria"]
  module --> capstone["Related capstone evidence"]
```

```mermaid
flowchart TD
  purpose["Start with the module purpose and main questions"] --> lesson_map["Use the lesson map to choose reading order"]
  lesson_map --> study["Read the lessons and examples with one review question in mind"]
  study --> proof["Test the idea with exercises and capstone checkpoints"]
  proof --> close["Move on only when the closing criteria feel concrete"]
```
Read the first diagram as a placement map: this page sits between the course promise, the lesson pages listed below, and the capstone surfaces that pressure-test the module. Read the second diagram as the study route for this page, so the diagrams point you toward the Lesson map, Exercises, and Closing criteria instead of acting like decoration.
Deep Dive DVC is now a ten-module program that starts from first-contact reproducibility thinking and ends with long-lived state stewardship. The through-line stays constant:
- Stable identity: data and artifacts are known by what they are, not only where they live.
- Truthful state transitions: pipelines, params, and experiments declare the real change surface.
- Durable evidence: metrics, manifests, locks, and publish bundles make claims reviewable.
- Operational survival: remotes, retention, recovery, and promotion keep state trustworthy over time.
- Stewardship judgment: teams know which state is authoritative and how to migrate it safely.
This repository contains both the program guide in `course-book/` and the executable DVC reference repository in `capstone/`.
At a Glance¶
| What this course optimizes for | What this course refuses to optimize for |
|---|---|
| explicit state identity | vague claims that a pipeline is "tracked somehow" |
| durable metrics, params, and publish evidence | outputs that only make sense to the original author |
| recovery and promotion discipline | trust based on memory or directory names |
| pedagogy that moves from state model to repository stewardship | dropping the learner into a full repository too early |
Learning outcomes¶
- explain the five through-lines of the course: stable identity, truthful state transitions, durable evidence, operational survival, and stewardship judgment
- choose when to enter the capstone so the repository stays a proof surface instead of becoming cognitive noise
- identify which later modules answer state identity, pipeline truth, experimentation, collaboration, recovery, promotion, and governance questions
Verification route¶
- Run `make PROGRAM=reproducible-research/deep-dive-dvc capstone-platform-report` to confirm the supported Python, Git, and DVC versions before you trust later proof routes.
- Run `make PROGRAM=reproducible-research/deep-dive-dvc capstone-walkthrough` when you want the first learner-facing repository tour.
- Use `../guides/module-checkpoints.md` to decide whether you are ready to enter the capstone or should stay with the smaller module model first.
Program Arc¶
Module 01: Reproducibility Failures in Real Teams¶
Start from the failure modes that push teams toward DVC in the first place: results that cannot be defended, datasets that drift silently, and metrics that stop meaning what they appear to mean.
Deliverable: a precise explanation of what problem DVC solves and what it does not solve by itself.
Module 02: Data Identity and Content Addressing¶
Learn why paths are only locators and why reproducibility starts with immutable, content-addressed identity across workspace, cache, Git, and remote layers.
Deliverable: a repository that can distinguish location from identity and explain how a datum is recovered.
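A minimal sketch of the locator-versus-identity distinction, assuming SHA-256 as the digest and a hypothetical helper named `content_id`. It does not reproduce DVC's own hashing or cache layout; it only shows that two paths can share one identity while an edited file gets a new one.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(path: Path) -> str:
    """Return a hex digest derived only from the file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "raw.csv"
    renamed = Path(tmp) / "copy_of_raw.csv"
    edited = Path(tmp) / "edited.csv"
    original.write_bytes(b"id,value\n1,10\n")
    renamed.write_bytes(b"id,value\n1,10\n")
    edited.write_bytes(b"id,value\n1,11\n")

    # Different locations, identical bytes: the identity is the same.
    same_identity = content_id(original) == content_id(renamed)
    # One changed byte: the identity changes even though the schema looks alike.
    changed_identity = content_id(original) != content_id(edited)

print(same_identity, changed_identity)
```

The path never enters the digest, which is the point: renaming or moving a file leaves its identity untouched.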
Module 03: Execution Environments as Reproducible Inputs¶
Move beyond code and data alone. Environments, runtime assumptions, and tool versions become part of the declared input surface rather than invisible luck.
Deliverable: a state story that includes the runtime boundary instead of hand-waving it away.
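One way to make the runtime boundary visible is to record it and give it an identity of its own. The sketch below is an assumption about how that could look, not a DVC feature: the field list in `environment_record` and the `fingerprint` helper are hypothetical, and a real record would likely also cover pinned package versions.

```python
import hashlib
import json
import platform


def environment_record() -> dict:
    """Collect a small, explicit description of the runtime (illustrative fields only)."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


def fingerprint(record: dict) -> str:
    """Hash the record with sorted keys so the digest does not depend on dict order."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


record = environment_record()
env_id = fingerprint(record)

# Simulate a runtime change: the fingerprint moves, so the change is declared, not silent.
upgraded = dict(record, python="0.0.0-hypothetical")
upgraded_id = fingerprint(upgraded)

print(env_id != upgraded_id)
```

Storing a record like this next to the results turns "it worked on my machine" into a reviewable input.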
Module 04: Truthful Pipelines and Declared Dependencies¶
Turn DVC stages into honest state transitions. Dependencies, outputs, params, and lock state become a reviewable graph rather than a convenient script wrapper.
Deliverable: a dvc.yaml pipeline whose stage behavior can be explained and defended under review.
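The idea of a truthful stage can be sketched without DVC itself. In the illustration below, `lock_state` and `is_stale` are hypothetical helpers and the workspace is an in-memory dictionary; the sketch only mirrors the concept of declared dependencies and recorded lock state, not the actual `dvc.lock` format.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content digest used as the identity of one dependency."""
    return hashlib.sha256(content).hexdigest()


def lock_state(workspace: dict, declared_deps: list, params: dict) -> dict:
    """Record only what the stage declares: dependency digests and parameter values."""
    return {
        "deps": {path: digest(workspace[path]) for path in declared_deps},
        "params": dict(params),
    }


def is_stale(recorded: dict, workspace: dict, declared_deps: list, params: dict) -> bool:
    """A stage is stale when its current declared inputs differ from the recorded lock."""
    return recorded != lock_state(workspace, declared_deps, params)


workspace = {
    "data/raw.csv": b"id,value\n1,10\n",
    "src/train.py": b"print('train')\n",
    "src/helpers.py": b"SCALE = 1\n",  # used by train.py but NOT declared below
}
declared = ["data/raw.csv", "src/train.py"]
params = {"train.lr": 0.01}

recorded = lock_state(workspace, declared, params)

# A declared parameter change is visible to the staleness check.
param_change_detected = is_stale(recorded, workspace, declared, {"train.lr": 0.02})

# An edit to the undeclared helper is invisible: the graph is not truthful.
workspace["src/helpers.py"] = b"SCALE = 2\n"
undeclared_change_detected = is_stale(recorded, workspace, declared, params)

# Once the helper is declared, the recorded lock no longer matches the current inputs.
declared_truthfully = declared + ["src/helpers.py"]
truthful_change_detected = is_stale(recorded, workspace, declared_truthfully, params)

print(param_change_detected, undeclared_change_detected, truthful_change_detected)
```

The middle result is the failure this module targets: the stage looks up to date while its real inputs have changed.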
Module 05: Metrics, Parameters, and Comparable Meaning¶
Treat numbers as semantic contracts, not just logged values. Parameters and metrics become first-class state that preserves comparability across time.
Deliverable: a repository whose comparisons remain meaningful instead of only mechanically repeatable.
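A small sketch of what "semantic contract" can mean in practice, assuming a hypothetical `MetricRecord` that carries its own definition (evaluation split and a definition version). None of these fields come from DVC; they illustrate refusing a comparison when two numbers do not measure the same thing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    """A metric value together with the definition that gives it meaning."""
    name: str
    value: float
    eval_split: str
    definition_version: int


def comparable(a: MetricRecord, b: MetricRecord) -> bool:
    """Two records are comparable only when everything except the value matches."""
    return (a.name, a.eval_split, a.definition_version) == (
        b.name, b.eval_split, b.definition_version)


def improvement(baseline: MetricRecord, candidate: MetricRecord) -> float:
    """Return the change in value, or refuse when the definitions differ."""
    if not comparable(baseline, candidate):
        raise ValueError("metrics do not share a definition; refusing to compare")
    return candidate.value - baseline.value


baseline = MetricRecord("accuracy", 0.88, "holdout-v1", 1)
candidate = MetricRecord("accuracy", 0.91, "holdout-v1", 1)
resplit = MetricRecord("accuracy", 0.95, "holdout-v2", 1)  # different evaluation data

gain = improvement(baseline, candidate)

try:
    improvement(baseline, resplit)
    refused = False
except ValueError:
    refused = True

print(round(gain, 2), refused)
```

The comparison against `resplit` is mechanically repeatable but not meaningful, so the sketch rejects it instead of reporting a misleading gain.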
Module 06: Experiments, Baselines, and Controlled Change¶
Formalize exploration as a controlled, reversible process. Experiments become comparable deviations from a baseline rather than local folklore.
Deliverable: an experiment workflow that allows change without corrupting baseline history.
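The notion of an experiment as a deviation from a baseline can be sketched as follows. The helpers `derive` and `deviation` and the parameter names are hypothetical; the sketch keeps the baseline read-only and reports exactly which values an experiment changed.

```python
from types import MappingProxyType


def derive(baseline, **overrides) -> dict:
    """Build experiment params from a copy so the baseline is never mutated."""
    params = dict(baseline)
    params.update(overrides)
    return params


def deviation(baseline, candidate) -> dict:
    """Map each differing key to (baseline value, candidate value); None means absent."""
    keys = sorted(set(baseline) | set(candidate))
    return {
        key: (baseline.get(key), candidate.get(key))
        for key in keys
        if baseline.get(key) != candidate.get(key)
    }


# A read-only view guards the baseline against accidental in-place edits.
BASELINE = MappingProxyType({"lr": 0.01, "epochs": 10, "seed": 7})

experiment = derive(BASELINE, lr=0.02, dropout=0.1)
changes = deviation(BASELINE, experiment)

print(changes)
```

Because every experiment is described by its difference from one fixed baseline, two experiments can be compared by comparing their deviations rather than by recalling what was edited locally.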
Module 07: Collaboration, CI, and Social Contracts¶
Make good behavior enforceable across humans. Reviews, remotes, CI gates, and promotion habits become social contracts with technical backing.
Deliverable: a repository where another person can verify trustworthy state without private context.
Module 08: Recovery, Scale, and Incident Survival¶
Design for time as an adversary. Retention, garbage collection, cache loss, remote migration, and recovery drills become part of the system instead of afterthoughts.
Deliverable: a repository that can survive time pressure and still restore authoritative state.
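Retention and garbage collection come down to set arithmetic over object identities. The sketch below assumes a hypothetical `retention_plan` helper and made-up object IDs and revision names; it is not how DVC implements collection, only a model of the question a recovery drill has to answer.

```python
def retention_plan(cache_objects: set, retained_revisions: dict) -> tuple:
    """Split objects into those safe to collect and those referenced but missing."""
    # Everything any retained revision points at must stay recoverable.
    referenced = set().union(*retained_revisions.values())
    collectible = cache_objects - referenced
    unrecoverable = referenced - cache_objects
    return collectible, unrecoverable


cache = {"a1", "b2", "c3", "d4"}
retained = {
    "release-1": {"a1", "b2"},
    "main": {"b2", "e5"},  # e5 is referenced but absent from this cache
}

collectible, unrecoverable = retention_plan(cache, retained)

print(sorted(collectible), sorted(unrecoverable))
```

A non-empty `unrecoverable` set is the incident signal: some retained revision can no longer be restored from this storage layer.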
Module 09: Promotion, Registry Boundaries, and Auditability¶
Separate exploratory state from promoted state. Publish surfaces, manifests, params, metrics, and lock evidence become a defendable release contract for downstream users.
Deliverable: a promoted state bundle another reviewer or consumer can validate without guesswork.
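A promoted bundle that can be validated "without guesswork" needs a manifest a consumer can check. The following is a minimal sketch under stated assumptions: `build_manifest` and `verify_bundle` are hypothetical helpers, the file names are illustrative, and the bundle is held in memory rather than read from a real publish directory.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Map each published path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(content).hexdigest()
            for path, content in sorted(files.items())}


def verify_bundle(manifest: dict, files: dict) -> list:
    """Return a list of problems; an empty list means the bundle matches the manifest."""
    problems = []
    for path, expected in manifest.items():
        if path not in files:
            problems.append(f"missing: {path}")
        elif hashlib.sha256(files[path]).hexdigest() != expected:
            problems.append(f"modified: {path}")
    # Files outside the manifest are not part of the release contract.
    for path in sorted(set(files) - set(manifest)):
        problems.append(f"unlisted: {path}")
    return problems


published = {
    "model.pkl": b"weights-v1",
    "metrics.json": b'{"accuracy": 0.91}',
    "params.yaml": b"lr: 0.01\n",
}
manifest = build_manifest(published)
clean_report = verify_bundle(manifest, published)

# Simulate a bundle that drifted after promotion.
tampered = dict(published)
tampered["metrics.json"] = b'{"accuracy": 0.99}'
del tampered["params.yaml"]
tampered["notes.txt"] = b"scratch"
tampered_report = verify_bundle(manifest, tampered)

print(clean_report)
print(tampered_report)
```

The reviewer needs only the manifest and the bundle, with no access to the producing workspace, which is what separates promoted state from exploratory state.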
Module 10: Migration, Governance, and DVC Boundaries¶
Finish with stewardship judgment: reviewing real repositories, planning migrations, setting governance rules, rejecting recurring anti-patterns, and deciding where DVC should remain authoritative versus where another system should take over.
Deliverable: an evidence-based review and stewardship plan for a real DVC repository.
Study Paths¶
Full course path¶
Use this if you are learning DVC from the ground up.
- Modules 01-02 for failure modes and state identity
- Modules 03-05 for environments, pipelines, params, and metrics
- Modules 06-09 for experiments, collaboration, recovery, and promotion
- Module 10 for review, migration, and governance
Working maintainer path¶
Use this if you already operate a DVC repository.
- `pressure-routes.md` for the repair-first route
- Module 04 for truthful pipeline behavior
- Module 07 for collaboration and CI contracts
- Module 08 for retention and recovery discipline
- Module 09 for promotion boundaries
- Module 10 for stewardship judgment
Reproducibility steward path¶
Use this if your role is auditability, release, or long-lived system ownership.
- Module 05 for semantic state surfaces
- Module 08 for durability and incident survival
- Module 09 for publish contracts
- Module 10 for migration and governance
- `module-promise-map.md` for title-to-deliverable review
Recommended Reading Path¶
- Read Modules 01 to 10 in order.
- Use support pages to keep the course legible instead of treating them as appendix material.
- Use the capstone lightly at first, then heavily from Modules 04 to 09.
- Re-run proof commands as you go instead of trusting prose summaries.
- Treat Module 10 as the finish of the program, not as optional appendix material.
If you are totally new to DVC, spend extra time in Modules 01 and 02 before rushing into pipelines or experiments. If you already use DVC in production, Modules 07 to 10 will be the fastest route to operational value.
Support Pages By Milestone¶
Use these pages when you reach each milestone:
| Milestone | Best support pages | Why these pages matter |
|---|---|---|
| before Module 01 | `../guides/start-here.md`, `../guides/learning-contract.md`, `../reference/module-dependency-map.md` | establish the learner route and the pedagogical boundaries |
| Modules 01-02 | `../reference/glossary.md`, `../reference/authority-map.md` | keep identity, authority, and layer language precise |
| Modules 03-05 | `../reference/practice-map.md`, `../capstone/command-guide.md`, `../guides/proof-matrix.md` | connect pipeline and metric concepts to executable proof |
| Modules 06-09 | `../capstone/capstone-map.md`, `../capstone/capstone-file-guide.md`, `../capstone/index.md` | move from concept to repository inspection without losing the teaching thread |
| Module 10 and later review | `../reference/completion-rubric.md`, `../capstone/capstone-review-worksheet.md`, `../capstone/capstone-extension-guide.md` | assess the course and evolve the repository without weakening the contract |
This keeps the support surfaces on the main learner route instead of making them feel optional.
Capstone Relationship¶
The capstone is strongest as the executable companion to Modules 04 to 09, where truthful pipelines, metrics, experiments, promotion, remotes, and recovery become concrete. The early modules still benefit from smaller mental and local examples first so the learner can understand state identity before the repository becomes the main teaching surface.
Use Capstone Map when you want one clear route from a module concept to the exact repository files and proof command that demonstrate it.
Proof command:
Milestones¶
| Milestone | Modules | What you should be able to do |
|---|---|---|
| State literacy | 01-02 | explain identity, state layers, and why paths are not enough |
| Executable truth | 03-05 | model environments, truthful pipelines, params, and comparable metrics |
| Controlled change | 06-07 | run experiments and collaboration flows without corrupting trust |
| Long-lived trust | 08-10 | recover, promote, review, and govern state over time |
Capstone Timing¶
Enter the capstone at three deliberate moments:
- after Module 02 to inspect state layers and identity boundaries
- after Module 04 or 05 to inspect truthful pipeline and metric surfaces
- after Modules 08-10 to review recovery, promotion, and stewardship choices
If the capstone ever feels larger than the concept you are studying, return to the module and restore the smaller state model first.
Use this page sequence when you enter the capstone:
- Capstone Guide
- `capstone-map.md`
- `capstone-file-guide.md`
- `make PROGRAM=reproducible-research/deep-dive-dvc capstone-walkthrough`
- `make PROGRAM=reproducible-research/deep-dive-dvc capstone-confirm` when you want the strongest built-in proof route
Use `module-checkpoints.md` when you are deciding whether the current module is actually stable enough to justify entering the larger repository.
Keep using the capstone to answer one question: when a result is challenged months later, which exact state can the repository recover, compare, and prove?
Directory glossary¶
Use Glossary when you want the recurring language in this module kept stable while you move between lessons, exercises, and capstone checkpoints.