Retention Policy and History Value

Retention policy answers a hard question:

Which historical states are worth keeping recoverable, and for how long?

Without that answer, teams usually drift into one of two bad habits:

  • keep everything forever until cost or clutter becomes intolerable
  • delete aggressively and discover later that important evidence is gone

Module 08 asks for a more deliberate middle path.

History does not all have the same value

Different states deserve different treatment.

| State | Typical value | Possible retention |
|---|---|---|
| promoted release | audit, downstream trust, rollback | long-lived or protected |
| current mainline | collaboration and CI | must stay restorable |
| published analysis | scientific or stakeholder evidence | policy-driven retention |
| active experiment candidate | short-term review | bounded review window |
| abandoned exploratory output | learning only | short retention or discard |

The table is not universal. It is a way to make the decision explicit.
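
One way to make that decision explicit is to record the classification as data. The sketch below mirrors the table in Python. The names `DEFAULT_RETENTION` and `retention_for` are hypothetical, and the values are this page's examples rather than a recommended policy.

```python
# Illustrative mapping from state class to retention, mirroring the table above.
# All names and values are assumptions for this sketch, not a DVC feature.
DEFAULT_RETENTION = {
    "promoted release": "long-lived or protected",
    "current mainline": "must stay restorable",
    "published analysis": "policy-driven retention",
    "active experiment candidate": "bounded review window",
    "abandoned exploratory output": "short retention or discard",
}


def retention_for(state_class: str) -> str:
    """Return the recorded retention decision for a state class.

    Unknown classes raise instead of falling back to a default, so an
    unclassified state forces someone to make an explicit decision.
    """
    try:
        return DEFAULT_RETENTION[state_class]
    except KeyError:
        raise ValueError(
            f"no retention decision recorded for {state_class!r}"
        ) from None


print(retention_for("promoted release"))  # prints: long-lived or protected
```

Raising on an unknown class is a design choice for the sketch: a silent default would hide exactly the states nobody has classified yet.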

A retention rule needs three parts

A useful retention rule says:

  • what state it covers
  • how long it should remain recoverable
  • who can approve deletion or a move to archive

Example:

release_artifacts:
  recoverability: protected
  deletion: requires release owner approval
  evidence: manifest, params, metrics, dvc.lock

experiment_candidates:
  recoverability: review window
  deletion: allowed after review decision
  evidence: experiment note and comparison table

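The syntax is illustrative. What matters is the shape of the decision: each rule names the state it covers, how recoverable that state must stay, and what deletion requires.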
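
The three parts of a rule can also be sketched as a small Python type with a deletion check. This is a minimal sketch under stated assumptions: `RetentionRule`, `may_delete`, the 30-day window, and the approver labels are all hypothetical, and a `keep_for` of `None` is used here to mean "protected".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    scope: str                      # what state the rule covers
    keep_for: Optional[timedelta]   # how long it stays recoverable; None = protected
    approver: str                   # who can approve deletion


def may_delete(rule: RetentionRule, age: timedelta,
               approved_by: Optional[str] = None) -> bool:
    """Allow deletion only with the named approval, and only after the window."""
    if approved_by != rule.approver:
        return False
    if rule.keep_for is None:
        # Protected states never expire on their own; approval is the only path.
        return True
    return age >= rule.keep_for


# Hypothetical rules echoing the example above.
release = RetentionRule("release_artifacts", None, "release owner")
experiments = RetentionRule("experiment_candidates", timedelta(days=30), "reviewer")

print(may_delete(release, timedelta(days=900)))                 # prints: False
print(may_delete(experiments, timedelta(days=45), "reviewer"))  # prints: True
```

Age alone never permits deletion in this sketch; every rule still needs its named approver, which keeps the approval part of the rule from being skipped.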

Cost is real, but so is audit loss

Storage cost is a legitimate pressure. Pretending otherwise produces policies nobody will follow.

But cost alone is not a safe deletion criterion. The question should be:

What decision, audit, rollback, or downstream use would break if this state disappeared?

If the answer is "nothing," deletion may be safe. If the answer is "we could not explain the promoted result anymore," the state needs stronger retention.
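
The question can be turned into a small review helper. The sketch below is illustrative only: `HistoricalState`, `deletion_blockers`, and the example states are hypothetical, and the list of dependents has to be filled in by people who know the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoricalState:
    name: str
    size_gb: float
    # Decisions, audits, rollbacks, or downstream uses that rely on this state.
    dependents: List[str] = field(default_factory=list)


def deletion_blockers(state: HistoricalState) -> List[str]:
    """List what would break if the state disappeared.

    size_gb is deliberately ignored: cost can motivate a review,
    but it does not decide the outcome.
    """
    return list(state.dependents)


def deletion_may_be_safe(state: HistoricalState) -> bool:
    """True only when nothing is recorded as depending on the state."""
    return not deletion_blockers(state)


# Hypothetical states for illustration.
scratch = HistoricalState("exploratory-output", 120.0)
release_data = HistoricalState(
    "release-backing-data", 800.0,
    ["explains the promoted result", "rollback target"],
)

print(deletion_may_be_safe(scratch))       # prints: True
print(deletion_may_be_safe(release_data))  # prints: False
```

Note that the larger state is the one that must stay: its size makes it expensive, but its dependents are what decide the answer.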

Retention policy should match published promises

If a project publishes a release bundle containing publish/v1/metrics.json, publish/v1/params.yaml, and a manifest, the supporting artifacts should remain recoverable for as long as that release matters.

A release bundle without recoverable backing is a fragile report: its numbers can still be read, but they can no longer be re-derived or audited.

Ask:

  • can we restore the data or outputs behind the release?
  • can we explain the parameters and metrics later?
  • can we identify which DVC state supports the bundle?
  • can a downstream reader audit the release without private context?

Retention is how those promises survive time.
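
A first, partial check of those promises is that the published bundle itself is complete. The sketch below assumes the bundle lives in publish/v1 as in the example above; the file name `manifest.json` and the helper `missing_bundle_files` are assumptions, since the page does not name the manifest file.

```python
from pathlib import Path
from typing import List

# metrics.json and params.yaml come from the example above; "manifest.json"
# is an assumed file name for the manifest.
REQUIRED_BUNDLE_FILES = ("metrics.json", "params.yaml", "manifest.json")


def missing_bundle_files(bundle_dir: Path) -> List[str]:
    """Return the required bundle files that are absent from bundle_dir."""
    return [
        name for name in REQUIRED_BUNDLE_FILES
        if not (bundle_dir / name).is_file()
    ]


missing = missing_bundle_files(Path("publish/v1"))
if missing:
    print("bundle incomplete, missing:", ", ".join(missing))
else:
    print("bundle files present")
```

This only confirms that the bundle files exist. Whether the data and outputs behind the release can still be restored needs a separate check against wherever the project stores its DVC-tracked artifacts.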

Review checkpoint

You understand this core when you can:

  • classify historical states by value
  • write a retention rule with scope, duration, and approval
  • explain why cost alone is not enough for deletion
  • connect retention to published release promises
  • identify which old states can safely expire

Retention policy is not hoarding. It is deciding which history still carries obligations.