Retention Policy and History Value

Retention policy answers a hard question:

Which historical states are worth keeping recoverable, and for how long?

Without that answer, teams usually drift into one of two bad habits:

keep everything forever until cost or clutter becomes intolerable
delete aggressively and discover later that important evidence is gone

Module 08 asks for a more deliberate middle path.

History does not all have the same value

Different states deserve different treatment.

State	Typical value	Possible retention
promoted release	audit, downstream trust, rollback	long-lived or protected
current mainline	collaboration and CI	must stay restorable
published analysis	scientific or stakeholder evidence	policy-driven retention
active experiment candidate	short-term review	bounded review window
abandoned exploratory output	learning only	short retention or discard

The table is not universal. It is a way to make the decision explicit.
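
One way to make the decision explicit is to record the table as a lookup that refuses to guess. This is a minimal sketch: the state names mirror the table above, while `RETENTION_BY_STATE`, `retention_for`, and the choice to raise an error for unclassified states are assumptions for illustration, not part of the module.

```python
# Hypothetical mapping that mirrors the table above; adjust it to your project.
RETENTION_BY_STATE = {
    "promoted release": "long-lived or protected",
    "current mainline": "must stay restorable",
    "published analysis": "policy-driven retention",
    "active experiment candidate": "bounded review window",
    "abandoned exploratory output": "short retention or discard",
}


def retention_for(state: str) -> str:
    """Return the recorded retention class for a state, never a silent default."""
    try:
        return RETENTION_BY_STATE[state]
    except KeyError:
        # Assumption: an unclassified state is a policy gap, not a deletion candidate.
        raise ValueError(f"no retention decision recorded for state: {state!r}") from None
```

Raising on unknown states keeps an unclassified output from being treated as disposable by default.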

A retention rule needs three parts

A useful retention rule says:

what state it covers
how long it should remain recoverable
who can approve deletion or archive movement

Example:

release_artifacts:
  recoverability: protected
  deletion: requires release owner approval
  evidence: manifest, params, metrics, dvc.lock

experiment_candidates:
  recoverability: review window
  deletion: allowed after review decision
  evidence: experiment note and comparison table

The syntax is illustrative. The important part is the decision shape.
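
The same decision shape can be sketched in code. The block below is an illustration under stated assumptions, not a DVC feature: `RetentionRule`, `may_delete`, and the 30-day window in the usage lines are hypothetical. It models the three parts (scope, duration, approval) and treats a protected state as one with no time-based expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    scope: str                     # what state the rule covers
    keep_for: Optional[timedelta]  # None means protected: no time-based expiry
    approver: str                  # who can approve deletion or archive movement


def may_delete(rule: RetentionRule, created: datetime, now: datetime, approved: bool) -> bool:
    """Allow deletion only with approval and, for windowed rules, after the window."""
    if not approved:
        return False
    if rule.keep_for is None:
        # Protected states never expire on their own; approval alone decides.
        return True
    return now - created >= rule.keep_for


# Usage with assumed values: a 30-day review window for experiment candidates.
experiments = RetentionRule("experiment_candidates", timedelta(days=30), "reviewer")
releases = RetentionRule("release_artifacts", None, "release owner")
```

Keeping approval as a separate argument means an expired window alone never triggers deletion; someone named in the rule still has to agree.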

Cost is real, but so is audit loss

Storage cost is a legitimate pressure. Pretending otherwise produces policies nobody will follow.

But cost alone is not a safe deletion criterion. The question should be:

What decision, audit, rollback, or downstream use would break if this state disappeared?

If the answer is "nothing," deletion may be safe. If the answer is "we could not explain the promoted result anymore," the state needs stronger retention.
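
The question can be turned into a small review record so the answer is written down rather than assumed. This is a sketch only: `ImpactReview` and its field names are hypothetical, and the example entries are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImpactReview:
    """What would break if this state disappeared, grouped as in the question above."""
    state: str
    decisions: List[str] = field(default_factory=list)
    audits: List[str] = field(default_factory=list)
    rollbacks: List[str] = field(default_factory=list)
    downstream_uses: List[str] = field(default_factory=list)

    def blockers(self) -> List[str]:
        """Collect every recorded dependency on this state."""
        return self.decisions + self.audits + self.rollbacks + self.downstream_uses


def deletion_may_be_safe(review: ImpactReview) -> bool:
    # Storage cost is deliberately not an input: it motivates the review,
    # but only an empty list of blockers makes deletion a candidate.
    return not review.blockers()


# Invented examples: one state nothing depends on, one that backs a promoted result.
scratch = ImpactReview(state="abandoned exploratory output")
release = ImpactReview(state="promoted release", audits=["explains the promoted result"])
```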

Retention policy should match published promises

If a project publishes publish/v1/metrics.json, publish/v1/params.yaml, and a manifest, the supporting artifacts should remain recoverable for as long as that release matters.

A release bundle without recoverable backing becomes a fragile report.

Ask:

can we restore the data or outputs behind the release?
can we explain the parameters and metrics later?
can we identify which DVC state supports the bundle?
can a downstream reader audit the release without private context?

Retention is how those promises survive time.
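
A first, partial check on those promises is that the published bundle is still complete. The sketch below assumes a bundle directory such as publish/v1 and a manifest named `manifest.json`; that file name and the helper `missing_release_files` are hypothetical, since the text only says "a manifest". It checks that the files exist, not that the DVC-tracked data behind them can still be restored, which needs a separate check against the project's remote.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: "manifest.json" is a placeholder name for the release manifest.
REQUIRED_FILES = ("metrics.json", "params.yaml", "manifest.json")


def missing_release_files(bundle_dir: Path, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """List the required files that are absent from a release bundle directory."""
    return [name for name in required if not (bundle_dir / name).is_file()]
```

An empty result means the bundle is intact on disk; a non-empty one names exactly which published promise is already broken.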

Review checkpoint

You understand this core when you can:

classify historical states by value
write a retention rule with scope, duration, and approval
explain why cost alone is not enough for deletion
connect retention to published release promises
identify which old states can safely expire

Retention policy is not hoarding. It is deciding which history still carries obligations.