Debug Bundles
Debug bundles are governed artifacts for cluster and runtime diagnosis, not ad hoc tarballs.
Purpose
Use a debug bundle when an install, rollout, readiness incident, or conformance failure needs a durable evidence package that another operator can inspect without reconnecting to the cluster.
Source of Truth
- ops/schema/k8s/ops-debug-bundle.schema.json
- ops/k8s/tests/goldens/k8s-conformance-report.sample.json
- ops/report/generated/
What Is Governed
The schema-backed bundle record captures:
- cluster, which is currently expected to be kind
- namespace, so evidence stays tied to a concrete surface
- category, which must be one of logs, describe, events, or resources
- status, which records whether the capture succeeded
- files, which lists the collected bundle members
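As a rough illustration of the fields above, the following Python sketch checks a bundle record by hand. The JSON schema remains the authoritative contract; the function name check_bundle_record, the assumption that files is a list of strings, and the sample values are all hypothetical and may not match the real schema.

```python
# Illustrative only: the authoritative rules live in
# ops/schema/k8s/ops-debug-bundle.schema.json and may differ from this sketch.
ALLOWED_CATEGORIES = ("logs", "describe", "events", "resources")
REQUIRED_FIELDS = ("cluster", "namespace", "category", "status", "files")


def check_bundle_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Every governed field must be present.
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")

    # The cluster is currently expected to be kind.
    if "cluster" in record and record["cluster"] != "kind":
        problems.append("cluster is currently expected to be 'kind'")

    # The category must be one of the four governed values.
    if "category" in record and record["category"] not in ALLOWED_CATEGORIES:
        problems.append(f"category must be one of {', '.join(ALLOWED_CATEGORIES)}")

    # Assumption: files is a list of path strings.
    if "files" in record:
        files = record["files"]
        if not (isinstance(files, list) and all(isinstance(f, str) for f in files)):
            problems.append("files must be a list of file names")

    return problems
```

Returning a list of problems rather than raising on the first one lets an operator see everything wrong with a record in a single pass.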
Bundle Contents
A complete debug bundle should include the files needed to explain the failure without guesswork:
- pod and workload descriptions for the failing namespace
- recent events around scheduling, probes, and rollout transitions
- selected logs for the affected Atlas workload and supporting jobs
- rendered resources or manifest snapshots that show the intended state
- the matching conformance, validation, or rollout report when one exists
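One way to keep a capture complete is to plan a command for each category before running anything. The sketch below assumes the evidence is gathered with kubectl and that the affected workload is a Deployment; neither is stated by this page, and the function name plan_capture is hypothetical. It only builds the argument lists and does not execute them. Note that kubectl get returns the live object, so it is only a stand-in for rendered manifests.

```python
def plan_capture(namespace: str, workload: str) -> dict[str, list[str]]:
    """Map each bundle category to a kubectl argv list (built here, not run)."""
    ns = ["--namespace", namespace]
    return {
        # Pod descriptions for the failing namespace.
        "describe": ["kubectl", "describe", "pods", *ns],
        # Recent events, oldest first, so rollout transitions read in order.
        "events": ["kubectl", "get", "events", *ns, "--sort-by=.lastTimestamp"],
        # Logs for the affected workload (assumed to be a Deployment).
        "logs": ["kubectl", "logs", f"deployment/{workload}", *ns, "--all-containers"],
        # Live object snapshot; rendered manifests would come from the render step.
        "resources": ["kubectl", "get", f"deployment/{workload}", *ns, "-o", "yaml"],
    }
```

Each list can be passed to subprocess.run once the operator has confirmed the target namespace, and its output written to a file in the matching category.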
Redaction and Naming Rules
- exclude secrets and credentials rather than masking them late
- avoid raw environment dumps that may expose sensitive values
- name files so the category, namespace, and workload are obvious to a later reader
- keep one bundle per incident or validation run so evidence does not blur across unrelated failures
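The naming rule above could be enforced with a small helper. This is a minimal sketch: the category__namespace__workload layout, the double-underscore separator, and the function name bundle_file_name are assumptions for illustration, not a convention defined by the schema.

```python
import re


def bundle_file_name(category: str, namespace: str, workload: str,
                     extension: str = "txt") -> str:
    """Build a file name that exposes category, namespace, and workload."""
    parts = [category, namespace, workload]
    # Lowercase each part and collapse anything outside [a-z0-9-] into a dash.
    cleaned = [re.sub(r"[^a-z0-9-]+", "-", p.lower()).strip("-") for p in parts]
    if not all(cleaned):
        raise ValueError("category, namespace, and workload must all be non-empty")
    return "__".join(cleaned) + "." + extension


# Example: bundle_file_name("logs", "atlas", "Atlas Server")
# returns "logs__atlas__atlas-server.txt"
```

Rejecting empty parts keeps generic names such as a bare logs.txt out of the bundle.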
When to Capture a Bundle
Capture a debug bundle whenever:
- k8s-validate fails and the rendered output does not explain the failure by itself
- a rollout stalls, flaps readiness, or triggers rollback review
- a conformance suite reports failing sections that need owner handoff
- an incident needs logs, describe output, and event history preserved before the cluster changes again
How to Validate
- Confirm the bundle metadata matches ops/schema/k8s/ops-debug-bundle.schema.json.
- Check that every file listed in files actually exists in the captured evidence set.
- Verify no sensitive content escaped redaction rules.
- Link the bundle to the relevant rollout, conformance, or incident record.
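The file-existence and redaction checks can be partly automated. The sketch below is an assumption-laden example: audit_bundle is a hypothetical name, the bundle is assumed to be a directory of text files, and the SENSITIVE pattern is a crude heuristic that will miss many real secrets, so it supplements a manual review rather than replacing it.

```python
import re
from pathlib import Path

# Hypothetical patterns; tune them to the secrets your environment actually uses.
SENSITIVE = re.compile(r"(password|secret|token|api[_-]?key)\s*[:=]", re.IGNORECASE)


def audit_bundle(bundle_dir: Path, listed_files: list[str]) -> dict[str, list[str]]:
    """Report listed files that are missing or that look like they hold secrets."""
    missing, suspicious = [], []
    for name in listed_files:
        path = bundle_dir / name
        if not path.is_file():
            # Listed in the record but absent from the captured evidence set.
            missing.append(name)
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if SENSITIVE.search(text):
            # Flag for human review; a match is a hint, not proof of a leak.
            suspicious.append(name)
    return {"missing": missing, "suspicious": suspicious}
```

A bundle is ready to link to its rollout, conformance, or incident record only when both lists come back empty and a reviewer has confirmed the result.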
Failure Modes
- bundle capture happens too late and the useful events have rotated away
- operators collect logs only and miss the rendered or describe evidence
- sensitive content enters the bundle and blocks distribution
- bundle filenames are generic, making later incident review ambiguous
Evidence Produced
The debug bundle itself is the evidence artifact. It should be accompanied by:
- the failing validation or incident identifier
- the namespace and cluster used during capture
- the status of the capture attempt
- references to the rollout, conformance, or report outputs that explain why it was collected
Related Contracts and Assets
- ops/schema/k8s/ops-debug-bundle.schema.json
- ops/report/generated/
- ops/k8s/tests/goldens/k8s-conformance-report.sample.json