Failure Recovery

Failure recovery in DAG should first preserve evidence of the failure, then restore a runnable state with the cause clearly attributed.

Visual Summary

flowchart TD
    fail[detect run failure] --> capture[capture status and artifacts]
    capture --> classify[classify root cause scope]
    classify --> remediate[apply targeted remediation]
    remediate --> replay[replay and diff verification]

Recovery Sequence

record the run status and retain the failing artifact directory
classify the failure as a graph, input, runtime, environment, or backend issue
remediate one scope at a time and rerun after each remediation
replay the recovered run to verify deterministic behavior
diff against the last known good run before promotion
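
The first three steps above can be sketched in Python. This is a minimal illustration, not part of bijux: `FailureScope`, `retain_evidence`, and `remediate_one_scope_at_a_time` are hypothetical names, and the `fixes` and `rerun` callables stand in for whatever remediation and rerun mechanism a project actually uses.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Callable, Iterable, Mapping, Optional


class FailureScope(Enum):
    """The five scopes named in the recovery sequence."""
    GRAPH = "graph"
    INPUT = "input"
    RUNTIME = "runtime"
    ENVIRONMENT = "environment"
    BACKEND = "backend"


def retain_evidence(failed_run: Path, evidence_root: Path) -> Path:
    """Copy the failing artifact directory aside before any remediation.

    Hypothetical helper. shutil.copytree raises FileExistsError when the
    target already exists, so an earlier copy is never overwritten.
    """
    target = evidence_root / failed_run.name
    shutil.copytree(failed_run, target)
    return target


def remediate_one_scope_at_a_time(
    suspected: Iterable[FailureScope],
    fixes: Mapping[FailureScope, Callable[[], None]],
    rerun: Callable[[], bool],
) -> Optional[FailureScope]:
    """Apply one fix per rerun so a recovery can be attributed to one scope."""
    for scope in suspected:
        fixes[scope]()       # apply only this scope's remediation
        if rerun():          # rerun after every single change
            return scope     # the recovery is attributable to this scope
    return None              # still failing: do not report success
```

Changing one scope per rerun is what makes the attribution meaningful: if two fixes land together and the run recovers, there is no way to tell which one mattered.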

Diagnostic Commands

bijux dag status ./runs/failed-20260406-01
bijux dag inspect ./runs/failed-20260406-01
bijux dag replay ./runs/failed-20260406-01 --out ./runs/replay-failed
bijux dag diff ./runs/good-20260405-77 ./runs/recovered-20260406-02 --mode semantic --explain
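
For scripted recovery, the four commands above could be assembled and logged from Python. The sketch below uses only the subcommands and flags shown here. The helper names `diagnostic_commands` and `run_and_log` are hypothetical, and because this page does not document exit-code conventions for `bijux dag`, the sketch records return codes without interpreting them.

```python
import subprocess
from pathlib import Path
from typing import List


def diagnostic_commands(
    failed: str, replay_out: str, good: str, recovered: str
) -> List[List[str]]:
    """Build argv lists mirroring the four diagnostic commands, in order."""
    return [
        ["bijux", "dag", "status", failed],
        ["bijux", "dag", "inspect", failed],
        ["bijux", "dag", "replay", failed, "--out", replay_out],
        ["bijux", "dag", "diff", good, recovered, "--mode", "semantic", "--explain"],
    ]


def run_and_log(commands: List[List[str]], log_path: Path) -> List[int]:
    """Run each command, append its output to a log file, return exit codes.

    Raises FileNotFoundError if the bijux executable is not on PATH.
    """
    codes = []
    with log_path.open("a", encoding="utf-8") as log:
        for argv in commands:
            proc = subprocess.run(argv, capture_output=True, text=True)
            log.write(f"$ {' '.join(argv)}\n{proc.stdout}{proc.stderr}\n")
            codes.append(proc.returncode)
    return codes
```

Writing the log outside the failing run directory keeps the diagnostic output from modifying the evidence it describes.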

Code Anchors

crates/bijux-dag-app/src/routes/status_routes.rs
crates/bijux-dag-app/src/routes/inspect_routes.rs
crates/bijux-dag-runtime/src/replay/

Recovery Boundaries

never replace failing evidence in place
never classify an unknown mismatch as success
never skip replay or diff after a high-impact remediation
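
One way to make these boundaries enforceable is a promotion gate that lists every violated rule. The sketch below is an assumption about how such a check might look. `RecoveryRecord`, `DiffVerdict`, and `promotion_blockers` are hypothetical names and not part of the bijux codebase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DiffVerdict(Enum):
    """Hypothetical outcome labels for a diff against the last good run."""
    MATCH = "match"
    UNKNOWN_MISMATCH = "unknown_mismatch"


@dataclass(frozen=True)
class RecoveryRecord:
    evidence_preserved: bool             # failing artifacts were left untouched
    high_impact: bool                    # the remediation was high-impact
    replayed: bool                       # a replay of the recovered run was done
    diff_verdict: Optional[DiffVerdict]  # None means no diff was run


def promotion_blockers(record: RecoveryRecord) -> List[str]:
    """Return one message per violated boundary; an empty list means clear."""
    blockers = []
    if not record.evidence_preserved:
        blockers.append("failing evidence was replaced in place")
    if record.diff_verdict is DiffVerdict.UNKNOWN_MISMATCH:
        blockers.append("unknown mismatch cannot be classified as success")
    verified = record.replayed and record.diff_verdict is not None
    if record.high_impact and not verified:
        blockers.append("replay or diff skipped after high-impact remediation")
    return blockers
```

Returning all blockers rather than stopping at the first one gives the operator the full list of what must be addressed before promotion.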

Next Reads