Data Profile Guide

Guide Maps

graph LR
  raw["Raw incidents"] --> prepare["prepare.py"]
  prepare --> profile["data-profile.json"]
  profile --> publish["publish/v1/"]
  publish --> review["Release review"]

flowchart LR
  question["What population do these metrics describe?"] --> profile["Read the data profile"]
  profile --> compare["Compare raw, train, eval, and team counts"]
  compare --> review["Only then judge promoted metrics and predictions"]

Use this guide when the promoted metrics look precise but the population behind them is still unclear. The goal is to make `data-profile.json` the first place you look to see what was split, counted, and summarized before you trust any evaluation results.

What the data profile owns

| Field | Meaning | Why it matters |
| --- | --- | --- |
| `raw_rows` | total committed incident rows | defines the starting repository population |
| `train_rows` | rows assigned to training | explains the model-fitting population |
| `eval_rows` | rows assigned to evaluation | explains the metrics and predictions population |
| `escalated_rows` | total positive outcomes | gives the outcome prevalence context |
| `escalation_rate` | positive-rate summary across raw rows | helps interpret accuracy and threshold choices |
| `feature_names` | declared feature contract | keeps the review tied to the intended model inputs |
| `teams` | row counts by owning team | keeps review anchored in operational ownership, not only totals |
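
The fields above suggest a few cross-checks a reviewer could run before reading any metrics. The sketch below is only an illustration and is not part of the repository: `load_profile` and `check_profile` are hypothetical helpers. It assumes that train and eval rows together cover every raw row, that `teams` is a mapping from team name to row count that sums to `raw_rows`, and that `escalation_rate` equals `escalated_rows / raw_rows` up to rounding. If your pipeline drops rows before splitting or rounds the rate differently, adjust the checks accordingly.

```python
import json
import math


def load_profile(path: str) -> dict:
    """Read a data profile JSON file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_profile(profile: dict, rate_tol: float = 1e-4) -> list[str]:
    """Return human-readable inconsistencies; an empty list means none found."""
    problems = []
    raw = profile["raw_rows"]

    # Assumption: every raw row is assigned to exactly one of train or eval.
    if profile["train_rows"] + profile["eval_rows"] != raw:
        problems.append("train_rows + eval_rows does not equal raw_rows")

    # The rate is described as a positive-rate summary across raw rows.
    # Assumption: it is escalated_rows / raw_rows, possibly rounded.
    expected_rate = profile["escalated_rows"] / raw if raw else 0.0
    if not math.isclose(profile["escalation_rate"], expected_rate, abs_tol=rate_tol):
        problems.append("escalation_rate does not match escalated_rows / raw_rows")

    # Assumption: teams maps team name -> row count and covers all raw rows.
    if sum(profile["teams"].values()) != raw:
        problems.append("team row counts do not sum to raw_rows")

    return problems
```

A typical call would be `check_profile(load_profile("data-profile.json"))`. An empty result does not say the model is good; it only says the population counts agree with each other.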

What the data profile should not answer

- whether the model is good enough for promotion
- whether the threshold is appropriate for release
- whether the publish bundle alone proves internal provenance

Run `make profile-summary` when you want the population behind the promoted metrics rendered as a single reviewable summary before you open the raw JSON.
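
The actual output of `make profile-summary` is not shown in this guide. As a rough illustration of what a one-screen population summary could look like, here is a minimal sketch. `render_summary` is a hypothetical function, not the repository's implementation, and it assumes `feature_names` is a list of strings and `teams` is a mapping from team name to row count.

```python
def render_summary(profile: dict) -> str:
    """Render the profile fields as a short plain-text summary."""
    raw = profile["raw_rows"]
    lines = [
        f"raw rows:       {raw}",
        f"train / eval:   {profile['train_rows']} / {profile['eval_rows']}",
        f"escalated rows: {profile['escalated_rows']} "
        f"(rate {profile['escalation_rate']:.3f})",
        f"features:       {', '.join(profile['feature_names'])}",
        "rows by team:",
    ]
    # Largest teams first; ties are broken alphabetically for stable output.
    ordered = sorted(profile["teams"].items(), key=lambda kv: (-kv[1], kv[0]))
    for team, count in ordered:
        share = count / raw if raw else 0.0
        lines.append(f"  {team}: {count} ({share:.1%})")
    return "\n".join(lines)
```

Listing teams with their share of raw rows keeps the ownership breakdown visible next to the totals, which is the comparison the profile is meant to support.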

Best companion guides

- read `PUBLISH_CONTRACT.md` when the next question is why the profile belongs in `publish/v1/`
- read `RELEASE_REVIEW_GUIDE.md` when the next question is how to combine profile, metrics, and report review
- read `CONTROL_SURFACE_GUIDE.md` when the next question is whether params changes still keep the profile and metrics comparable