Flagship Benchmark Assets

The flagship benchmark packages are no longer treated like disposable test fixtures. They live under one product-owned root:

packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/

The companion challenge corpora, which hold the blinded holdouts and adversarial perturbations that stress these packages, live beside them under:

packages/bijux-proteomics-core/benchmark-assets/flagship-challenge-corpora/

This handbook explains what is copied, what is generated, which public sources justify the copied files, how to rebuild the package metadata, and where the repository reports current freshness and obsolescence status.

The direct audit and lineage follow-up pages are:

What Exists Per Package

Each flagship package root now carries:

  • copied evidence snapshots or follow-up packets that the benchmark actually reviews
  • source_locator_manifest.json
  • citation_manifest.json
  • generated_boundary.json
  • rebuild_instructions.md
  • package_manifest.json
  • artifact_inventory.json
  • quality_sheet.json
  • lifecycle.json

The DDA package also carries:

  • scientific_invariants.json
  • warning_demonstrations.json
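The file lists above can be turned into a quick presence check. The following is a minimal sketch, not the repository's own validator: `missing_files` is a hypothetical helper, and it assumes the listed files sit directly in the package root rather than in subdirectories. Because the on-disk directory names of the packages are not stated here, the caller says explicitly whether a root is the DDA package.

```python
from pathlib import Path

# File names copied from the lists above; everything else here is illustrative.
REQUIRED_FILES = (
    "source_locator_manifest.json",
    "citation_manifest.json",
    "generated_boundary.json",
    "rebuild_instructions.md",
    "package_manifest.json",
    "artifact_inventory.json",
    "quality_sheet.json",
    "lifecycle.json",
)
DDA_EXTRA_FILES = (
    "scientific_invariants.json",
    "warning_demonstrations.json",
)


def missing_files(package_root: Path, is_dda: bool = False) -> list[str]:
    """Return the expected file names that are absent from package_root.

    Assumption: the files live directly under the package root.
    """
    expected = REQUIRED_FILES + (DDA_EXTRA_FILES if is_dda else ())
    return [name for name in expected if not (package_root / name).is_file()]
```

An empty return value means every expected file is present. The copied evidence snapshots are not checked, since their names differ per package.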

Shared Asset Governance Files

The root-level support files are:

  • packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/asset_root_contract.json
  • packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/freshness_report.json
  • packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/obsolescence_audit.json

Use them for different questions:

  • asset_root_contract.json: what each package root must contain and why it is allowed to exist
  • freshness_report.json: whether the copied snapshots are still present and whether the public source pages are still reachable
  • obsolescence_audit.json: whether the package is still scientifically too weak to count as a stable end state
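To inspect these three files without assuming anything about their schema, a small loader is enough. This is a sketch under stated assumptions: `load_governance` and `top_level_keys` are hypothetical helpers, the relative path assumes the script runs from the repository root, and the internal layout of each JSON document is not described in this handbook, so the code only parses the files and lists their top-level keys.

```python
import json
from pathlib import Path

# Path taken from the handbook; assumes the working directory is the repo root.
ASSET_ROOT = Path(
    "packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages"
)
GOVERNANCE_FILES = (
    "asset_root_contract.json",
    "freshness_report.json",
    "obsolescence_audit.json",
)


def load_governance(asset_root: Path = ASSET_ROOT) -> dict[str, object]:
    """Parse each root-level governance file that exists; skip missing ones."""
    loaded: dict[str, object] = {}
    for name in GOVERNANCE_FILES:
        path = asset_root / name
        if path.is_file():
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded


def top_level_keys(document: object) -> list[str]:
    """List sorted top-level keys of a JSON object, or [] for other JSON types."""
    return sorted(document) if isinstance(document, dict) else []


if __name__ == "__main__":
    for name, document in load_governance().items():
        print(name, top_level_keys(document))
```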

How To Rebuild

When copied evidence, package metadata, or root-level reports need to be refreshed, run:

uv run --group dev python -m bijux_proteomics.benchmarks.flagship_asset_maintenance refresh

That command rewrites:

  • the shared asset-root contract
  • the freshness report
  • the obsolescence audit
  • each package root's source locator manifest
  • each package root's citation manifest
  • each package root's generated-boundary manifest
  • each package root's rebuild instructions
  • each package root's package manifest, artifact inventory, quality sheet, and lifecycle record
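If the refresh needs to be triggered from a script rather than typed by hand, the command above can be wrapped as follows. This is a minimal sketch: `refresh_assets` is a hypothetical wrapper, and it assumes `uv` is on the PATH and that `repo_root` points at the repository checkout. The command-line arguments are exactly those shown above.

```python
import subprocess

# Same arguments as the documented refresh command, split into a list.
REFRESH_COMMAND = [
    "uv", "run", "--group", "dev", "python", "-m",
    "bijux_proteomics.benchmarks.flagship_asset_maintenance", "refresh",
]


def refresh_assets(repo_root: str = ".") -> int:
    """Run the refresh command in repo_root and return its exit code."""
    completed = subprocess.run(REFRESH_COMMAND, cwd=repo_root, check=False)
    return completed.returncode
```

Because the command rewrites files in place, reviewing the resulting diff before committing shows which manifests and reports actually changed.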

Current Scientific Limits

These package roots are stronger than the old fixture-only posture, but they do not yet close the scientific gap across all workflow families.

  • dda: outsider-auditable, but still built around imported-result snapshots instead of live in-repo search reruns
  • dia: publicly inspectable, but still library-conditioned and import-backed
  • lfq: runtime-backed, but still blocked on stronger comparator and generalization proof
  • multiplex: runtime-backed, but still thin on lab consequence and broader authority
  • ptm: runtime-backed, but still blocked on stronger comparator and PTM-family breadth
  • targeted: public and consequence-bearing, but still import-only and not yet calibration-strong enough for decision-grade trust

Those limits are now pressure-tested explicitly through the blinded holdout and perturbation roots in the Flagship Challenge Corpus Catalog.

First Proof Check

  • packages/bijux-proteomics-core/src/bijux_proteomics/benchmarks/flagship_asset_roots.py
  • packages/bijux-proteomics-core/src/bijux_proteomics/benchmarks/flagship_asset_maintenance.py
  • packages/bijux-proteomics-core/tests/benchmarks/test_flagship_asset_root_surface.py
  • packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages

Challenge Pressure

Open the Flagship Challenge Corpus Catalog when you need the frozen holdouts and adversarial perturbations that directly challenge these public package claims.