
Benchmark Incompleteness Ledger

This ledger records why the current benchmark roots still cap public trust language. It is intentionally repetitive: each package repeats its live blockers, realism limits, failure conditions, and non-transfer zones so a reviewer does not need to infer those limits from prose alone.
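Every entry repeats the same fields, so the ledger can be read as one record per package, and the counts in the Package Summary follow from those records. The minimal sketch below assumes a hypothetical in-memory `LedgerEntry` shape (not an API of bijux-proteomics-core). It derives the two summary counts and refuses any entry that omits its limits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """One package entry in the incompleteness ledger (hypothetical in-memory shape)."""

    family: str
    role: str
    package_id: str
    quality_blockers: tuple[str, ...]
    non_transfer_zones: tuple[str, ...]


def summary_rows(entries: list[LedgerEntry]) -> list[tuple[str, str, int, int]]:
    """Derive (family, role, blocker count, non-transfer count) summary rows."""
    rows = []
    for entry in entries:
        # An entry with no listed limits would leave trust language uncapped.
        if not entry.quality_blockers or not entry.non_transfer_zones:
            raise ValueError(f"{entry.package_id}: blockers and non-transfer zones are required")
        rows.append(
            (entry.family, entry.role, len(entry.quality_blockers), len(entry.non_transfer_zones))
        )
    return rows


dda_primary = LedgerEntry(
    family="dda",
    role="primary flagship package",
    package_id="flagship_public_package:dda_reviewable_run",
    quality_blockers=(
        "no in-repo live-engine rerun parity",
        "one-run package cannot authorize broad production-cohort DDA claims",
    ),
    non_transfer_zones=(
        "Unrepresented proteases or mixed-protease exports.",
        "Raw-spectrum scoring parity and engine-side calibration behavior.",
    ),
)

print(summary_rows([dda_primary]))  # [('dda', 'primary flagship package', 2, 2)]
```

Raising on an empty list, rather than reporting a zero count, keeps a silently incomplete entry from appearing in the summary as if it had no limits.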

Package Summary

| workflow family | package role | quality blockers (count) | non-transfer zones (count) |
|---|---|---|---|
| dda | primary flagship package | 2 | 2 |
| dda | companion generalization package | 2 | 2 |
| dia | primary flagship package | 2 | 2 |
| dia | companion generalization package | 2 | 2 |
| lfq | primary flagship package | 2 | 2 |
| lfq | companion generalization package | 2 | 2 |
| multiplex | primary flagship package | 2 | 2 |
| multiplex | companion generalization package | 2 | 2 |
| ptm | primary flagship package | 2 | 2 |
| ptm | companion generalization package | 2 | 2 |
| targeted | primary flagship package | 2 | 2 |
| targeted | companion generalization package | 2 | 2 |

Live Incompleteness Entries

dda: primary flagship package

  • package id: flagship_public_package:dda_reviewable_run
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/dda_reviewable_run

Quality blockers:

  • no in-repo live-engine rerun parity
  • one-run package cannot authorize broad production-cohort DDA claims

Weakness notes:

  • The tracked DDA package is still smaller and cleaner than a production multi-run search corpus.
  • The package demonstrates protein-rollup drift directly, but it still does not prove live-engine calibration parity.

Fixture realism limits:

  • The public package is still a one-run imported-result surface rather than a broader cohort-grade DDA benchmark.
  • The package demonstrates cross-engine drift but does not yet replace live-engine rerun proof.

Expected failure conditions:

  • Adapter normalization drops or misreads decoy labels.
  • Reviewed-proteome accessions drift during import or rollup.

Non-transfer zones:

  • Unrepresented proteases or mixed-protease exports.
  • Raw-spectrum scoring parity and engine-side calibration behavior.

Obsolescence conditions:

  • Search-engine export columns change in a way that the checked fixture no longer reflects current outputs.
  • Reference-proteome mapping rules change without a corresponding fixture refresh.

dda: companion generalization package

  • package id: public_companion_package:dda_cross_engine_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/dda_cross_engine_review_package

Quality blockers:

  • no live-engine rerun parity
  • generalization remains bounded to two small exported-result packages

Weakness notes:

  • The tracked DDA package is still smaller and cleaner than a production multi-run search corpus.
  • The package demonstrates protein-rollup drift directly, but it still does not prove live-engine calibration parity.

Fixture realism limits:

  • The public package is still a one-run imported-result surface rather than a broader cohort-grade DDA benchmark.
  • The package demonstrates cross-engine drift but does not yet replace live-engine rerun proof.

Expected failure conditions:

  • Adapter normalization drops or misreads decoy labels.
  • Reviewed-proteome accessions drift during import or rollup.

Non-transfer zones:

  • Unrepresented proteases or mixed-protease exports.
  • Raw-spectrum scoring parity and engine-side calibration behavior.

Obsolescence conditions:

  • Search-engine export columns change in a way that the checked fixture no longer reflects current outputs.
  • Reference-proteome mapping rules change without a corresponding fixture refresh.
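Both DDA entries share the same expected failure conditions, and a reviewer could encode them as an import check. The minimal sketch below is hypothetical: `check_dda_import`, the row dictionaries, the `DECOY_` accession prefix, and the assumption that normalization leaves accession strings unchanged are all illustrative, not taken from the package.

```python
DECOY_PREFIX = "DECOY_"  # hypothetical decoy convention; real search-engine exports vary


def check_dda_import(raw_rows, normalized_rows, decoy_prefix=DECOY_PREFIX):
    """Return failure messages for the two DDA expected failure conditions.

    raw_rows: dicts with a 'protein' accession as exported by the search engine.
    normalized_rows: dicts with 'protein' and a boolean 'is_decoy' after adapter
    normalization. Assumes normalization keeps accession strings unchanged.
    """
    failures = []
    expected_decoys = {r["protein"] for r in raw_rows if r["protein"].startswith(decoy_prefix)}
    flagged_decoys = {r["protein"] for r in normalized_rows if r["is_decoy"]}
    # Decoy labels must be neither dropped nor invented by normalization.
    if flagged_decoys != expected_decoys:
        failures.append("decoy labels dropped or misread")
    raw_targets = {r["protein"] for r in raw_rows} - expected_decoys
    normalized_targets = {r["protein"] for r in normalized_rows if not r["is_decoy"]}
    # Target accessions must survive import without renaming or loss.
    if normalized_targets != raw_targets:
        failures.append("reviewed-proteome accessions drifted")
    return failures


raw = [{"protein": "ACC_A"}, {"protein": "DECOY_ACC_A"}]
clean = [{"protein": "ACC_A", "is_decoy": False}, {"protein": "DECOY_ACC_A", "is_decoy": True}]
renamed = [{"protein": "ACC_A-2", "is_decoy": False}, {"protein": "DECOY_ACC_A", "is_decoy": True}]
print(check_dda_import(raw, clean))    # []
print(check_dda_import(raw, renamed))  # ['reviewed-proteome accessions drifted']
```

Comparing sets rather than row counts means that a decoy losing its flag while a target gains one is still caught, even though the totals match.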

dia: primary flagship package

  • package id: flagship_public_package:dia_library_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/dia_library_review_package

Quality blockers:

  • no chromatogram-level vendor parity
  • library incompleteness and absent-peptide consequences still block broader biological confidence

Weakness notes:

  • The public package still does not capture the full instability of production library curation and chromatographic drift.
  • Library-conditioned extraction can look more complete than the underlying protein-level support actually is.

Fixture realism limits:

  • The checked-in DIA export does not stress-test vendor-library churn, chromatography drift, or peptide-absence ambiguity at production scale.
  • The fixture is library-conditioned and cannot authorize open-ended protein-level absence claims.

Expected failure conditions:

  • Transition semantics drift while column names still normalize cleanly.
  • Library scope is dropped from the final review surface.

Non-transfer zones:

  • Unseen library compositions, vendor-tuned extraction heuristics, and chromatographic drift outside the fixture.
  • Protein-level absence claims inferred from library-conditioned missing peptides.

Obsolescence conditions:

  • Supported DIA export dialects change without fixture refresh.
  • Controlled-vocabulary mappings or library assumptions change materially.

dia: companion generalization package

  • package id: public_companion_package:dia_matrix_shift_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/dia_matrix_shift_review_package

Quality blockers:

  • protein-evidence transfer remains weaker than precursor-level review transfer
  • library-conditioned authority still caps the family posture

Weakness notes:

  • The public package still does not capture the full instability of production library curation and chromatographic drift.
  • Library-conditioned extraction can look more complete than the underlying protein-level support actually is.

Fixture realism limits:

  • The checked-in DIA export does not stress-test vendor-library churn, chromatography drift, or peptide-absence ambiguity at production scale.
  • The fixture is library-conditioned and cannot authorize open-ended protein-level absence claims.

Expected failure conditions:

  • Transition semantics drift while column names still normalize cleanly.
  • Library scope is dropped from the final review surface.

Non-transfer zones:

  • Unseen library compositions, vendor-tuned extraction heuristics, and chromatographic drift outside the fixture.
  • Protein-level absence claims inferred from library-conditioned missing peptides.

Obsolescence conditions:

  • Supported DIA export dialects change without fixture refresh.
  • Controlled-vocabulary mappings or library assumptions change materially.
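Both DIA entries warn that library scope can be dropped and that missing peptides must not become protein-level absence claims. One way to keep those limits visible is to treat "outside the library" and "in the library but not detected" as distinct states. The sketch below is hypothetical: the function names and record keys are illustrative and do not describe the package's actual review surface.

```python
def peptide_status(peptide, library, detected):
    """Classify one peptide under library-conditioned extraction.

    library: set of peptide sequences in the spectral library that was searched.
    detected: set of peptide sequences with accepted extraction evidence.
    """
    if peptide not in library:
        # The extraction never looked for this peptide, so nothing can be said about it.
        return "not assessable (outside library)"
    if peptide in detected:
        return "detected"
    # A library-conditioned miss, which is not evidence of biological absence.
    return "not detected (library-conditioned)"


def protein_review_record(protein, peptides, library, detected):
    """Build a review record that keeps library scope attached (sketch)."""
    return {
        "protein": protein,
        "library_scope": sorted(library),  # carried through, never dropped
        "peptide_status": {p: peptide_status(p, library, detected) for p in peptides},
        "protein_absence_claim": False,  # missing peptides never become protein absence
    }


record = protein_review_record(
    "ACC_A", ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"], {"PEPTIDEA", "PEPTIDEB"}, {"PEPTIDEA"}
)
print(record["peptide_status"])
```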

lfq: primary flagship package

  • package id: flagship_public_package:lfq_cohort_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/lfq_cohort_review_package

Quality blockers:

  • no stronger public truth package that supports accuracy claims beyond repeatability
  • generalization beyond the current cohort package remains explicitly bounded

Weakness notes:

  • The public package still underrepresents the sample heterogeneity and dropout patterns seen in broader production cohorts.
  • Protein-level repeatability can obscure peptide-level ambiguity and design-sensitive missingness.

Fixture realism limits:

  • The LFQ fixture does not represent broader cohort heterogeneity or severe missing-not-at-random behavior.
  • Repeatability under this study shape does not authorize decision-grade abundance claims by itself.

Expected failure conditions:

  • Protein rollups remain numerically stable while missingness or contrast semantics drift.
  • Design annotations survive import but no longer match the benchmarked comparison.

Non-transfer zones:

  • Large heterogeneous cohorts with stronger missing-not-at-random behavior.
  • Accuracy claims against external LFQ pipelines or spike-in truth sets.

Obsolescence conditions:

  • LFQ design fixtures change in sample structure without metadata refresh.
  • Quantification claims expand beyond repeatability into accuracy without new truth evidence.

lfq: companion generalization package

  • package id: public_companion_package:lfq_sparse_contrast_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/lfq_sparse_contrast_review_package

Quality blockers:

  • effect-direction confidence weakens under sparser contrast
  • family authority remains bounded rather than decision-grade

Weakness notes:

  • The public package still underrepresents the sample heterogeneity and dropout patterns seen in broader production cohorts.
  • Protein-level repeatability can obscure peptide-level ambiguity and design-sensitive missingness.

Fixture realism limits:

  • The LFQ fixture does not represent broader cohort heterogeneity or severe missing-not-at-random behavior.
  • Repeatability under this study shape does not authorize decision-grade abundance claims by itself.

Expected failure conditions:

  • Protein rollups remain numerically stable while missingness or contrast semantics drift.
  • Design annotations survive import but no longer match the benchmarked comparison.

Non-transfer zones:

  • Large heterogeneous cohorts with stronger missing-not-at-random behavior.
  • Accuracy claims against external LFQ pipelines or spike-in truth sets.

Obsolescence conditions:

  • LFQ design fixtures change in sample structure without metadata refresh.
  • Quantification claims expand beyond repeatability into accuracy without new truth evidence.
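The LFQ failure conditions describe drift that stable protein rollups would not reveal. A minimal sketch, assuming hypothetical `design`, `contrast`, and `missing` fields on each run record, checks those semantics directly rather than re-checking rollup values.

```python
def lfq_drift_checks(benchmark, imported):
    """Compare an imported LFQ run record against the benchmarked reference.

    Each record is a dict with hypothetical keys:
      'design'   -> {sample_id: condition}
      'contrast' -> (numerator_condition, denominator_condition)
      'missing'  -> set of (protein, sample_id) pairs without a quantitative value
    Rollup values are deliberately not compared: they can stay stable while
    the semantics below drift.
    """
    failures = []
    if imported["design"] != benchmark["design"]:
        failures.append("design annotations no longer match the benchmarked comparison")
    if imported["contrast"] != benchmark["contrast"]:
        failures.append("contrast semantics drifted")
    if imported["missing"] != benchmark["missing"]:
        failures.append("missingness pattern drifted")
    return failures


benchmark = {
    "design": {"s1": "control", "s2": "treated"},
    "contrast": ("treated", "control"),
    "missing": {("ACC_A", "s2")},
}
# Same design and missingness, but the contrast direction was flipped on import.
flipped = dict(benchmark, contrast=("control", "treated"))
print(lfq_drift_checks(benchmark, flipped))  # ['contrast semantics drifted']
```

A flipped contrast leaves every abundance value untouched while inverting the effect direction, which is exactly the kind of drift the rollup alone cannot show.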

multiplex: primary flagship package

  • package id: flagship_public_package:multiplex_tmtpro_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/multiplex_tmtpro_review_package

Quality blockers:

  • no multiplex lab packet or outsider decision brief family
  • multiplex authority is intentionally kept out of the outsider-facing flagship set

Weakness notes:

  • The public package surface is still narrower than production multiplex cohorts with more severe missing-channel and interference behavior.
  • Channel stability can look stronger than the underlying protein-level certainty actually is.

Fixture realism limits:

  • The multiplex fixture does not exercise the strongest carrier overload, interference, or unbalanced cohort behavior seen in production.
  • Reporter stability under this fixture does not authorize label-free-style decision claims.

Expected failure conditions:

  • Reporter-channel assignments drift or collapse during quantification rollup.
  • Channel-level caveats disappear from the final interpretation surface.

Non-transfer zones:

  • Severe interference, carrier overload, and vendor-specific multiplex tuning outside the bundled fixture.
  • Claims that reporter summaries are interchangeable with label-free abundance truth.

Obsolescence conditions:

  • Multiplex channel mappings or fixture design change without metadata refresh.
  • Supported multiplex chemistry families change without benchmark scope review.

multiplex: companion generalization package

  • package id: public_companion_package:multiplex_channel_stress_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/multiplex_channel_stress_review_package

Quality blockers:

  • multiplex still lacks outsider review and lab consequence posture
  • public release language remains internal-support only even with a second package

Weakness notes:

  • The public package surface is still narrower than production multiplex cohorts with more severe missing-channel and interference behavior.
  • Channel stability can look stronger than the underlying protein-level certainty actually is.

Fixture realism limits:

  • The multiplex fixture does not exercise the strongest carrier overload, interference, or unbalanced cohort behavior seen in production.
  • Reporter stability under this fixture does not authorize label-free-style decision claims.

Expected failure conditions:

  • Reporter-channel assignments drift or collapse during quantification rollup.
  • Channel-level caveats disappear from the final interpretation surface.

Non-transfer zones:

  • Severe interference, carrier overload, and vendor-specific multiplex tuning outside the bundled fixture.
  • Claims that reporter summaries are interchangeable with label-free abundance truth.

Obsolescence conditions:

  • Multiplex channel mappings or fixture design change without metadata refresh.
  • Supported multiplex chemistry families change without benchmark scope review.
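Both multiplex entries expect reporter-channel assignments to drift or collapse during rollup and channel caveats to disappear. The sketch below compares channel state before and after rollup. The function name and data shapes are hypothetical, and it assumes each channel carries a distinct sample in the design. The channel labels follow TMTpro naming only for readability.

```python
def multiplex_rollup_checks(channels_before, channels_after, caveats_before, caveats_after):
    """Compare reporter-channel state before and after quantification rollup (sketch).

    channels_*: dict mapping a reporter channel label (e.g. "126", "127N") to a sample id;
                assumes each channel carries a distinct sample in the design.
    caveats_*:  set of channel-level caveat strings attached to the result.
    """
    failures = []
    if channels_after != channels_before:
        failures.append("reporter-channel assignments drifted")
    # Collapse: two distinct channels now resolve to the same sample.
    if len(set(channels_after.values())) < len(channels_after):
        failures.append("reporter channels collapsed onto one sample")
    dropped = set(caveats_before) - set(caveats_after)
    if dropped:
        failures.append(f"channel caveats dropped: {sorted(dropped)}")
    return failures


before = {"126": "s1", "127N": "s2"}
after = {"126": "s1", "127N": "s1"}
print(multiplex_rollup_checks(before, after, {"127N: interference flagged"}, set()))
```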

ptm: primary flagship package

  • package id: flagship_public_package:ptm_localization_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/ptm_localization_review_package

Quality blockers:

  • occupancy and regulatory interpretation remain narrower than localization evidence
  • PTM follow-up remains exploratory and bounded by ambiguity-aware consequence planning

Weakness notes:

  • Localization confidence can still hide uncertainty about biological relevance and occupancy magnitude.
  • The public package is still phosphorylation-oriented and does not generalize equally well to the broader range of PTM families seen in production.

Fixture realism limits:

  • The fixture emphasizes phosphorylation localization and does not represent full PTM family diversity.
  • The dataset is too tidy to authorize occupancy or broad regulatory storytelling on its own.

Expected failure conditions:

  • Localized and ambiguous site groups are collapsed into one accepted site claim.
  • PTM concept identifiers resolve while localization confidence is discarded.

Non-transfer zones:

  • Stoichiometric occupancy and broad regulatory claims.
  • PTM families that are not represented by the phosphorylation-oriented fixture.

Obsolescence conditions:

  • PTM localization conventions change without a fixture refresh.
  • Supported PTM families broaden or narrow without updating the benchmark scope.

ptm: companion generalization package

  • package id: public_companion_package:ptm_ambiguity_stress_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/ptm_ambiguity_stress_review_package

Quality blockers:

  • targetability weakens materially under ambiguity stress
  • family authority remains bounded rather than decision-grade

Weakness notes:

  • Localization confidence can still hide uncertainty about biological relevance and occupancy magnitude.
  • The public package is still phosphorylation-oriented and does not generalize equally well to the broader range of PTM families seen in production.

Fixture realism limits:

  • The fixture emphasizes phosphorylation localization and does not represent full PTM family diversity.
  • The dataset is too tidy to authorize occupancy or broad regulatory storytelling on its own.

Expected failure conditions:

  • Localized and ambiguous site groups are collapsed into one accepted site claim.
  • PTM concept identifiers resolve while localization confidence is discarded.

Non-transfer zones:

  • Stoichiometric occupancy and broad regulatory claims.
  • PTM families that are not represented by the phosphorylation-oriented fixture.

Obsolescence conditions:

  • PTM localization conventions change without a fixture refresh.
  • Supported PTM families broaden or narrow without updating the benchmark scope.
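Both PTM entries expect ambiguous site groups to be collapsed and localization confidence to be lost once concept identifiers resolve. A minimal sketch that refuses both shortcuts is shown below. `SiteEvidence` and `site_claim` are hypothetical names, and the 0.75 cutoff is an assumed value for illustration only. This ledger does not state it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteEvidence:
    protein: str
    modification: str  # opaque PTM concept identifier; resolving it says nothing about localization
    candidate_positions: tuple[int, ...]
    localization_probability: Optional[float]  # None means confidence was lost upstream


def site_claim(ev: SiteEvidence, threshold: float = 0.75) -> str:
    """Return a site-level claim that never hides ambiguity (sketch)."""
    if ev.localization_probability is None:
        # The concept identifier resolved, but localization confidence was discarded.
        return "rejected: localization confidence missing"
    if not ev.candidate_positions:
        return "rejected: no candidate positions"
    if len(ev.candidate_positions) > 1:
        # An ambiguous group stays a group; it is never collapsed into one accepted site.
        return f"ambiguous group: {ev.candidate_positions}"
    if ev.localization_probability >= threshold:
        return f"localized site: {ev.candidate_positions[0]}"
    return "unlocalized: below threshold"


print(site_claim(SiteEvidence("ACC_A", "phospho", (5, 7), 0.9)))  # ambiguous group: (5, 7)
print(site_claim(SiteEvidence("ACC_A", "phospho", (5,), None)))   # rejected: localization confidence missing
```

The ambiguity check runs before the threshold check, so a high probability attached to a multi-position group cannot promote it to a single accepted site.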

targeted: primary flagship package

  • package id: flagship_public_package:targeted_transition_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/targeted_transition_review_package

Quality blockers:

  • vendor-parity and calibration-clean authority are still outside the current proof boundary
  • targeted follow-up remains exploratory and cannot authorize calibration-perfect biological certainty

Weakness notes:

  • The public package evidence is still operationally tidy compared with noisier targeted production runs and carryover scenarios.
  • Transition retention is easier to prove than protein-specific interpretability in shared-peptide settings.

Fixture realism limits:

  • The targeted fixture does not cover vendor-specific chromatogram quirks, calibration standards, or messy carryover behavior.
  • Transition retention under this fixture does not authorize direct protein certainty claims.

Expected failure conditions:

  • Transition QC stays numerically stable while rollup removes protein-inference caution.
  • Chromatogram warnings are flattened into a clean targeted-support claim.

Non-transfer zones:

  • Vendor-specific chromatogram behavior, calibration standards, and transition-interference edge cases outside the bundled fixture.
  • Claims that targeted QC alone resolves shared-peptide ambiguity or confirms protein truth.

Obsolescence conditions:

  • Targeted fixture schema changes without updated transition-level metadata.
  • Targeted support claims expand into vendor or calibration parity without new benchmark evidence.

targeted: companion generalization package

  • package id: public_companion_package:targeted_carryover_review_package
  • package root: packages/bijux-proteomics-core/benchmark-assets/flagship-public-packages/targeted_carryover_review_package

Quality blockers:

  • stronger carryover pressure weakens promotion confidence
  • family authority remains bounded by calibration and vendor-parity limits

Weakness notes:

  • The public package evidence is still operationally tidy compared with noisier targeted production runs and carryover scenarios.
  • Transition retention is easier to prove than protein-specific interpretability in shared-peptide settings.

Fixture realism limits:

  • The targeted fixture does not cover vendor-specific chromatogram quirks, calibration standards, or messy carryover behavior.
  • Transition retention under this fixture does not authorize direct protein certainty claims.

Expected failure conditions:

  • Transition QC stays numerically stable while rollup removes protein-inference caution.
  • Chromatogram warnings are flattened into a clean targeted-support claim.

Non-transfer zones:

  • Vendor-specific chromatogram behavior, calibration standards, and transition-interference edge cases outside the bundled fixture.
  • Claims that targeted QC alone resolves shared-peptide ambiguity or confirms protein truth.

Obsolescence conditions:

  • Targeted fixture schema changes without updated transition-level metadata.
  • Targeted support claims expand into vendor or calibration parity without new benchmark evidence.
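Both targeted entries expect rollup to remove protein-inference caution and to flatten chromatogram warnings into a clean support claim. The sketch below carries both forward instead. `rollup_targeted` and its input keys are hypothetical, and the support labels are illustrative rather than the package's own vocabulary.

```python
def rollup_targeted(peptide_results):
    """Roll transition-level results up to a protein-level support record (sketch).

    peptide_results: non-empty list of dicts with hypothetical keys
      'peptide', 'transitions_passed' (bool), 'shared' (bool, peptide maps to
      more than one protein), 'chromatogram_warnings' (list of str).
    """
    if not peptide_results:
        raise ValueError("no peptide results to roll up")
    warnings = [w for r in peptide_results for w in r["chromatogram_warnings"]]
    shared = any(r["shared"] for r in peptide_results)
    if not all(r["transitions_passed"] for r in peptide_results):
        support = "failed transition QC"
    elif warnings or shared:
        # Passing QC alone is not enough: warnings and shared peptides keep the claim qualified.
        support = "qualified targeted support"
    else:
        support = "targeted support"
    # Warnings and the shared-peptide caution travel with the record instead of being flattened.
    return {"support": support, "chromatogram_warnings": warnings, "shared_peptide_caution": shared}


result = rollup_targeted([
    {"peptide": "PEPTIDEA", "transitions_passed": True, "shared": False,
     "chromatogram_warnings": []},
    {"peptide": "PEPTIDEB", "transitions_passed": True, "shared": True,
     "chromatogram_warnings": ["possible carryover"]},
])
print(result["support"])  # qualified targeted support
```

Even the unqualified "targeted support" label here describes transition-level evidence only. It does not claim protein truth, which stays in the non-transfer zone above.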