
CLI Surface

CLI documentation should describe the commands the package truly owns, not the commands a reader might wish existed.

Package Surface

  • src/bijux_proteomics/interfaces/cli/app.py and interfaces/cli/__main__.py are the command-line surfaces for core contract workflows
  • CLI behavior should reveal contract meaning and validation state rather than runtime orchestration detail
  • new CLI promises must stay aligned with the stable contract model

First Proof Check

  • src/bijux_proteomics/domain/program_spec.py, domain/repositories.py, and domain/targets.py
  • src/bijux_proteomics/interfaces/cli/app.py and interfaces/cli/__main__.py
  • packages/bijux-proteomics-core/tests

FASTA Commands

The owned FASTA, peptide, PSM, protein, and engine-import CLI surface is:

  • fasta-parse
  • fasta-contaminants
  • fasta-profile
  • fasta-stats
  • fasta-dedup
  • fasta-filter
  • fasta-provenance
  • fasta-decoy
  • target-decoy-validate
  • peptide-index
  • fragment-ions
  • peptide-properties
  • precursor-mass-error
  • modified-peptide-parse
  • modification-resolve
  • psm-map
  • peptide-evidence
  • fdr-reference-check
  • fdr-levels
  • picked-protein-fdr
  • protein-ambiguity
  • protein-inference-benchmarks
  • peptide-matrix
  • protein-matrix
  • protein-lfq
  • protein-groups
  • protein-coverage
  • protein-coverage-plot
  • protein-parsimony
  • maxquant-import
  • diann-import
  • spectronaut-import
  • openms-import
  • comet-import
  • fragpipe-import
  • sage-import
  • summarize --kind fasta

The owned spectrum CLI surface is:

  • spectrum-parse
  • spectrum-stats
  • spectrum-annotate
  • spectrum-similarity
  • spectral-library-import
  • spectral-library-search
  • spectrum-summary
  • mzml-inspect
  • summarize --kind mgf

fasta-parse emits the full parser report, including:

  • accepted and rejected records
  • duplicate identifiers
  • duplicate normalized accessions
  • parser-level database composition over accepted records

The database composition surface reports:

  • accepted record count
  • target count
  • decoy count
  • contaminant count
  • accession-namespace counts

fasta-profile emits a richer database-review object with:

  • a summary block covering input records, accepted proteins, rejected records, unique accessions, target count, decoy count, contaminant count, total residues, length extremes, and organism annotation coverage
  • a stable length-distribution ledger across the bins 1-99, 100-249, 250-499, 500-999, and 1000+
  • an organism-distribution ledger when organism evidence is present in the accepted records
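
The length-distribution bins above can be illustrated with a minimal sketch. The function names here are hypothetical and are not taken from the package; only the bin edges come from the description above.

```python
from collections import Counter

# Bin edges follow the ledger described above; names are hypothetical.
BINS = [(1, 99, "1-99"), (100, 249, "100-249"), (250, 499, "250-499"), (500, 999, "500-999")]
LABELS = [label for _, _, label in BINS] + ["1000+"]


def length_bin(length: int) -> str:
    """Return the ledger label for one protein length."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    for low, high, label in BINS:
        if low <= length <= high:
            return label
    return "1000+"


def length_distribution(lengths):
    """Count lengths per bin, always emitting every bin in a stable order."""
    counts = Counter(length_bin(n) for n in lengths)
    return {label: counts.get(label, 0) for label in LABELS}
```

Emitting empty bins with a zero count keeps the ledger shape stable across databases, which is one plausible reading of "stable" here.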

fasta-profile also supports reviewer-facing TSV exports through:

  • --summary-tsv-out
  • --length-tsv-out
  • --organism-tsv-out

fasta-contaminants builds a more realistic search database by:

  • appending the owned built-in contaminant panel unless --no-include-builtin is selected
  • appending one or more user-provided contaminant FASTA files through repeated --contaminant-fasta
  • relabeling appended contaminant proteins with the stable CON__ prefix
  • writing a build report with separate built-in and external append counts plus skipped duplicate contaminant accessions
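
As a rough illustration of the relabel-and-append behavior, the sketch below prefixes contaminant accessions with CON__ and skips duplicates. The function name and the duplicate rule (comparison on the relabeled accession) are assumptions, not the package's confirmed behavior.

```python
def append_contaminants(targets, contaminants, prefix="CON__"):
    """Append contaminant records to a target list.

    Both inputs are lists of (accession, sequence) tuples. Returns the merged
    list plus the relabeled accessions that were skipped as duplicates.
    """
    merged = list(targets)
    seen = {accession for accession, _ in targets}
    skipped = []
    for accession, sequence in contaminants:
        # Avoid double-prefixing records that already carry the label.
        labeled = accession if accession.startswith(prefix) else prefix + accession
        if labeled in seen:
            skipped.append(labeled)
            continue
        seen.add(labeled)
        merged.append((labeled, sequence))
    return merged, skipped
```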

fasta-decoy builds a target-decoy database and reports both accession-level and sequence-level review signals.

  • --decoy-mode reverse and --decoy-mode shuffle select the owned decoy construction method.
  • --prefix preserves target protein identity inside the decoy accession while enforcing collision-free accession generation.
  • Mixed target-plus-decoy inputs are rejected instead of being re-expanded.
  • Prefix choices that would collide with existing target accessions fail before output is written.
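
The reverse and shuffle modes, together with the fail-before-write collision check, can be sketched as follows. This is an assumption-laden illustration: the function name, the `DECOY_` default prefix, and the seeded `random.Random` shuffle are hypothetical and may differ from the owned implementation.

```python
import random


def build_decoys(targets, mode="reverse", prefix="DECOY_", seed=0):
    """Build decoys from (accession, sequence) tuples.

    Returns the decoy records and the accessions whose decoy sequence is
    identical to the target sequence (for example, palindromes).
    """
    target_accessions = {accession for accession, _ in targets}
    # Check every would-be decoy accession before producing any output.
    collisions = sorted(prefix + acc for acc in target_accessions
                        if prefix + acc in target_accessions)
    if collisions:
        raise ValueError(f"decoy accessions collide with targets: {collisions}")

    rng = random.Random(seed)  # seeded so shuffle output is reproducible
    decoys, unchanged = [], []
    for accession, sequence in targets:
        if mode == "reverse":
            decoy_sequence = sequence[::-1]
        elif mode == "shuffle":
            residues = list(sequence)
            rng.shuffle(residues)
            decoy_sequence = "".join(residues)
        else:
            raise ValueError(f"unknown decoy mode: {mode}")
        if decoy_sequence == sequence:
            unchanged.append(accession)
        decoys.append((prefix + accession, decoy_sequence))
    return decoys, unchanged
```

A database that already holds both `P1` and `DECOY_P1` fails the collision check, which also covers the common mixed target-plus-decoy case.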

The fasta-decoy JSON payload includes:

  • mode
  • prefix
  • seed
  • output_fasta
  • target_count
  • decoy_count
  • report
  • generation_report
  • output_sha256
  • reproducibility_hash

generation_report adds reviewer-facing target-decoy construction details:

  • input target count
  • generated decoy count
  • unchanged sequence count and accession list
  • target-sequence collision count and accession list
  • validity flag for the generated decoy surface

target-decoy-validate checks a finished database after generation and reports:

  • target and decoy counts
  • prefix and mode compatibility
  • duplicate accession and duplicate sequence burden
  • target-versus-decoy sequence overlap signals
  • overall validity of the target-decoy database

peptide-index digests a FASTA database and reports how one or more peptide queries map back to proteins under the selected digestion assumptions.

  • --peptide is repeatable and accepts plain or modified peptide notation.
  • --protease, --missed-cleavages, and --digestion-mode define the digest policy used to build the searchable peptide space.
  • --il-equivalent optionally collapses isoleucine and leucine during lookup.
  • --protein-group-map accepts a TSV with accession and protein_group columns so group-specific peptides stay explicit.
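
The effect of I/L-equivalent lookup can be shown with a small sketch. The helper names are hypothetical; the only behavior taken from the text above is that isoleucine and leucine are collapsed before matching.

```python
def lookup_key(peptide: str, il_equivalent: bool = False) -> str:
    """Return the sequence used for lookup, optionally collapsing I to L."""
    key = peptide.upper()
    return key.replace("I", "L") if il_equivalent else key


def build_index(peptides_by_protein, il_equivalent=False):
    """Map each lookup key to the set of protein accessions that contain it."""
    index = {}
    for accession, peptides in peptides_by_protein.items():
        for peptide in peptides:
            index.setdefault(lookup_key(peptide, il_equivalent), set()).add(accession)
    return index
```

With `il_equivalent=True`, `ELVISK` and `ELVLSK` share one key, so a peptide that looked unique under strict lookup can become shared.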

The peptide-index JSON payload includes:

  • input record count
  • query peptide count
  • protease
  • digestion mode
  • missed cleavages
  • I/L-equivalence flag
  • protein-group-map presence flag
  • one report object with per-peptide lookup entries and summary counts

Each lookup entry reports:

  • the original query peptide
  • the canonical residue sequence used for lookup
  • the final lookup sequence after optional I/L normalization
  • whether modification stripping or I/L-equivalent lookup was applied
  • matched protein accessions, families, and groups
  • protein-group count
  • uniqueness and audit class when the peptide is present
  • target, decoy, contaminant, mixed, or missing database membership
  • missed-cleavage counts observed among the matching peptide instances
  • a reviewer-facing explanation string

peptide-properties reports one peptide-level screening object for filtering or review before search and downstream analysis.

  • --mod accepts repeatable modification assignments in the same style as peptide-mass.
  • --charge chooses the precursor charge state used for m/z calculation.
  • --protease, --custom-protease, and --custom-protease-name define the missed-cleavage context.
  • --registry optionally loads a modification registry for named modifications.

The peptide-properties JSON payload includes:

  • canonical notation
  • underlying residue sequence
  • protease
  • charge
  • residue length
  • monoisotopic mass
  • average mass
  • monoisotopic precursor m/z for the selected charge state
  • missed-cleavage count
  • hydrophobicity proxy
  • individual problem flags and a final flag stating whether the peptide is problematic
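
Two of the reported properties follow from short formulas. The sketch below computes precursor m/z from a neutral monoisotopic mass and counts tryptic missed cleavages. The function names are hypothetical, and the "no cleavage before proline" rule is the common trypsin convention, assumed here rather than confirmed for the owned protease definitions.

```python
PROTON_MASS = 1.007276466812  # Da


def precursor_mz(monoisotopic_mass: float, charge: int) -> float:
    """m/z of the protonated peptide: (M + z * proton) / z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (monoisotopic_mass + charge * PROTON_MASS) / charge


def tryptic_missed_cleavages(sequence: str) -> int:
    """Count internal K/R residues that are not followed by P."""
    return sum(
        1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    )
```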

fragment-ions emits a dedicated theoretical fragment-ion review report for one peptide or modified peptide.

  • --mod accepts repeatable modification assignments in the same style as peptide-mass.
  • --charge is repeatable and defaults to both 1 and 2.
  • --fragment-series accepts the owned series labels and defaults to b plus y.
  • --include-neutral-losses adds supported residue and modification losses.
  • --tsv-out writes one row per theoretical fragment ion.

The fragment-ions JSON payload includes:

  • canonical notation and residue sequence
  • selected charge states and fragment series
  • whether neutral losses were included
  • total fragment-ion count
  • counts by series
  • counts by charge
  • neutral-loss ion count
  • the full fragment-ion rows with series, ordinal, charge, neutral loss, and monoisotopic or average mass and m/z values
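
For orientation, the default b plus y series at charges 1 and 2 can be computed as in the sketch below. It covers unmodified peptides only, uses standard monoisotopic residue masses, and omits neutral losses; the function name and row layout are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(sequence, charges=(1, 2)):
    """Return (series, ordinal, charge, m/z) rows for b and y ions."""
    rows = []
    n = len(sequence)
    for z in charges:
        for i in range(1, n):
            b_mass = sum(MONO[aa] for aa in sequence[:i])               # N-terminal prefix
            y_mass = sum(MONO[aa] for aa in sequence[n - i:]) + WATER   # C-terminal suffix
            rows.append(("b", i, z, (b_mass + z * PROTON) / z))
            rows.append(("y", i, z, (y_mass + z * PROTON) / z))
    return rows
```

A peptide of length n yields n - 1 ions per series and charge, so `by_ions("PEPTIDE")` returns 24 rows. An unknown residue letter raises `KeyError` in this sketch.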

precursor-mass-error emits a reviewer-facing precursor calibration report from one TSV table of peptide, observed-m/z, and charge observations.

  • --peptide-column, --observed-mz-column, --charge-column, and --spectrum-id-column map the input table.
  • --max-isotope-offset controls how many isotope-offset candidates are ranked.
  • --summary-tsv-out writes the one-row report summary.
  • --observations-tsv-out writes one row per peptide observation.
  • --ppm-distribution-tsv-out writes the absolute-ppm distribution.
  • --charge-distribution-tsv-out writes the charge-state distribution.
  • --isotope-distribution-tsv-out writes the recommended isotope-offset distribution.

The precursor-mass-error JSON payload includes:

  • input row count and accepted observation count
  • mean and median ppm error plus mean Da error
  • median and maximum absolute ppm error
  • charge, absolute-ppm, and recommended isotope-offset distributions
  • per-observation peptide, canonical peptide, charge, theoretical m/z, observed m/z, Da error, ppm error, and isotope advisory
  • any requested TSV export paths
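
The ppm error and the isotope advisory can be sketched as below. The ppm formula is standard. The symmetric offset range and the "smallest absolute ppm wins" rule are assumptions about how candidates might be ranked, and the function names are hypothetical.

```python
C13_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed precursor error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def best_isotope_offset(observed_mz, theoretical_mz, charge, max_offset=2):
    """Return the isotope offset whose shifted theoretical m/z fits best."""
    candidates = []
    for offset in range(-max_offset, max_offset + 1):
        shifted = theoretical_mz + offset * C13_DELTA / charge
        candidates.append((abs(ppm_error(observed_mz, shifted)), offset))
    return min(candidates)[1]
```

An observation that is roughly 1.0034 / z above the theoretical m/z gets offset 1, which suggests the instrument picked the first isotope peak instead of the monoisotopic peak.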

modified-peptide-parse normalizes one engine-specific modified peptide string into the owned canonical modified peptide contract.

  • --dialect is required and accepts maxquant, msfragger, fragpipe, sage, or comet.
  • --registry optionally loads a modification registry for named modifications.
  • --out writes the normalization report as JSON.

The modified-peptide-parse JSON payload includes:

  • the named engine dialect
  • the original notation
  • the stripped residue sequence
  • the canonical modified peptide notation
  • explicit protein-terminal context flags
  • the full normalized modification rows with preserved site positions

fragpipe-import emits one governed review packet over a FragPipe psm.tsv, peptide table, and protein table bundle.

  • the positional argument is the FragPipe psm.tsv path
  • --peptide-tsv is required and supplies the peptide-level table
  • --protein-tsv is required and supplies the protein-level table
  • --summary-tsv-out writes the one-row bundle summary
  • --psm-tsv-out writes reviewer-facing PSM rows with modification and mass-difference evidence
  • --peptide-review-tsv-out writes the peptide-level review table
  • --protein-review-tsv-out writes the protein-level review table

The fragpipe-import JSON payload includes:

  • one compact bundle summary over accepted PSMs, peptide rows, protein rows, q-value coverage, modified rows, open-search-like rows, and mapped proteins
  • the PSM normalization adapter identity plus accepted and rejected PSM counts
  • reviewer-facing PSM rows with hyperscore, q-value, target-decoy state, protein references, modification evidence, and mass-difference state
  • reviewer-facing peptide rows with mapped proteins, probability, q-value, spectral count, modification evidence, and mass-difference state
  • reviewer-facing protein rows with identity, annotation, coverage, peptide burden, spectral count, probability, and target-decoy state
  • any requested TSV export paths

sage-import emits one governed review packet over a realistic Sage PSM export.

  • the positional argument is the Sage result TSV path
  • --config optionally loads a Sage search configuration JSON file
  • --summary-tsv-out writes the one-row import summary
  • --psm-tsv-out writes reviewer-facing Sage PSM rows

The sage-import JSON payload includes:

  • the detected Sage dialect identifier
  • one compact summary over accepted or rejected rows, modified PSMs, hyperscore coverage, q-value coverage, multi-protein rows, and target or decoy burden
  • the normalization adapter identity plus accepted and rejected row counts
  • an optional parsed Sage parameter report when --config is supplied
  • reviewer-facing PSM rows with discriminant score, hyperscore, q-values, posterior error, protein mappings, modification burden, matched-peak shape, and mass-accuracy fields
  • any requested TSV export paths

comet-import emits one governed review packet over practical Comet tabular or pepXML result evidence.

  • the positional argument is the Comet result file path
  • --config optionally loads a Comet parameter file
  • --summary-tsv-out writes the one-row import summary
  • --psm-tsv-out writes reviewer-facing Comet PSM rows

The comet-import JSON payload includes:

  • the detected import kind as tabular or pepxml
  • one compact summary over accepted or rejected rows, modified PSMs, XCorr coverage, DeltaCn coverage, expectation-value coverage, multi-protein rows, and target or decoy burden
  • the normalization adapter identity plus accepted and rejected row counts for tabular imports
  • an optional parsed Comet parameter report when --config is supplied
  • reviewer-facing PSM rows with modified peptide notation, residue sequence, canonical peptide, charge, expectation value, XCorr, DeltaCn, Sp score, protein mappings, and target-decoy label
  • any requested TSV export paths

maxquant-import emits one governed review packet over a MaxQuant evidence.txt, peptides.txt, and proteinGroups.txt bundle.

  • the positional argument is the MaxQuant evidence.txt path
  • --peptides-txt is required and supplies the peptides.txt table
  • --protein-groups-txt is required and supplies the proteinGroups.txt table
  • --config optionally loads a MaxQuant settings file
  • --summary-tsv-out writes the one-row import summary
  • --evidence-tsv-out writes reviewer-facing evidence rows
  • --peptide-tsv-out writes reviewer-facing peptide rows
  • --protein-group-tsv-out writes reviewer-facing protein-group rows

The maxquant-import JSON payload includes:

  • one compact summary over accepted and rejected evidence rows, peptide and protein-group row counts, modified evidence burden, experiment names, LFQ experiment names, and contaminant or reverse counts across the bundle
  • the evidence normalization adapter identity plus accepted and rejected row counts for the native MaxQuant evidence surface
  • an optional parsed MaxQuant parameter report when --config is supplied
  • reviewer-facing evidence rows with experiment name, modified peptide notation, residue sequence, canonical peptide, charge, score, posterior error probability, protein mappings, and contaminant or reverse flags
  • reviewer-facing peptide rows with modified sequence, leading razor protein, protein mappings, score, posterior error probability, intensity, MS/MS count, and contaminant or reverse flags
  • reviewer-facing protein-group rows with protein identities, peptide burden, sequence coverage, only-identified-by-site state, per-experiment LFQ intensities, and contaminant or reverse flags
  • any requested TSV export paths

diann-import emits one governed review packet over a DIA-NN precursor report.

  • the positional argument is the DIA-NN report TSV path
  • --config optionally loads a DIA-NN configuration JSON file
  • --summary-tsv-out writes the one-row import summary
  • --precursor-tsv-out writes reviewer-facing precursor rows
  • --protein-group-tsv-out writes reviewer-facing protein-group rows

The diann-import JSON payload includes:

  • one compact summary over accepted and rejected precursor rows, protein-group row count, run names, sample names, Precursor.Quantity coverage, PG.Quantity coverage, and target or decoy precursor burden
  • the DIA-NN normalization adapter identity plus accepted and rejected row counts for the source report
  • an optional parsed DIA-NN parameter report when --config is supplied
  • reviewer-facing precursor rows with precursor identifier, peptide sequence, canonical peptide, charge, q-value, Protein.Group, Protein.Ids, run, sample, Precursor.Quantity, PG.Quantity, and target-decoy label
  • reviewer-facing protein-group rows with protein-group identity, protein references, run, sample, q-value, PG.Quantity, source precursor count, and target-decoy label
  • the derived DIA-native precursor and protein-group quantity import report
  • any requested TSV export paths

spectronaut-import emits one governed review packet over a Spectronaut precursor export table.

  • the positional argument is the Spectronaut report TSV path
  • --config optionally loads a Spectronaut settings file
  • --summary-tsv-out writes the one-row import summary
  • --precursor-tsv-out writes reviewer-facing precursor rows
  • --protein-group-tsv-out writes reviewer-facing protein-group rows

The spectronaut-import JSON payload includes:

  • one compact summary over accepted and rejected precursor rows, protein-group row count, modified precursor count, sample names, run names, FG.Quantity coverage, PG.Quantity coverage, and target or decoy precursor burden
  • the Spectronaut normalization adapter identity plus accepted and rejected row counts for the source report
  • an optional parsed Spectronaut parameter report when --config is supplied
  • reviewer-facing precursor rows with precursor identifier, stripped peptide, modified peptide, canonical modified peptide, charge, confidence score, q-value, protein group, protein accessions, run, sample, FG.Quantity, PG.Quantity, and target-decoy label
  • reviewer-facing protein-group rows with protein-group identity, protein references, run, sample, q-value, PG.Quantity, source precursor count, and target-decoy label
  • any requested TSV export paths

psm-map emits one governed generic-mapping report for a lab-local PSM table.

  • the positional argument is the source PSM TSV path
  • --mapping is required and accepts one YAML or JSON column-map file
  • --normalized-tsv-out writes the normalized mapped PSM table
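
Conceptually, a column map renames lab-local columns to canonical ones and leaves the rest visible as unmapped. The sketch below shows that idea with a plain dictionary. The canonical field names and the map layout are hypothetical; the actual mapping file schema is defined by the package and is not reproduced here.

```python
import csv
import io


def map_psm_rows(tsv_text, column_map):
    """Apply a {canonical_name: source_column} map to a TSV table.

    Returns the mapped rows and the source columns the map did not use.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    source_columns = list(reader.fieldnames or [])
    missing = [src for src in column_map.values() if src not in source_columns]
    if missing:
        raise ValueError(f"mapped source columns not found: {missing}")
    used = set(column_map.values())
    unmapped = [column for column in source_columns if column not in used]
    rows = [{canon: row[src] for canon, src in column_map.items()} for row in reader]
    return rows, unmapped
```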

psm-inspect also accepts explicit canonical-schema column controls when a lab-local TSV needs direct inspection without a separate mapping document.

  • --run-id-column maps one run-identity column when the source export carries repeated scan identifiers across runs
  • --modified-peptide-column maps one source column that carries modified peptide notation separate from the stripped peptide column
  • --contaminant-label-column maps one explicit contaminant-state column when contaminant status is supplied directly instead of inferred only from protein references
  • --protease defines the missed-cleavage policy used in the inspection report
  • --summary-tsv-out, --score-distribution-tsv-out, --q-value-distribution-tsv-out, --charge-distribution-tsv-out, --peptide-length-distribution-tsv-out, and --missed-cleavage-distribution-tsv-out write reviewer-facing inspection ledgers

The psm-map JSON payload includes:

  • the validated column map used for normalization
  • the observed source columns from the input table
  • one compact summary over total, accepted, and rejected rows, mapped run coverage, q-value coverage, protein-reference coverage, and unmapped source columns
  • rejected rows with stable issue details
  • mapped rows with run identity, spectrum identity, residue-only peptide sequence, peptide text, canonical modified peptide when present, charge, score, q-value, protein references, target-decoy label, and contaminant flag
  • any requested normalized TSV output path

psm-inspect emits one direct quality-inspection packet over the parsed PSM table.

  • accepted and rejected row counts remain explicit at the top level
  • inspection adds total, accepted, and rejected row counts together with the named protease used for missed-cleavage review
  • inspection.score_distribution reports accepted PSM counts across stable score bins
  • inspection.q_value_distribution reports accepted PSM counts across stable q-value buckets plus missing-q-value rows when present
  • inspection.charge_distribution reports accepted PSM counts by charge state
  • inspection.peptide_length_distribution reports accepted PSM counts across stable peptide-length buckets
  • inspection.missed_cleavage_distribution reports accepted PSM counts by missed-cleavage burden under the selected protease
  • any requested summary or distribution TSV outputs are reported under outputs

peptide-evidence emits one direct peptide-evidence review packet over the parsed PSM table after peptide-level rollup and peptide-level FDR review.

  • the positional argument is the source PSM TSV path
  • --threshold defaults to 0.05 and defines the peptide-level FDR threshold
  • --strong-q-value defaults to 0.01 and defines the stricter threshold for the strong primary class
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
  • --summary-tsv-out writes one compact peptide-evidence summary ledger
  • --entries-tsv-out writes one per-peptide evidence review ledger
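
The two thresholds interact as in the sketch below, which uses the documented defaults of 0.05 and 0.01. The class labels and the inclusive boundaries are assumptions made for illustration.

```python
def primary_class(q_value: float, threshold: float = 0.05, strong_q_value: float = 0.01) -> str:
    """Classify one peptide by its peptide-level q-value."""
    if strong_q_value > threshold:
        raise ValueError("strong q-value cutoff must not exceed the FDR threshold")
    if q_value > threshold:
        return "rejected"
    return "strong" if q_value <= strong_q_value else "weak"
```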

The peptide-evidence JSON payload includes:

  • the selected threshold, score orientation, and strong-evidence q-value
  • accepted and rejected source-row counts from the parsed PSM table
  • one summary block with total, accepted, rejected, strong, weak, unique, shared, modified, contaminant, and decoy peptide counts
  • one peptide review row per canonical peptide with primary class, orthogonal tags, peptide-level q-value, acceptance state, counts, protein references, target-decoy label, contaminant flag, and explanation
  • any requested summary or entry TSV output paths

fdr-reference-check validates curated target-decoy reference cases against the owned FDR implementation.

  • the positional argument is one JSON file containing a list of curated reference cases
  • --summary-tsv-out writes one case-level validation summary table
  • --entries-tsv-out writes one ranked entry-level validation table

The fdr-reference-check JSON payload includes:

  • overall validity across all curated cases
  • case and entry counts plus total failed-entry count
  • one case report per curated reference with score orientation, tie handling, optional threshold, reproducibility hash, and q-value monotonicity status
  • one ranked validation row per expected entry with expected-versus-observed cumulative counts, FDR, q-value, acceptance state, and explicit mismatch fields
  • any requested summary or entry TSV output paths
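
The cumulative counts, FDR, and q-value columns follow the usual target-decoy recipe, sketched below for higher-is-better scores. This is a generic illustration, not the owned implementation: it uses decoys / targets as the FDR estimate and ignores tie handling, which the real reference cases track explicitly.

```python
def q_values(entries):
    """entries: list of (score, is_decoy). Returns (score, is_decoy, q_value) rows."""
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the minimum FDR at this rank or any worse rank,
    # which makes q-values non-decreasing down the ranked list.
    q_list, running = [], float("inf")
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        q_list.append(running)
    q_list.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_list)]
```

The backward running minimum is what a monotonicity check would verify: no entry may have a larger q-value than an entry ranked below it.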

fdr-levels compares accepted PSM, peptide, and protein evidence across explicit FDR thresholds.

  • the positional argument is the source PSM TSV path
  • --threshold can be repeated and defaults to 0.01, 0.05, and 0.1
  • --summary-tsv-out writes one threshold-by-level summary table
  • --entries-tsv-out writes one accepted-entity ledger across thresholds

The fdr-levels JSON payload includes:

  • the selected score orientation and ordered thresholds
  • accepted and rejected source-row counts from the parsed PSM table
  • one threshold summary per evidence level with total and accepted counts for target, decoy, mixed, unknown, and contaminant burden
  • one accepted-entry row per threshold and evidence level with entity identity, q-value, rank, member count, target-decoy label, contaminant flag, and protein references
  • any requested summary or entry TSV output paths

picked-protein-fdr compares target-versus-decoy protein competition across explicit picked-protein thresholds.

  • the positional argument is the source PSM TSV path
  • --threshold can be repeated and defaults to 0.01, 0.05, and 0.1
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
  • --summary-tsv-out writes one threshold-level competition summary table
  • --entries-tsv-out writes one picked-protein ledger across thresholds
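
The target-versus-decoy competition can be pictured with the sketch below, in which each target and its decoy partner compete and only the higher-scoring one survives. The function name, the `DECOY_` prefix, and the tie rule are assumptions; the owned command may pair and rank proteins differently.

```python
def pick_proteins(scores, prefix="DECOY_"):
    """scores: {accession: best protein score}, higher is better.

    Returns the surviving (accession, score) pairs, best first.
    """
    picked = {}
    for accession, score in scores.items():
        # A target and its decoy partner share the same base accession.
        base = accession[len(prefix):] if accession.startswith(prefix) else accession
        current = picked.get(base)
        if current is None or score > current[1]:
            picked[base] = (accession, score)
    return sorted(picked.values(), key=lambda item: item[1], reverse=True)
```

The surviving list can then go through ordinary target-decoy FDR estimation, with each decoy standing in for at most one target.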

The picked-protein-fdr JSON payload includes:

  • the selected score orientation and ordered thresholds
  • accepted and rejected source-row counts from the parsed PSM table
  • one threshold summary with total and accepted target, decoy, contaminant, and grouped-protein burden
  • one picked-protein review row per threshold with protein identity, target-versus-decoy partner identity, protein-group identifiers, score, q-value, FDR, rank, acceptance state, contaminant flag, and supporting peptides
  • any requested summary or entry TSV output paths

protein-groups emits one direct protein-grouping review packet over FDR-filtered PSM evidence.

  • the positional argument is the source PSM TSV path
  • --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds grouping
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be grouped without a separate conversion pass
  • --summary-tsv-out writes one compact protein-grouping summary ledger
  • --group-tsv-out writes the reviewer-facing protein group table

The protein-groups JSON payload includes:

  • the selected threshold and score orientation
  • accepted and rejected source-row counts from the parsed PSM table
  • grouped-row count after FDR filtering
  • one summary block with total groups, singleton groups, ambiguous groups, grouped proteins, target/decoy/mixed/unknown burden, and contaminant burden
  • one protein-group row per group with representative protein, leading protein, leading rationale, protein members, all peptides, unique peptides, shared peptides, score, q-value, target-decoy label, and contaminant flag
  • any requested summary or group-table TSV output paths

protein-ambiguity emits one direct protein-ambiguity review packet over FDR-filtered PSM evidence.

  • the positional argument is the source PSM TSV path
  • --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds ambiguity review
  • --high-q-value and --medium-q-value define the confidence bands applied to each ambiguous group
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
  • --summary-tsv-out writes one compact protein-ambiguity summary ledger
  • --ambiguity-tsv-out writes one reviewer-facing ambiguity table

The protein-ambiguity JSON payload includes:

  • the selected threshold, score orientation, and confidence-band cutoffs
  • accepted and rejected source-row counts from the parsed PSM table
  • grouped-row count after FDR filtering and ambiguity-row count after the review surface isolates unresolved groups
  • one summary block with ambiguous-group count, ambiguous-protein count, indistinguishable versus external-shared versus mixed-group burden, and confidence-label counts
  • one ambiguity row per unresolved protein group with representative protein, all protein members, indistinguishable-member ledger, shared peptides, unique peptides, outside-group proteins, ambiguity reason, ambiguity explanation, score, q-value, confidence label, target-decoy label, and contaminant flag
  • any requested summary or ambiguity-table TSV output paths

protein-inference-benchmarks emits one owned benchmark review packet over the named protein-inference pressure catalog.

  • there is no positional input; the command runs the repository-owned benchmark scenarios directly
  • --picked-threshold defaults to 0.05 and defines the protein threshold used by the picked strategy inside the benchmark suite
  • --summary-tsv-out writes one compact suite-summary ledger
  • --scenarios-tsv-out writes one benchmark-scenario ledger
  • --assessments-tsv-out writes one strategy-assessment ledger across every scenario

The protein-inference-benchmarks JSON payload includes:

  • the selected picked threshold
  • scenario count together with explicit shared-peptide, isoform, homolog-family, contaminant, and decoy case counts
  • worst strategy precision and recall lower bounds across the suite
  • covered inference-strategy kinds
  • one benchmark report per named scenario with expected-present and expected-absent proteins, pressure flags, disagreement count, and one strategy assessment per inference method
  • any requested summary, scenario, or assessment TSV output paths

peptide-matrix emits one owned peptide-by-sample intensity matrix from either precursor/feature evidence or intensity-bearing PSM evidence.

  • --input-kind accepts feature or psm
  • --grouping-mode accepts peptide_sequence or modified_peptide
  • --separate-charge-states keeps precursor charge states split into separate peptide rows
  • --aggregation accepts the owned sum, median, or top-n policies
  • --summary-tsv-out writes one compact matrix-summary ledger
  • --matrix-tsv-out writes one wide peptide-by-sample abundance matrix
  • --missingness-tsv-out writes one per-sample missingness ledger
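
The aggregation policies can be compared with a small sketch. Sum and median are unambiguous. Treating top-n as the mean of the n most intense values is an assumption, as are the function names and the default `top_n=3`.

```python
from statistics import median


def aggregate(values, method="sum", top_n=3):
    """Collapse repeated intensities for one peptide in one sample."""
    if not values:
        return None  # missing cell
    if method == "sum":
        return sum(values)
    if method == "median":
        return median(values)
    if method == "top-n":
        top = sorted(values, reverse=True)[:top_n]
        return sum(top) / len(top)
    raise ValueError(f"unknown aggregation method: {method}")


def peptide_matrix(rows, method="sum"):
    """rows: (peptide, sample, intensity). Returns (samples, {peptide: [values]})."""
    cells, samples = {}, set()
    for peptide, sample, intensity in rows:
        cells.setdefault((peptide, sample), []).append(intensity)
        samples.add(sample)
    ordered = sorted(samples)
    peptides = sorted({peptide for peptide, _ in cells})
    matrix = {
        peptide: [aggregate(cells.get((peptide, s), []), method) for s in ordered]
        for peptide in peptides
    }
    return ordered, matrix
```

Cells left as `None` are what a per-sample missingness ledger would count.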

The peptide-matrix JSON payload includes:

  • input kind
  • accepted and rejected source-record counts from the selected parser
  • one peptide-matrix report with grouping mode, charge policy, aggregation method, sample identifiers, peptide rows, and missingness summary
  • report-level counts for accepted and skipped source rows after matrix construction
  • any requested summary, matrix, or missingness TSV output paths

protein-matrix emits one owned protein-by-sample intensity matrix from either precursor/feature evidence or intensity-bearing PSM evidence.

  • --input-kind accepts feature or psm
  • --grouping-mode accepts peptide_sequence or modified_peptide
  • --target-kind accepts protein or protein_group
  • --aggregation accepts the owned sum, median, or top-n policies
  • --unique-peptide-only excludes shared-peptide rows before protein rollup
  • --summary-tsv-out writes one compact protein-matrix summary ledger
  • --matrix-tsv-out writes one wide protein-by-sample abundance matrix
  • --missingness-tsv-out writes one per-sample missingness ledger

The protein-matrix JSON payload includes:

  • input kind
  • accepted and rejected source-record counts from the selected parser
  • one protein-matrix report with target kind, rollup policy, peptide counts, unique/shared peptide burden, sample identifiers, and missingness summary
  • any requested summary, matrix, or missingness TSV output paths

protein-lfq emits one owned MaxLFQ-like protein abundance matrix from either precursor/feature evidence or intensity-bearing PSM evidence.

  • --input-kind accepts feature or psm
  • --grouping-mode accepts peptide_sequence or modified_peptide
  • --target-kind accepts protein or protein_group
  • --aggregation accepts the owned sum, median, or top-n peptide-collapse policies used before pairwise ratio construction
  • --unique-peptide-only excludes shared-peptide rows before LFQ solving
  • --minimum-shared-peptides requires a minimum number of shared peptides before one sample-pair ratio is retained
  • --summary-tsv-out writes one compact protein-LFQ summary ledger
  • --matrix-tsv-out writes one wide protein-by-sample LFQ abundance matrix
  • --pairwise-tsv-out writes one reviewer-facing pairwise-ratio ledger
  • --missingness-tsv-out writes one per-sample missingness ledger
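
In MaxLFQ-style quantification, each sample pair gets a ratio derived from the peptides both samples share. The sketch below uses the median log2 ratio and drops pairs with too few shared peptides. It covers the pairwise step only, not the downstream solve, and the function name and the default minimum of 2 are assumptions.

```python
from math import log2
from statistics import median


def pairwise_log_ratio(sample_a, sample_b, minimum_shared_peptides=2):
    """Median log2(b / a) over peptides quantified in both samples.

    sample_a and sample_b map peptide -> intensity for one protein.
    Returns None when too few shared peptides support the ratio.
    """
    shared = [
        peptide
        for peptide in sample_a
        if peptide in sample_b and sample_a[peptide] > 0 and sample_b[peptide] > 0
    ]
    if len(shared) < minimum_shared_peptides:
        return None
    return median(log2(sample_b[p] / sample_a[p]) for p in shared)
```

Pairs that return `None` leave gaps in the ratio graph, which is why connected-component counts and fully-connected status are worth reporting per protein.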

The protein-lfq JSON payload includes:

  • input kind
  • accepted and rejected source-record counts from the selected parser
  • one protein-LFQ report with target kind, grouping mode, charge policy, aggregation method, sample identifiers, protein rows, and missingness summary
  • one per-protein row with pairwise-ratio counts, connected-component counts, fully-connected status, contributing peptides, and sample-specific LFQ values
  • any requested summary, matrix, pairwise-ratio, or missingness TSV output paths

protein-coverage emits one direct protein-coverage review packet over FDR-filtered PSM evidence plus a supplied FASTA sequence set.

  • the positional argument is the source PSM TSV path
  • --fasta is required and supplies the protein sequences used for coverage mapping
  • --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds sequence coverage
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
  • --summary-tsv-out writes one compact protein-coverage summary ledger
  • --coverage-tsv-out writes one reviewer-facing per-protein coverage table
  • --regions-tsv-out writes one covered-region ledger with explicit residue intervals
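
Covered regions and the coverage fraction reduce to interval merging, as in the sketch below. It matches peptides by exact substring search and uses 1-based inclusive residue positions; both choices, and the function names, are assumptions.

```python
def covered_regions(protein: str, peptides):
    """Return merged 1-based inclusive (start, end) intervals covered by peptides."""
    intervals = []
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # record every occurrence, not just the first
            intervals.append((start + 1, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:  # overlapping or adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(protein: str, peptides) -> float:
    """Fraction of residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = sum(end - start + 1 for start, end in covered_regions(protein, peptides))
    return covered / len(protein)
```

For the 16-residue sequence `MKTAYIAKQRQISFVK` and peptides `TAYIAK`, `AKQR`, and `FVK`, the merged regions are (3, 10) and (14, 16), giving a coverage fraction of 11/16.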

The protein-coverage JSON payload includes:

  • the selected threshold and score orientation
  • accepted and rejected source-row counts from the parsed PSM table
  • one summary block with total observed proteins, proteins with or without sequence, unique/shared-peptide burden, unmatched-peptide burden, total covered regions, total residues, and total covered residues
  • one protein row per sequence-backed protein with coverage fraction, covered residue count, covered regions, matched and unmatched peptides, unique/shared-peptide ledgers, score, q-value, target-decoy label, and contaminant flag
  • one flattened region row per contiguous covered interval
  • any requested summary, coverage-table, or region-ledger TSV output paths
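
The relationship between matched peptide positions, covered regions, and coverage fraction can be sketched as an interval merge. The function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and treating directly adjacent peptides as one contiguous region is an assumption of this sketch.

```python
def merge_covered_regions(intervals):
    """Merge 1-based inclusive (start, end) intervals into contiguous covered regions."""
    merged = []
    for start, end in sorted(intervals):
        # Overlapping or directly adjacent intervals extend the current region.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(intervals, protein_length):
    """Return (fraction of residues covered, merged regions) for one protein."""
    regions = merge_covered_regions(intervals)
    covered_residues = sum(end - start + 1 for start, end in regions)
    return covered_residues / protein_length, regions


fraction, regions = coverage_fraction([(1, 5), (4, 10), (11, 12), (20, 25)], 50)
```

Here four peptide placements collapse into two regions, (1, 12) and (20, 25), which together cover 18 of 50 residues.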

protein-coverage-plot emits one plot-ready peptide-to-protein coverage packet plus optional static coverage renderings.

  • the positional argument is the source PSM TSV path
  • --fasta is required and supplies the protein sequences used for positional mapping
  • --threshold defaults to 0.05 and defines the accepted peptide evidence that feeds the plot surface
  • --high-q-value and --medium-q-value define the confidence bands used for plotted peptide labels
  • canonical-schema, optional modified-peptide, optional intensity, and decoy-policy column controls stay available so lab-local PSM tables can be plotted without a separate conversion pass
  • when every parsed PSM already carries a q-value, the plot surface preserves those imported q-values for confidence labeling and threshold filtering; otherwise it falls back to owned target-decoy filtering
  • --positions-tsv-out writes one reviewer-facing positional ledger
  • --svg-out writes one static SVG coverage plot
  • --html-out writes one static HTML wrapper around the same owned SVG view

The protein-coverage-plot JSON payload includes:

  • the selected threshold, score orientation, and confidence-band cutoffs
  • accepted and rejected source-row counts from the parsed PSM table
  • one summary block with plotted-protein count, total positional rows, modified/shared/intensity positional burden, and unmatched peptide count
  • one track per protein with coverage fraction, protein length, target-decoy label, contaminant flag, and ordered peptide-position rows
  • one peptide-position row per matched sequence occurrence with start/end residues, canonical peptide, modified peptide when present, peptide confidence, peptide q-value, best score, optional intensity, charge states, spectrum ids, and protein-group ids
  • explicit unmatched-peptide rows when one accepted peptide is assigned to one protein but cannot be located in the supplied sequence
  • any requested positional-ledger, SVG, or HTML output paths
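
The confidence bands set by --high-q-value and --medium-q-value can be read as two ordered cutoffs. The sketch below assumes inclusive cutoffs, default values of 0.01 and 0.05, and the labels high, medium, and low; none of these are confirmed by the command contract.

```python
def confidence_band(q_value, high_q_value=0.01, medium_q_value=0.05):
    """Map one peptide q-value to a confidence label (labels and defaults are assumed)."""
    if high_q_value > medium_q_value:
        raise ValueError("high_q_value must not exceed medium_q_value")
    if q_value <= high_q_value:
        return "high"
    if q_value <= medium_q_value:
        return "medium"
    return "low"


labels = [confidence_band(q) for q in (0.001, 0.01, 0.03, 0.2)]
```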

protein-parsimony emits one direct review packet over a named parsimony protein set and the ambiguity that remains after selection.

  • the positional argument is the source PSM TSV path
  • --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds protein inference
  • --variant selects the named parsimony policy used for the main selected set
  • --review-variant is repeatable and defines which named policies are compared for unresolved ambiguity review
  • canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be inferred without a separate conversion pass
  • --summary-tsv-out writes one compact parsimony summary ledger
  • --protein-tsv-out writes the selected-protein table
  • --ambiguity-tsv-out writes unresolved shared-peptide and variant-difference rows

The protein-parsimony JSON payload includes:

  • the selected threshold and score orientation
  • accepted and rejected source-row counts from the parsed PSM table
  • grouped-row count after FDR filtering
  • one summary block with observed peptide count, explained peptide count, unexplained peptide count, selected protein count, shared selected-peptide count, variant-difference count, and unresolved ambiguity count
  • one selected-protein row per chosen protein with source group, covered peptides, newly explained peptides, unresolved shared peptides, score, q-value, and target-decoy label
  • explicit unresolved ambiguity rows for shared peptides that still map to more than one selected protein and for parsimony variants that diverge in ranking or membership
  • any requested summary, protein, or ambiguity TSV output paths
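
One common parsimony policy is a greedy set cover over peptide evidence. The sketch below is only an illustration of how selected proteins, newly explained peptides, and unresolved shared peptides relate; it is not one of the package's named variants, and the function names and alphabetical tie-break are assumptions.

```python
def greedy_parsimony(protein_to_peptides):
    """Select proteins greedily until every observed peptide is explained.

    protein_to_peptides maps protein id -> set of peptide sequences.
    Returns a list of (protein id, newly explained peptides) in selection order.
    """
    remaining = set()
    for peptides in protein_to_peptides.values():
        remaining |= peptides
    selected = []
    while remaining:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & remaining),
        )
        newly_explained = protein_to_peptides[best] & remaining
        selected.append((best, sorted(newly_explained)))
        remaining -= newly_explained
    return selected


def shared_selected_peptides(protein_to_peptides, selected_names):
    """Return peptides that still map to more than one selected protein."""
    counts = {}
    for name in selected_names:
        for peptide in protein_to_peptides[name]:
            counts[peptide] = counts.get(peptide, 0) + 1
    return sorted(peptide for peptide, n in counts.items() if n > 1)


evidence = {"P1": {"a", "b", "c"}, "P2": {"c", "d"}, "P3": {"d"}}
selection = greedy_parsimony(evidence)
shared = shared_selected_peptides(evidence, [name for name, _ in selection])
```

In this example peptide c remains shared between the two selected proteins, which is the kind of residue an ambiguity ledger would record.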

openms-import emits one governed review packet over native OpenMS idXML identification evidence plus one exported feature table.

  • the positional argument is the OpenMS idXML path
  • --feature-table is required and supplies the exported feature-table path
  • --summary-tsv-out writes the one-row import summary
  • --psm-tsv-out writes reviewer-facing PSM rows
  • --protein-tsv-out writes reviewer-facing protein rows
  • --feature-tsv-out writes reviewer-facing feature rows

The openms-import JSON payload includes:

  • one compact summary over accepted PSM rows, protein rows, accepted and rejected feature rows, q-value coverage, target or decoy burden, and feature sample coverage
  • one explicit feature-parse summary with total, accepted, and rejected feature-table row counts
  • reviewer-facing PSM rows with run identity, spectrum reference, peptide sequence, charge, score, q-value, precursor m/z, retention time, protein references, and target-decoy label
  • reviewer-facing protein rows with run identity, protein reference, score, q-value, and target-decoy label
  • reviewer-facing feature rows with feature identity, sample identity, peptide text, canonical peptide, intensity, protein references, charge, m/z, retention time, and missing reason
  • any requested TSV export paths

modification-resolve checks one modification token against the built-in registry and any optional custom registry supplied at runtime.

  • --residue optionally asks for residue-compatibility review instead of name resolution alone.
  • --registry loads a JSON modification registry so local or institution-specific definitions stay explicit.
  • --out writes the resolution report as JSON.

The modification-resolve JSON payload includes:

  • the original query token and normalized token
  • whether the token resolved successfully
  • builtin, custom-registry, or unknown source classification
  • the resolved modification name and controlled identifier when known
  • static or variable application class
  • allowed site position and residue scope
  • monoisotopic and average mass deltas
  • optional residue query plus residue-allowed status
  • reviewer-facing issues for unknown tokens or residue mismatches

The digestion-oriented CLI surfaces share one protease contract:

  • built-in proteases include trypsin, lysc, gluc, argc, chymotrypsin, and aspn
  • --custom-protease accepts explicit rule fragments such as after=KR;block_next=P or before=D;block_previous=P
  • --custom-protease-name supplies the durable rule name recorded in outputs
  • a custom protease rule cannot be combined with a built-in protease name in the same invocation
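
The rule fragments accepted by --custom-protease can be read as a small key=value grammar. The sketch below shows one plausible interpretation: after cuts C-terminal to the listed residues, before cuts N-terminal to them, and the block_* keys veto a cut based on the neighboring residue. The parser, its function names, and these exact semantics are assumptions of the example.

```python
ALLOWED_KEYS = {"after", "before", "block_next", "block_previous"}


def parse_protease_rule(spec):
    """Parse a fragment string such as 'after=KR;block_next=P' into residue sets."""
    rule = {}
    for fragment in spec.split(";"):
        key, _, value = fragment.partition("=")
        key, value = key.strip(), value.strip().upper()
        if key not in ALLOWED_KEYS or not value:
            raise ValueError(f"unsupported rule fragment: {fragment!r}")
        rule[key] = set(value)
    return rule


def cleavage_sites(sequence, rule):
    """Return 0-based indices i where the chain is cut between residues i-1 and i."""
    sites = []
    for i in range(1, len(sequence)):
        previous, following = sequence[i - 1], sequence[i]
        cut_after = previous in rule.get("after", ()) and following not in rule.get("block_next", ())
        cut_before = following in rule.get("before", ()) and previous not in rule.get("block_previous", ())
        if cut_after or cut_before:
            sites.append(i)
    return sites


trypsin_like = cleavage_sites("AKRPGKDE", parse_protease_rule("after=KR;block_next=P"))
aspn_like = cleavage_sites("AADPDK", parse_protease_rule("before=D;block_previous=P"))
```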

spectrum-parse emits the full MGF parse contract plus a chunk-aware streaming profile for larger file review.

  • --chunk-size controls the chunk accounting used in the streaming profile.
  • --accepted-jsonl-out writes one accepted spectrum object per line.
  • --rejected-json-out writes the rejected-block ledger as JSON.

The spectrum-parse JSON payload includes:

  • the full parse report with accepted spectra and rejected blocks
  • the compact collection summary
  • the streaming profile with chunk size, spectrum count, chunk count, and first or last accepted spectrum identifiers
  • any accepted-spectrum JSONL or rejected-block JSON export paths
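
The chunk accounting in the streaming profile reduces to simple arithmetic over the accepted spectra. A minimal sketch, with hypothetical field names that mirror the list above:

```python
import math


def streaming_profile(spectrum_ids, chunk_size):
    """Summarize chunk accounting for an ordered list of accepted spectrum identifiers."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    ids = list(spectrum_ids)
    return {
        "chunk_size": chunk_size,
        "spectrum_count": len(ids),
        # A final partial chunk still counts as one chunk.
        "chunk_count": math.ceil(len(ids) / chunk_size),
        "first_spectrum_id": ids[0] if ids else None,
        "last_spectrum_id": ids[-1] if ids else None,
    }


profile = streaming_profile(["s1", "s2", "s3", "s4", "s5"], chunk_size=2)
```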

spectrum-stats keeps the lighter review surface for one accepted collection:

  • summary counts over accepted spectra and rejected blocks
  • per-spectrum total-ion-current (TIC) or base-peak metrics
  • a provenance manifest when --provenance-out is requested
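
For orientation, the per-spectrum metrics are straightforward to derive from a peak list. The sketch assumes peaks are (m/z, intensity) pairs and uses hypothetical field names.

```python
def spectrum_metrics(peaks):
    """Compute TIC and base-peak metrics for one spectrum given (mz, intensity) pairs."""
    if not peaks:
        return {"tic": 0.0, "base_peak_mz": None, "base_peak_intensity": 0.0}
    # TIC is the summed intensity; the base peak is the single most intense peak.
    base_mz, base_intensity = max(peaks, key=lambda peak: peak[1])
    return {
        "tic": sum(intensity for _, intensity in peaks),
        "base_peak_mz": base_mz,
        "base_peak_intensity": base_intensity,
    }


metrics = spectrum_metrics([(100.0, 10.0), (200.0, 50.0), (300.0, 40.0)])
```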

spectrum-summary emits reviewer-facing run summary tables over one MGF or mzML input.

  • --kind accepts auto, mgf, or mzml
  • --summary-tsv-out writes the one-row run summary
  • --charge-tsv-out writes the precursor-charge distribution
  • --precursor-tsv-out writes the precursor-m/z distribution
  • --peak-count-tsv-out writes the peak-count distribution

The spectrum-summary JSON payload includes:

  • source kind
  • ms-level policy
  • total, rejected, MS1, MS2, and unknown-ms-level counts
  • retention-time minimum and maximum when available
  • charge, precursor-m/z, and peak-count distributions
  • any requested TSV export paths

spectrum-qc emits raw-spectrum run-QC ledgers over one MGF or mzML input.

  • --kind accepts auto, mgf, or mzml
  • --time-bin-seconds sets the MS/MS-count retention-time bin width
  • --summary-tsv-out writes the one-row run-QC summary
  • --msms-tsv-out writes the MS/MS-count-over-time table
  • --tic-tsv-out writes the TIC trace table
  • --bpc-tsv-out writes the BPC trace table
  • --charge-tsv-out writes the precursor-charge distribution
  • --precursor-intensity-tsv-out writes the precursor-intensity distribution
  • --flagged-tsv-out writes the empty and noisy spectrum table
  • --plot-out writes one plot-ready JSON payload for downstream rendering

The spectrum-qc JSON payload includes:

  • source kind and chromatogram source
  • total, rejected, and MS/MS spectrum counts
  • precursor-intensity observation count
  • empty-spectrum count and noisy-spectrum count
  • MS/MS-count-over-time bins
  • TIC and BPC traces
  • charge and precursor-intensity distributions
  • flagged spectrum rows
  • reviewer-facing diagnostics when retention-time or precursor-intensity evidence is incomplete
  • any requested TSV or plot export paths
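
The MS/MS-count-over-time table controlled by --time-bin-seconds is a fixed-width histogram over retention time. In the sketch below the function name, the 60-second default, the half-open bin convention, and the omission of empty bins are assumptions.

```python
from collections import Counter


def msms_counts_over_time(retention_times_seconds, time_bin_seconds=60.0):
    """Count MS/MS spectra per retention-time bin.

    Returns (bin_start, bin_end, count) rows for non-empty bins; each bin is [start, end).
    """
    if time_bin_seconds <= 0:
        raise ValueError("time_bin_seconds must be positive")
    counts = Counter(int(rt // time_bin_seconds) for rt in retention_times_seconds)
    return [
        (index * time_bin_seconds, (index + 1) * time_bin_seconds, counts[index])
        for index in sorted(counts)
    ]


bins = msms_counts_over_time([5.0, 59.9, 60.0, 125.0], time_bin_seconds=60.0)
```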

spectrum-annotate emits one matched-fragment review object plus a plot-ready payload for one accepted MGF spectrum.

  • --peptide is required and accepts the owned peptide notation surface.
  • --spectrum-id optionally selects one accepted spectrum by identifier.
  • --tolerance-da and --tolerance-ppm select the fragment-match tolerance mode, absolute (Da) or relative (ppm).
  • --tsv-out writes the matched-ion evidence table.
  • --plot-out writes the plot-ready JSON payload.

The spectrum-annotate JSON payload includes:

  • the full annotation object
  • explicit matched-ion rows
  • matched-peak count
  • explained-intensity fraction
  • unmatched-peak count
  • the selected tolerance unit plus tolerance value
  • ambiguity warnings when one fragment matches multiple peaks or one peak matches multiple fragments
  • one plot-ready payload with labeled peaks
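
The difference between the two tolerance modes, and how matched-peak count and explained-intensity fraction follow from the matches, can be sketched as below. The function names, the choice of the most intense peak inside the window, and the row fields are assumptions of this illustration rather than the command's documented behavior.

```python
def tolerance_window(theoretical_mz, tolerance, unit):
    """Return the (low, high) m/z window for a Da or ppm tolerance."""
    if unit == "da":
        delta = tolerance
    elif unit == "ppm":
        # A ppm tolerance scales with the theoretical m/z.
        delta = theoretical_mz * tolerance / 1_000_000
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return theoretical_mz - delta, theoretical_mz + delta


def match_fragments(fragments, peaks, tolerance, unit="da"):
    """Match (label, mz) fragments against (mz, intensity) peaks."""
    rows, matched = [], set()
    for label, fragment_mz in fragments:
        low, high = tolerance_window(fragment_mz, tolerance, unit)
        candidates = [i for i, (mz, _) in enumerate(peaks) if low <= mz <= high]
        if not candidates:
            continue
        best = max(candidates, key=lambda i: peaks[i][1])
        matched.add(best)
        # More than one peak in the window is flagged as ambiguous.
        rows.append({"ion": label, "observed_mz": peaks[best][0], "ambiguous": len(candidates) > 1})
    total = sum(intensity for _, intensity in peaks)
    explained = sum(peaks[i][1] for i in matched)
    return {
        "rows": rows,
        "matched_peak_count": len(matched),
        "unmatched_peak_count": len(peaks) - len(matched),
        "explained_intensity_fraction": explained / total if total else 0.0,
    }


fragments = [("b2", 200.0), ("y1", 500.0)]
peaks = [(200.01, 30.0), (350.0, 50.0), (500.2, 20.0)]
by_da = match_fragments(fragments, peaks, 0.05, unit="da")
by_ppm = match_fragments(fragments, peaks, 20.0, unit="ppm")
```

With a 0.05 Da window the b2 fragment matches the peak at 200.01; at 20 ppm the window around 200.0 is only 0.004 wide on each side, so the same peak is no longer matched.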

spectrum-similarity compares one accepted query spectrum against either one selected reference spectrum or an accepted spectrum library from MGF or mzML.

  • --query-kind and --reference-kind accept auto, mgf, or mzml.
  • --query-spectrum-id optionally selects one accepted query spectrum by identifier.
  • --reference-spectrum-id switches the command into explicit pairwise comparison against one selected reference spectrum.
  • --method accepts cosine or dot_product.
  • --mode accepts raw, normalized, top_n, or transformed.
  • --tolerance-da enables direct fragment matching by mass tolerance.
  • --bin-width-da enables coarse m/z-binned comparison instead of tolerance-based matching.
  • --max-matches limits the ranked library output.
  • --tsv-out writes the ranked candidate table.

The spectrum-similarity JSON payload includes:

  • an optional pairwise comparison report when --reference-spectrum-id is supplied
  • a ranked library report for the selected query spectrum
  • the explicit preprocessing and matching parameters
  • matched-peak count plus explained-intensity fractions
  • reviewer-facing classifications such as duplicate_like, similar, distinct, or insufficient_signal
  • any requested TSV export path
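
The binned comparison path can be illustrated with a plain cosine over m/z bins. This is a sketch of the general technique only: the function names and the 1.0 Da default bin width are assumptions, and it does not reproduce the command's preprocessing modes or classification thresholds.

```python
import math
from collections import defaultdict


def binned_vector(peaks, bin_width_da):
    """Sum (mz, intensity) peaks into integer-indexed m/z bins."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz // bin_width_da)] += intensity
    return vector


def dot_product(a, b):
    """Dot product over bins present in both sparse vectors."""
    return sum(value * b[key] for key, value in a.items() if key in b)


def cosine_similarity(query_peaks, reference_peaks, bin_width_da=1.0):
    """Cosine similarity between two spectra after coarse m/z binning."""
    query = binned_vector(query_peaks, bin_width_da)
    reference = binned_vector(reference_peaks, bin_width_da)
    norm = math.sqrt(dot_product(query, query)) * math.sqrt(dot_product(reference, reference))
    return dot_product(query, reference) / norm if norm else 0.0


query = [(100.2, 3.0), (200.7, 4.0)]
scaled_copy = [(100.2, 6.0), (200.7, 8.0)]
unrelated = [(150.0, 5.0)]
same_shape = cosine_similarity(query, scaled_copy)
no_overlap = cosine_similarity(query, unrelated)
```

Cosine is insensitive to overall intensity scale, which is why the scaled copy scores the same as an exact duplicate would; an unnormalized dot product would not have that property.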

spectral-library-import imports one practical MSP or library-shaped MGF file into an explicit peptide-aware library contract.

  • --kind accepts auto, msp, or mgf.
  • --precursor-mz optionally runs candidate retrieval against the imported precursor index.
  • --tolerance-da sets the precursor candidate window.
  • --peptide optionally narrows candidate retrieval to one peptide query.
  • --summary-tsv-out writes a compact one-row library summary.
  • --candidates-tsv-out writes the precursor-compatible candidate table and requires --precursor-mz.

The spectral-library-import JSON payload includes:

  • the full import report with accepted and rejected entries
  • a compact summary over entry count, unique peptides, modified entries, decoy entries, and charge distribution
  • compact index facts, including peptide lookup content and precursor-index size
  • an optional candidate report over precursor and peptide-filtered entries
  • any requested summary or candidate TSV export paths
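
Candidate retrieval against a precursor index amounts to a window lookup over sorted precursor m/z values, optionally narrowed by peptide. The sketch below assumes library entries are dictionaries with precursor_mz and peptide keys; those names and the index layout are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_precursor_index(entries):
    """Sort entries by precursor m/z and return (sorted mz list, sorted entries)."""
    ordered = sorted(entries, key=lambda entry: entry["precursor_mz"])
    return [entry["precursor_mz"] for entry in ordered], ordered


def precursor_candidates(index, precursor_mz, tolerance_da, peptide=None):
    """Return entries within precursor_mz +/- tolerance_da, optionally for one peptide."""
    mz_values, ordered = index
    low = bisect_left(mz_values, precursor_mz - tolerance_da)
    high = bisect_right(mz_values, precursor_mz + tolerance_da)
    hits = ordered[low:high]
    if peptide is not None:
        hits = [entry for entry in hits if entry["peptide"] == peptide]
    return hits


library = [
    {"precursor_mz": 600.00, "peptide": "PEPTIDEK"},
    {"precursor_mz": 500.25, "peptide": "PEPTIDEK"},
    {"precursor_mz": 500.30, "peptide": "SAMPLER"},
]
index = build_precursor_index(library)
in_window = precursor_candidates(index, 500.27, 0.05)
narrowed = precursor_candidates(index, 500.27, 0.05, peptide="SAMPLER")
```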

spectral-library-search ranks one selected query spectrum against one practical MSP or MGF library.

  • --query-kind accepts auto, mgf, or mzml.
  • --library-kind accepts auto, msp, or mgf.
  • --query-spectrum-id optionally selects one accepted query spectrum by identifier.
  • --precursor-tolerance-da sets the precursor candidate window before any similarity scoring happens.
  • --tolerance-da sets the fragment-matching tolerance for the similarity stage.
  • --bin-width-da optionally switches the similarity stage onto coarse m/z binning.
  • --method accepts cosine or dot_product.
  • --mode accepts raw, normalized, top_n, or transformed.
  • --top-n optionally limits preprocessing to the most intense peaks.
  • --max-matches limits the ranked output table.
  • --tsv-out writes the ranked match table with target-decoy label, score, explained-intensity fractions, and optional q-value.

The spectral-library-search JSON payload includes:

  • the imported library report used for search
  • a compact library summary that keeps decoy-entry count explicit
  • one search report with precursor policy, similarity policy, candidate count, decoy-candidate count, and ranked matches
  • the top-match identifier, peptide, score, and q-value when available
  • a search strategy field that stays explicit as either concatenated or no_decoy_advisory
  • any requested ranked-TSV export path
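
When the library carries decoy entries, a q-value can be attached to each ranked match with a standard target-decoy estimate. The sketch below shows that general technique under stated assumptions (higher score is better, FDR estimated as decoys over targets at each rank); it is not confirmed to be the estimator the command uses.

```python
def target_decoy_q_values(matches):
    """Assign q-values to (score, is_decoy) matches ranked by descending score."""
    ranked = sorted(matches, key=lambda match: match[0], reverse=True)
    fdr_at_rank = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr_at_rank.append(decoys / max(targets, 1))
    # The q-value is the lowest FDR at this rank or any worse-scoring rank, capped at 1.0.
    q_values = []
    running_minimum = 1.0
    for fdr in reversed(fdr_at_rank):
        running_minimum = min(running_minimum, fdr)
        q_values.append(running_minimum)
    q_values.reverse()
    return ranked, q_values


ranked, q_values = target_decoy_q_values(
    [(0.6, False), (0.9, False), (0.7, True), (0.8, False), (0.5, True)]
)
```

Without decoy entries no such estimate is possible, which is consistent with the search strategy field distinguishing concatenated from no_decoy_advisory.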

mzml-inspect reports one practical mzML review object without claiming full vendor-native replacement.

  • --spectra-jsonl-out writes accepted spectra as normalized JSONL.
  • --chromatograms-json-out writes the extracted chromatogram report as JSON.
  • the command reports decoding support and chromatogram presence explicitly instead of assuming every mzML file is equally supported

The mzml-inspect JSON payload includes:

  • run metadata
  • the compact accepted-spectrum summary
  • binary decoding support with accepted and rejected spectrum counts
  • extracted chromatogram traces, including TIC and BPC when present
  • reviewer-facing diagnostics about practical scope and missing chromatograms
  • any accepted-spectrum JSONL or chromatogram JSON export paths

digest and peptide-index both report:

  • the resolved protease name
  • the custom protease specification when one was supplied
  • the digestion mode
  • the missed-cleavage allowance

digest exports one theoretical peptide database under the selected digestion policy.

  • --format accepts tsv, jsonl, parquet, or fasta.
  • --out writes the main peptide export.
  • --manifest-out writes the digestion policy manifest.
  • --peptide-protein-table-out optionally writes a peptide-to-protein TSV sidecar with one row per peptide occurrence.

The main peptide exports preserve:

  • source accession and source identifier
  • peptide sequence
  • peptide length
  • start and end coordinates
  • missed-cleavage count
  • protease and digestion mode
  • cleavage type
  • neutral mass
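
How coordinates and missed-cleavage counts arise for each exported peptide can be sketched with an in-silico digest. The example assumes the conventional trypsin rule (cut after K or R unless followed by P), 1-based inclusive coordinates, and hypothetical row field names; neutral mass is left out to keep the sketch short.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0):
    """Enumerate peptides with coordinates under an assumed trypsin rule."""
    if not sequence:
        return []
    # Internal cut points fall after K or R when the next residue is not P.
    internal = [m.end() for m in re.finditer(r"[KR](?!P)", sequence) if m.end() < len(sequence)]
    cut_points = [0] + internal + [len(sequence)]
    rows = []
    for i in range(len(cut_points) - 1):
        for missed in range(missed_cleavages + 1):
            j = i + 1 + missed
            if j >= len(cut_points):
                break
            start, end = cut_points[i], cut_points[j]
            rows.append(
                {
                    "peptide": sequence[start:end],
                    "start": start + 1,  # 1-based inclusive start
                    "end": end,          # 1-based inclusive end
                    "length": end - start,
                    "missed_cleavages": missed,
                }
            )
    return rows


fully_cleaved = [row["peptide"] for row in tryptic_peptides("MKRPAKGR")]
with_one_missed = tryptic_peptides("MKRPAKGR", missed_cleavages=1)
```

The R at position 3 is followed by P, so no cut occurs there and RPAK stays intact even with zero missed cleavages allowed.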

The peptide FASTA export writes the peptide sequence body and records source coordinates, missed-cleavage count, peptide length, neutral mass, and protease in the header.

The peptide-to-protein sidecar preserves:

  • peptide sequence
  • peptide length
  • neutral mass
  • source accession and source identifier
  • source protein family and isoform
  • start and end coordinates
  • missed-cleavage count
  • protease, digestion mode, and cleavage type

fasta-stats reports FASTA-wide review metrics such as duplicate accession count, duplicate sequence count, target count, decoy count, contaminant count, and sequence-length summary values.

summarize --kind fasta returns the higher-level FASTA summary, the parser-level database composition, and the richer FASTA profile so operators can distinguish structural file quality from biological database makeup and annotation burden.

For PSM evidence, the contaminant-review surface is:

  • psm-contaminants

psm-contaminants emits a separate contaminant-match report with:

  • contaminant PSM count
  • pure-contaminant versus mixed-reference PSM counts
  • contaminant peptide count
  • contaminant protein counts
  • row-level entries listing contaminant and target protein references for each contaminant-carrying match

summarize --kind psm includes the same contaminant report alongside the standard PSM, peptide, and protein summaries.