CLI Surface
CLI documentation should describe the commands the package truly owns, not the commands a reader might wish existed.
Package Surface
- src/bijux_proteomics/interfaces/cli/app.py and interfaces/cli/__main__.py are the command-line surfaces for core contract workflows
- CLI behavior should reveal contract meaning and validation state rather than runtime orchestration detail
- new CLI promises must stay aligned with the stable contract model
First Proof Check
- src/bijux_proteomics/domain/program_spec.py, domain/repositories.py, and domain/targets.py
- src/bijux_proteomics/interfaces/cli/app.py and interfaces/cli/__main__.py
- packages/bijux-proteomics-core/tests
FASTA Commands
The owned FASTA CLI surface is:
- fasta-parse
- fasta-contaminants
- fasta-profile
- fasta-stats
- fasta-dedup
- fasta-filter
- fasta-provenance
- fasta-decoy
- target-decoy-validate
- peptide-index
- fragment-ions
- peptide-properties
- precursor-mass-error
- modified-peptide-parse
- modification-resolve
- psm-map
- peptide-evidence
- fdr-reference-check
- fdr-levels
- picked-protein-fdr
- protein-ambiguity
- protein-inference-benchmarks
- peptide-matrix
- protein-matrix
- protein-lfq
- protein-groups
- protein-coverage
- protein-coverage-plot
- protein-parsimony
- maxquant-import
- diann-import
- spectronaut-import
- openms-import
- comet-import
- fragpipe-import
- sage-import
- summarize --kind fasta
The owned spectrum CLI surface is:
- spectrum-parse
- spectrum-stats
- spectrum-annotate
- spectrum-similarity
- spectral-library-import
- spectral-library-search
- spectrum-summary
- mzml-inspect
- summarize --kind mgf
fasta-parse emits the full parser report, including:
- accepted and rejected records
- duplicate identifiers
- duplicate normalized accessions
- parser-level database composition over accepted records
The database composition surface reports:
- accepted record count
- target count
- decoy count
- contaminant count
- accession-namespace counts
fasta-profile emits a richer database-review object with:
- a summary block covering input records, accepted proteins, rejected records, unique accessions, target count, decoy count, contaminant count, total residues, length extremes, and organism annotation coverage
- a stable length-distribution ledger across the bins 1-99, 100-249, 250-499, 500-999, and 1000+
- an organism-distribution ledger when organism evidence is present in the accepted records
fasta-profile also supports reviewer-facing TSV exports through:
- --summary-tsv-out
- --length-tsv-out
- --organism-tsv-out
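The length-distribution bins listed above can be illustrated with a short sketch. This is not the package's implementation; the names LENGTH_BINS, length_bin, and length_distribution are hypothetical, and only the bin boundaries come from this page.

```python
from collections import Counter

# Bin labels and boundaries as documented for the length-distribution ledger.
# The upper bound None marks the open-ended 1000+ bin.
LENGTH_BINS = [
    ("1-99", 1, 99),
    ("100-249", 100, 249),
    ("250-499", 250, 499),
    ("500-999", 500, 999),
    ("1000+", 1000, None),
]


def length_bin(length):
    """Return the bin label for one protein length (hypothetical helper)."""
    if length < 1:
        raise ValueError("protein length must be at least 1")
    for label, low, high in LENGTH_BINS:
        if length >= low and (high is None or length <= high):
            return label
    raise AssertionError("bins cover every positive length")


def length_distribution(lengths):
    """Count lengths per bin, always reporting every bin in a fixed order."""
    counts = Counter(length_bin(n) for n in lengths)
    return {label: counts.get(label, 0) for label, _, _ in LENGTH_BINS}
```

Reporting empty bins with a zero count keeps the ledger shape stable across databases, which is one plausible reading of "stable" here.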
fasta-contaminants builds a more realistic search database by:
- appending the owned built-in contaminant panel unless --no-include-builtin is selected
- appending one or more user-provided contaminant FASTA files through repeated --contaminant-fasta
- relabeling appended contaminant proteins with the stable CON__ prefix
- writing a build report with separate built-in and external append counts plus skipped duplicate contaminant accessions
fasta-decoy builds a target-decoy database and reports both accession-level
and sequence-level review signals.
- --decoy-mode reverse and --decoy-mode shuffle select the owned decoy construction method.
- --prefix preserves target protein identity inside the decoy accession while enforcing collision-free accession generation.
- Mixed target-plus-decoy inputs are rejected instead of being re-expanded.
- Prefix choices that would collide with existing target accessions fail before output is written.
The fasta-decoy JSON payload includes:
- mode
- prefix
- seed
- output_fasta
- target_count
- decoy_count
- report
- generation_report
- output_sha256
- reproducibility_hash
generation_report adds reviewer-facing target-decoy construction details:
- input target count
- generated decoy count
- unchanged sequence count and accession list
- target-sequence collision count and accession list
- validity flag for the generated decoy surface
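The reverse mode and the generation-report signals above can be sketched as follows. This is a minimal illustration under stated assumptions, not the owned implementation: the function name, the default prefix DECOY_, the report keys, and the rule that any unchanged or colliding sequence makes the surface invalid are all hypothetical.

```python
def build_reverse_decoys(targets, prefix="DECOY_"):
    """Build reversed decoys for a mapping of target accession -> sequence."""
    # Fail before producing output if a decoy accession would collide
    # with an accession that already exists among the targets.
    collisions = sorted(acc for acc in targets if prefix + acc in targets)
    if collisions:
        raise ValueError(f"prefix {prefix!r} collides with targets: {collisions}")

    decoys = {prefix + acc: seq[::-1] for acc, seq in targets.items()}
    target_sequences = set(targets.values())

    # Palindromic sequences are unchanged by reversal.
    unchanged = sorted(acc for acc, seq in targets.items() if seq[::-1] == seq)
    # A decoy sequence identical to any target sequence is a collision.
    seq_collisions = sorted(
        acc for acc, seq in targets.items() if seq[::-1] in target_sequences
    )
    report = {
        "input_target_count": len(targets),
        "generated_decoy_count": len(decoys),
        "unchanged_accessions": unchanged,
        "target_sequence_collision_accessions": seq_collisions,
        "valid": not unchanged and not seq_collisions,  # assumed validity rule
    }
    return decoys, report
```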
target-decoy-validate checks a finished database after generation and reports:
- target and decoy counts
- prefix and mode compatibility
- duplicate accession and duplicate sequence burden
- target-versus-decoy sequence overlap signals
- overall validity of the target-decoy database
peptide-index digests a FASTA database and reports how one or more peptide
queries map back to proteins under the selected digestion assumptions.
- --peptide is repeatable and accepts plain or modified peptide notation.
- --protease, --missed-cleavages, and --digestion-mode define the digest policy used to build the searchable peptide space.
- --il-equivalent optionally collapses isoleucine and leucine during lookup.
- --protein-group-map accepts a TSV with accession and protein_group columns so group-specific peptides stay explicit.
The peptide-index JSON payload includes:
- input record count
- query peptide count
- protease
- digestion mode
- missed cleavages
- I/L-equivalence flag
- protein-group-map presence flag
- one report object with per-peptide lookup entries and summary counts
Each lookup entry reports:
- the original query peptide
- the canonical residue sequence used for lookup
- the final lookup sequence after optional I/L normalization
- whether modification stripping or I/L-equivalent lookup was applied
- matched protein accessions, families, and groups
- protein-group count
- uniqueness and audit class when the peptide is present
- target, decoy, contaminant, mixed, or missing database membership
- missed-cleavage counts observed among the matching peptide instances
- a reviewer-facing explanation string
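To make the digest-then-lookup idea concrete, here is a small sketch of a peptide index with optional I/L collapsing. It assumes a trypsin-like rule (cleave after K or R unless the next residue is P) and fully specific digestion; the function names are hypothetical and the real command supports more proteases, digestion modes, and modified notation.

```python
from collections import defaultdict


def tryptic_peptides(sequence, missed_cleavages=0):
    """Digest one protein, allowing up to `missed_cleavages` skipped sites."""
    if not sequence:
        return []
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for start in range(len(sites) - 1):
        last = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def normalize(peptide, il_equivalent):
    """Collapse isoleucine into leucine when I/L equivalence is requested."""
    return peptide.replace("I", "L") if il_equivalent else peptide


def build_index(proteins, missed_cleavages=0, il_equivalent=False):
    """Map each lookup sequence to the set of accessions that contain it."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, missed_cleavages):
            index[normalize(peptide, il_equivalent)].add(accession)
    return index


def lookup(index, peptide, il_equivalent=False):
    """Return sorted matching accessions; an empty list means missing."""
    return sorted(index.get(normalize(peptide, il_equivalent), ()))
```

The index and the query must use the same I/L setting, which is why the flag is reported in the payload alongside the lookup entries.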
peptide-properties reports one peptide-level screening object for filtering
or review before search and downstream analysis.
- --mod accepts repeatable modification assignments in the same style as peptide-mass.
- --charge chooses the precursor charge state used for m/z calculation.
- --protease, --custom-protease, and --custom-protease-name define the missed-cleavage context.
- --registry optionally loads a modification registry for named modifications.
The peptide-properties JSON payload includes:
- canonical notation
- underlying residue sequence
- protease
- charge
- residue length
- monoisotopic mass
- average mass
- monoisotopic precursor m/z for the selected charge state
- missed-cleavage count
- hydrophobicity proxy
- problem flags and a final problematic or not flag
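The mass, m/z, and missed-cleavage fields above follow standard formulas, sketched below for unmodified peptides. The function names are hypothetical, the residue masses are the usual monoisotopic values rounded to five decimals, and the missed-cleavage rule assumes trypsin; the package's own rounding and modification handling may differ.

```python
# Standard monoisotopic residue masses in Da (rounded to five decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free termini
PROTON = 1.007276   # mass of one proton


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the protonated precursor at the given positive charge."""
    return (monoisotopic_mass(sequence) + charge * PROTON) / charge


def tryptic_missed_cleavages(sequence):
    """Count internal K/R sites not followed by P (assumed trypsin rule)."""
    return sum(
        1
        for i in range(len(sequence) - 1)
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    )
```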
fragment-ions emits a dedicated theoretical fragment-ion review report for
one peptide or modified peptide.
- --mod accepts repeatable modification assignments in the same style as peptide-mass.
- --charge is repeatable and defaults to both 1 and 2.
- --fragment-series accepts the owned series labels and defaults to b plus y.
- --include-neutral-losses adds supported residue and modification losses.
- --tsv-out writes one row per theoretical fragment ion.
The fragment-ions JSON payload includes:
- canonical notation and residue sequence
- selected charge states and fragment series
- whether neutral losses were included
- total fragment-ion count
- counts by series
- counts by charge
- neutral-loss ion count
- the full fragment-ion rows with series, ordinal, charge, neutral loss, and monoisotopic or average mass and m/z values
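A minimal sketch of theoretical b and y ions for an unmodified peptide, using the documented default charges 1 and 2, is shown below. The function name and row keys are hypothetical, neutral losses are omitted, and only monoisotopic m/z is computed.

```python
# Standard monoisotopic residue masses in Da (rounded to five decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(sequence, charges=(1, 2), series=("b", "y")):
    """Return one row per theoretical fragment ion (no neutral losses)."""
    rows = []
    for ordinal in range(1, len(sequence)):
        for label in series:
            if label == "b":
                # b ions carry the N-terminal residues only.
                neutral = sum(MONO[aa] for aa in sequence[:ordinal])
            elif label == "y":
                # y ions carry the C-terminal residues plus one water.
                neutral = sum(MONO[aa] for aa in sequence[-ordinal:]) + WATER
            else:
                raise ValueError(f"unsupported series in this sketch: {label}")
            for z in charges:
                rows.append({
                    "series": label,
                    "ordinal": ordinal,
                    "charge": z,
                    "mz": (neutral + z * PROTON) / z,
                })
    return rows
```

A peptide of length n yields n - 1 ordinals per series, so the total count is (n - 1) times the number of series times the number of charge states.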
precursor-mass-error emits a reviewer-facing precursor calibration report from
one TSV table of peptide, observed-m/z, and charge observations.
- --peptide-column, --observed-mz-column, --charge-column, and --spectrum-id-column map the input table.
- --max-isotope-offset controls how many isotope-offset candidates are ranked.
- --summary-tsv-out writes the one-row report summary.
- --observations-tsv-out writes one row per peptide observation.
- --ppm-distribution-tsv-out writes the absolute-ppm distribution.
- --charge-distribution-tsv-out writes the charge-state distribution.
- --isotope-distribution-tsv-out writes the recommended isotope-offset distribution.
The precursor-mass-error JSON payload includes:
- input row count and accepted observation count
- mean and median ppm error plus mean Da error
- median and maximum absolute ppm error
- charge, absolute-ppm, and recommended isotope-offset distributions
- per-observation peptide, canonical peptide, charge, theoretical m/z, observed m/z, Da error, ppm error, and isotope advisory
- any requested TSV export paths
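The ppm error and isotope advisory can be pictured with the sketch below. It is an assumption-laden illustration: the function names are hypothetical, the isotope spacing uses the 13C minus 12C mass difference, and ranking both negative and positive offsets is a guess about how candidates are enumerated.

```python
C13_DELTA = 1.0033548  # mass difference between 13C and 12C in Da


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def rank_isotope_offsets(observed_mz, theoretical_mz, charge, max_offset=2):
    """Rank isotope offsets by absolute ppm error, best candidate first."""
    candidates = []
    for offset in range(-max_offset, max_offset + 1):
        # Each isotope step shifts the expected m/z by C13_DELTA / charge.
        shifted = theoretical_mz + offset * C13_DELTA / charge
        candidates.append((abs(ppm_error(observed_mz, shifted)), offset))
    candidates.sort()
    return [offset for _, offset in candidates]
```

A recommended offset of 1 suggests the instrument picked the first isotope peak rather than the monoisotopic peak, which is a common reason for an apparent error of roughly 1 Da divided by the charge.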
modified-peptide-parse normalizes one engine-specific modified peptide string
into the owned canonical modified peptide contract.
- --dialect is required and accepts maxquant, msfragger, fragpipe, sage, or comet.
- --registry optionally loads a modification registry for named modifications.
- --out writes the normalization report as JSON.
The modified-peptide-parse JSON payload includes:
- the named engine dialect
- the original notation
- the stripped residue sequence
- the canonical modified peptide notation
- explicit protein-terminal context flags
- the full normalized modification rows with preserved site positions
fragpipe-import emits one governed review packet over a FragPipe psm.tsv,
peptide table, and protein table bundle.
- the positional argument is the FragPipe psm.tsv path
- --peptide-tsv is required and supplies the peptide-level table
- --protein-tsv is required and supplies the protein-level table
- --summary-tsv-out writes the one-row bundle summary
- --psm-tsv-out writes reviewer-facing PSM rows with modification and mass-difference evidence
- --peptide-review-tsv-out writes the peptide-level review table
- --protein-review-tsv-out writes the protein-level review table
The fragpipe-import JSON payload includes:
- one compact bundle summary over accepted PSMs, peptide rows, protein rows, q-value coverage, modified rows, open-search-like rows, and mapped proteins
- the PSM normalization adapter identity plus accepted and rejected PSM counts
- reviewer-facing PSM rows with hyperscore, q-value, target-decoy state, protein references, modification evidence, and mass-difference state
- reviewer-facing peptide rows with mapped proteins, probability, q-value, spectral count, modification evidence, and mass-difference state
- reviewer-facing protein rows with identity, annotation, coverage, peptide burden, spectral count, probability, and target-decoy state
- any requested TSV export paths
sage-import emits one governed review packet over a realistic Sage PSM
export.
- the positional argument is the Sage result TSV path
- --config optionally loads a Sage search configuration JSON file
- --summary-tsv-out writes the one-row import summary
- --psm-tsv-out writes reviewer-facing Sage PSM rows
The sage-import JSON payload includes:
- the detected Sage dialect identifier
- one compact summary over accepted or rejected rows, modified PSMs, hyperscore coverage, q-value coverage, multi-protein rows, and target or decoy burden
- the normalization adapter identity plus accepted and rejected row counts
- an optional parsed Sage parameter report when --config is supplied
- reviewer-facing PSM rows with discriminant score, hyperscore, q-values, posterior error, protein mappings, modification burden, matched-peak shape, and mass-accuracy fields
- any requested TSV export paths
comet-import emits one governed review packet over practical Comet tabular
or pepXML result evidence.
- the positional argument is the Comet result file path
- --config optionally loads a Comet parameter file
- --summary-tsv-out writes the one-row import summary
- --psm-tsv-out writes reviewer-facing Comet PSM rows
The comet-import JSON payload includes:
- the detected import kind as tabular or pepxml
- one compact summary over accepted or rejected rows, modified PSMs, XCorr coverage, DeltaCn coverage, expectation-value coverage, multi-protein rows, and target or decoy burden
- the normalization adapter identity plus accepted and rejected row counts for tabular imports
- an optional parsed Comet parameter report when --config is supplied
- reviewer-facing PSM rows with modified peptide notation, residue sequence, canonical peptide, charge, expectation value, XCorr, DeltaCn, Sp score, protein mappings, and target-decoy label
- any requested TSV export paths
maxquant-import emits one governed review packet over a MaxQuant
evidence.txt, peptides.txt, and proteinGroups.txt bundle.
- the positional argument is the MaxQuant evidence.txt path
- --peptides-txt is required and supplies the peptides.txt table
- --protein-groups-txt is required and supplies the proteinGroups.txt table
- --config optionally loads a MaxQuant settings file
- --summary-tsv-out writes the one-row import summary
- --evidence-tsv-out writes reviewer-facing evidence rows
- --peptide-tsv-out writes reviewer-facing peptide rows
- --protein-group-tsv-out writes reviewer-facing protein-group rows
The maxquant-import JSON payload includes:
- one compact summary over accepted and rejected evidence rows, peptide and protein-group row counts, modified evidence burden, experiment names, LFQ experiment names, and contaminant or reverse counts across the bundle
- the evidence normalization adapter identity plus accepted and rejected row counts for the native MaxQuant evidence surface
- an optional parsed MaxQuant parameter report when --config is supplied
- reviewer-facing evidence rows with experiment name, modified peptide notation, residue sequence, canonical peptide, charge, score, posterior error probability, protein mappings, and contaminant or reverse flags
- reviewer-facing peptide rows with modified sequence, leading razor protein, protein mappings, score, posterior error probability, intensity, MS/MS count, and contaminant or reverse flags
- reviewer-facing protein-group rows with protein identities, peptide burden, sequence coverage, only-identified-by-site state, per-experiment LFQ intensities, and contaminant or reverse flags
- any requested TSV export paths
diann-import emits one governed review packet over a DIA-NN precursor report.
- the positional argument is the DIA-NN report TSV path
- --config optionally loads a DIA-NN configuration JSON file
- --summary-tsv-out writes the one-row import summary
- --precursor-tsv-out writes reviewer-facing precursor rows
- --protein-group-tsv-out writes reviewer-facing protein-group rows
The diann-import JSON payload includes:
- one compact summary over accepted and rejected precursor rows, protein-group row count, run names, sample names, Precursor.Quantity coverage, PG.Quantity coverage, and target or decoy precursor burden
- the DIA-NN normalization adapter identity plus accepted and rejected row counts for the source report
- an optional parsed DIA-NN parameter report when --config is supplied
- reviewer-facing precursor rows with precursor identifier, peptide sequence, canonical peptide, charge, q-value, Protein.Group, Protein.Ids, run, sample, Precursor.Quantity, PG.Quantity, and target-decoy label
- reviewer-facing protein-group rows with protein-group identity, protein references, run, sample, q-value, PG.Quantity, source precursor count, and target-decoy label
- the derived DIA-native precursor and protein-group quantity import report
- any requested TSV export paths
spectronaut-import emits one governed review packet over a Spectronaut
precursor export table.
- the positional argument is the Spectronaut report TSV path
- --config optionally loads a Spectronaut settings file
- --summary-tsv-out writes the one-row import summary
- --precursor-tsv-out writes reviewer-facing precursor rows
- --protein-group-tsv-out writes reviewer-facing protein-group rows
The spectronaut-import JSON payload includes:
- one compact summary over accepted and rejected precursor rows, protein-group row count, modified precursor count, sample names, run names, FG.Quantity coverage, PG.Quantity coverage, and target or decoy precursor burden
- the Spectronaut normalization adapter identity plus accepted and rejected row counts for the source report
- an optional parsed Spectronaut parameter report when --config is supplied
- reviewer-facing precursor rows with precursor identifier, stripped peptide, modified peptide, canonical modified peptide, charge, confidence score, q-value, protein group, protein accessions, run, sample, FG.Quantity, PG.Quantity, and target-decoy label
- reviewer-facing protein-group rows with protein-group identity, protein references, run, sample, q-value, PG.Quantity, source precursor count, and target-decoy label
- any requested TSV export paths
psm-map emits one governed generic-mapping report for a lab-local PSM table.
- the positional argument is the source PSM TSV path
- --mapping is required and accepts one YAML or JSON column-map file
- --normalized-tsv-out writes the normalized mapped PSM table
psm-inspect also accepts explicit canonical-schema column controls when a
lab-local TSV needs direct inspection without a separate mapping document.
- --run-id-column maps one run-identity column when the source export carries repeated scan identifiers across runs
- --modified-peptide-column maps one source column that carries modified peptide notation separate from the stripped peptide column
- --contaminant-label-column maps one explicit contaminant-state column when contaminant status is supplied directly instead of inferred only from protein references
- --protease defines the missed-cleavage policy used in the inspection report
- --summary-tsv-out, --score-distribution-tsv-out, --q-value-distribution-tsv-out, --charge-distribution-tsv-out, --peptide-length-distribution-tsv-out, and --missed-cleavage-distribution-tsv-out write reviewer-facing inspection ledgers
The psm-map JSON payload includes:
- the validated column map used for normalization
- the observed source columns from the input table
- one compact summary over total, accepted, and rejected rows, mapped run coverage, q-value coverage, protein-reference coverage, and unmapped source columns
- rejected rows with stable issue details
- mapped rows with run identity, spectrum identity, residue-only peptide sequence, peptide text, canonical modified peptide when present, charge, score, q-value, protein references, target-decoy label, and contaminant flag
- any requested normalized TSV output path
psm-inspect emits one direct quality-inspection packet over the parsed PSM
table.
- accepted and rejected row counts remain explicit at the top level
- inspection adds total, accepted, and rejected row counts together with the named protease used for missed-cleavage review
- inspection.score_distribution reports accepted PSM counts across stable score bins
- inspection.q_value_distribution reports accepted PSM counts across stable q-value buckets plus missing-q-value rows when present
- inspection.charge_distribution reports accepted PSM counts by charge state
- inspection.peptide_length_distribution reports accepted PSM counts across stable peptide-length buckets
- inspection.missed_cleavage_distribution reports accepted PSM counts by missed-cleavage burden under the selected protease
- any requested summary or distribution TSV outputs are reported under outputs
peptide-evidence emits one direct peptide-evidence review packet over the
parsed PSM table after peptide-level rollup and peptide-level FDR review.
- the positional argument is the source PSM TSV path
- --threshold defaults to 0.05 and defines the peptide-level FDR threshold
- --strong-q-value defaults to 0.01 and defines the stricter threshold for the strong primary class
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
- --summary-tsv-out writes one compact peptide-evidence summary ledger
- --entries-tsv-out writes one per-peptide evidence review ledger
The peptide-evidence JSON payload includes:
- the selected threshold, score orientation, and strong-evidence q-value
- accepted and rejected source-row counts from the parsed PSM table
- one summary block with total, accepted, rejected, strong, weak, unique, shared, modified, contaminant, and decoy peptide counts
- one peptide review row per canonical peptide with primary class, orthogonal tags, peptide-level q-value, acceptance state, counts, protein references, target-decoy label, contaminant flag, and explanation
- any requested summary or entry TSV output paths
fdr-reference-check validates curated target-decoy reference cases against
the owned FDR implementation.
- the positional argument is one JSON file containing a list of curated reference cases
- --summary-tsv-out writes one case-level validation summary table
- --entries-tsv-out writes one ranked entry-level validation table
The fdr-reference-check JSON payload includes:
- overall validity across all curated cases
- case and entry counts plus total failed-entry count
- one case report per curated reference with score orientation, tie handling, optional threshold, reproducibility hash, and q-value monotonicity status
- one ranked validation row per expected entry with expected-versus-observed cumulative counts, FDR, q-value, acceptance state, and explicit mismatch fields
- any requested summary or entry TSV output paths
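For readers unfamiliar with the quantities being validated, the sketch below computes cumulative counts, FDR, and q-values for a ranked target-decoy list. It uses the common decoys-over-targets estimate and a stable sort for ties; the owned implementation's estimator and tie handling are not specified here, so treat both as assumptions, along with the function name.

```python
def target_decoy_q_values(entries, higher_is_better=True):
    """Rank (score, is_decoy) pairs and return (ranked, fdrs, q_values)."""
    ranked = sorted(entries, key=lambda e: e[0], reverse=higher_is_better)
    targets = decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: cumulative decoys divided by cumulative targets.
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this rank or any worse rank,
    # which makes q-values non-decreasing down the ranked list.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return ranked, fdrs, q_values
```

The monotonicity status mentioned above corresponds to checking that each q-value is no smaller than the one ranked before it.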
fdr-levels compares accepted PSM, peptide, and protein evidence across
explicit FDR thresholds.
- the positional argument is the source PSM TSV path
- --threshold can be repeated and defaults to 0.01, 0.05, and 0.1
- --summary-tsv-out writes one threshold-by-level summary table
- --entries-tsv-out writes one accepted-entity ledger across thresholds
The fdr-levels JSON payload includes:
- the selected score orientation and ordered thresholds
- accepted and rejected source-row counts from the parsed PSM table
- one threshold summary per evidence level with total and accepted counts for target, decoy, mixed, unknown, and contaminant burden
- one accepted-entry row per threshold and evidence level with entity identity, q-value, rank, member count, target-decoy label, contaminant flag, and protein references
- any requested summary or entry TSV output paths
picked-protein-fdr compares target-versus-decoy protein competition across
explicit picked-protein thresholds.
- the positional argument is the source PSM TSV path
- --threshold can be repeated and defaults to 0.01, 0.05, and 0.1
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
- --summary-tsv-out writes one threshold-level competition summary table
- --entries-tsv-out writes one picked-protein ledger across thresholds
The picked-protein-fdr JSON payload includes:
- the selected score orientation and ordered thresholds
- accepted and rejected source-row counts from the parsed PSM table
- one threshold summary with total and accepted target, decoy, contaminant, and grouped-protein burden
- one picked-protein review row per threshold with protein identity, target-versus-decoy partner identity, protein-group identifiers, score, q-value, FDR, rank, acceptance state, contaminant flag, and supporting peptides
- any requested summary or entry TSV output paths
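The target-versus-decoy competition can be sketched in a few lines: each target is paired with its decoy partner, and only the higher-scoring member of the pair survives. The pairing by a DECOY_ accession prefix, the tie rule, and the function name are assumptions made for illustration, not the documented behavior of the command.

```python
def picked_proteins(protein_scores, decoy_prefix="DECOY_"):
    """Keep the better-scoring member of each target/decoy pair."""
    picked = {}
    for accession, score in protein_scores.items():
        is_decoy = accession.startswith(decoy_prefix)
        # The shared key is the target accession without the decoy prefix.
        base = accession[len(decoy_prefix):] if is_decoy else accession
        current = picked.get(base)
        # On ties the member seen first is kept (assumed tie rule).
        if current is None or score > current["score"]:
            picked[base] = {
                "accession": accession,
                "partner_key": base,
                "score": score,
                "is_decoy": is_decoy,
            }
    # Rank survivors from best to worst score for downstream FDR estimation.
    return sorted(picked.values(), key=lambda row: row["score"], reverse=True)
```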
protein-groups emits one direct protein-grouping review packet over
FDR-filtered PSM evidence.
- the positional argument is the source PSM TSV path
- --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds grouping
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be grouped without a separate conversion pass
- --summary-tsv-out writes one compact protein-grouping summary ledger
- --group-tsv-out writes the reviewer-facing protein group table
The protein-groups JSON payload includes:
- the selected threshold and score orientation
- accepted and rejected source-row counts from the parsed PSM table
- grouped-row count after FDR filtering
- one summary block with total groups, singleton groups, ambiguous groups, grouped proteins, target/decoy/mixed/unknown burden, and contaminant burden
- one protein-group row per group with representative protein, leading protein, leading rationale, protein members, all peptides, unique peptides, shared peptides, score, q-value, target-decoy label, and contaminant flag
- any requested summary or group-table TSV output paths
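One simple grouping rule, shown here only as an illustration, places proteins with identical peptide sets in the same group and marks multi-member groups as ambiguous. The function name, the row keys, and the choice of the alphabetically first member as leading protein are assumptions; the owned grouping may also handle subset relationships.

```python
from collections import defaultdict


def group_indistinguishable(protein_peptides):
    """Group proteins whose accepted peptide sets are identical."""
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_peptide_set[frozenset(peptides)].append(protein)

    groups = []
    for peptides, members in by_peptide_set.items():
        members.sort()
        groups.append({
            "leading": members[0],          # assumed leading-protein rule
            "members": members,
            "peptides": sorted(peptides),
            "ambiguous": len(members) > 1,  # singleton groups are unambiguous
        })
    return sorted(groups, key=lambda group: group["leading"])
```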
protein-ambiguity emits one direct protein-ambiguity review packet over
FDR-filtered PSM evidence.
- the positional argument is the source PSM TSV path
- --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds ambiguity review
- --high-q-value and --medium-q-value define the confidence bands applied to each ambiguous group
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
- --summary-tsv-out writes one compact protein-ambiguity summary ledger
- --ambiguity-tsv-out writes one reviewer-facing ambiguity table
The protein-ambiguity JSON payload includes:
- the selected threshold, score orientation, and confidence-band cutoffs
- accepted and rejected source-row counts from the parsed PSM table
- grouped-row count after FDR filtering and ambiguity-row count after the review surface isolates unresolved groups
- one summary block with ambiguous-group count, ambiguous-protein count, indistinguishable versus external-shared versus mixed-group burden, and confidence-label counts
- one ambiguity row per unresolved protein group with representative protein, all protein members, indistinguishable-member ledger, shared peptides, unique peptides, outside-group proteins, ambiguity reason, ambiguity explanation, score, q-value, confidence label, target-decoy label, and contaminant flag
- any requested summary or ambiguity-table TSV output paths
protein-inference-benchmarks emits one owned benchmark review packet over the
named protein-inference pressure catalog.
- there is no positional input; the command runs the repository-owned benchmark scenarios directly
- --picked-threshold defaults to 0.05 and defines the protein threshold used by the picked strategy inside the benchmark suite
- --summary-tsv-out writes one compact suite-summary ledger
- --scenarios-tsv-out writes one benchmark-scenario ledger
- --assessments-tsv-out writes one strategy-assessment ledger across every scenario
The protein-inference-benchmarks JSON payload includes:
- the selected picked threshold
- scenario count together with explicit shared-peptide, isoform, homolog-family, contaminant, and decoy case counts
- worst strategy precision and recall lower bounds across the suite
- covered inference-strategy kinds
- one benchmark report per named scenario with expected-present and expected-absent proteins, pressure flags, disagreement count, and one strategy assessment per inference method
- any requested summary, scenario, or assessment TSV output paths
peptide-matrix emits one owned peptide-by-sample intensity matrix over either
precursor or feature evidence or intensity-bearing PSM evidence.
- --input-kind accepts feature or psm
- --grouping-mode accepts peptide_sequence or modified_peptide
- --separate-charge-states keeps precursor charge states split into separate peptide rows
- --aggregation accepts the owned sum, median, or top-n policies
- --summary-tsv-out writes one compact matrix-summary ledger
- --matrix-tsv-out writes one wide peptide-by-sample abundance matrix
- --missingness-tsv-out writes one per-sample missingness ledger
The peptide-matrix JSON payload includes:
- input kind
- accepted and rejected source-record counts from the selected parser
- one peptide-matrix report with grouping mode, charge policy, aggregation method, sample identifiers, peptide rows, and missingness summary
- report-level counts for accepted and skipped source rows after matrix construction
- any requested summary, matrix, or missingness TSV output paths
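The aggregation and missingness ideas can be sketched as below. The function names are hypothetical, and the reading of top-n as "sum of the n largest intensities" is an assumption; the owned policy may define it differently.

```python
from collections import defaultdict
from statistics import median


def aggregate(values, method="sum", top_n=3):
    """Collapse repeated intensities for one peptide in one sample."""
    if method == "sum":
        return sum(values)
    if method == "median":
        return median(values)
    if method == "top-n":
        # Assumed meaning: sum of the n most intense observations.
        return sum(sorted(values, reverse=True)[:top_n])
    raise ValueError(f"unknown aggregation method: {method}")


def peptide_matrix(rows, method="sum", top_n=3):
    """Build a peptide-by-sample matrix from (peptide, sample, intensity)."""
    cells = defaultdict(list)
    samples = set()
    for peptide, sample, intensity in rows:
        cells[(peptide, sample)].append(intensity)
        samples.add(sample)
    samples = sorted(samples)
    peptides = sorted({peptide for peptide, _ in cells})

    matrix = {
        p: {
            s: aggregate(cells[(p, s)], method, top_n) if (p, s) in cells else None
            for s in samples
        }
        for p in peptides
    }
    # Missingness is the fraction of peptide rows with no value in a sample.
    missingness = {
        s: sum(matrix[p][s] is None for p in peptides) / len(peptides)
        for s in samples
    }
    return matrix, missingness
```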
protein-matrix emits one owned protein-by-sample intensity matrix over either
precursor or feature evidence or intensity-bearing PSM evidence.
- --input-kind accepts feature or psm
- --grouping-mode accepts peptide_sequence or modified_peptide
- --target-kind accepts protein or protein_group
- --aggregation accepts the owned sum, median, or top-n policies
- --unique-peptide-only excludes shared-peptide rows before protein rollup
- --summary-tsv-out writes one compact protein-matrix summary ledger
- --matrix-tsv-out writes one wide protein-by-sample abundance matrix
- --missingness-tsv-out writes one per-sample missingness ledger
The protein-matrix JSON payload includes:
- input kind
- accepted and rejected source-record counts from the selected parser
- one protein-matrix report with target kind, rollup policy, peptide counts, unique/shared peptide burden, sample identifiers, and missingness summary
- any requested summary, matrix, or missingness TSV output paths
protein-lfq emits one owned MaxLFQ-like protein abundance matrix over either
precursor or feature evidence or intensity-bearing PSM evidence.
- --input-kind accepts feature or psm
- --grouping-mode accepts peptide_sequence or modified_peptide
- --target-kind accepts protein or protein_group
- --aggregation accepts the owned sum, median, or top-n peptide-collapse policies used before pairwise ratio construction
- --unique-peptide-only excludes shared-peptide rows before LFQ solving
- --minimum-shared-peptides requires a minimum number of shared peptides before one sample-pair ratio is retained
- --summary-tsv-out writes one compact protein-LFQ summary ledger
- --matrix-tsv-out writes one wide protein-by-sample LFQ abundance matrix
- --pairwise-tsv-out writes one reviewer-facing pairwise-ratio ledger
- --missingness-tsv-out writes one per-sample missingness ledger
The protein-lfq JSON payload includes:
- input kind
- accepted and rejected source-record counts from the selected parser
- one protein-LFQ report with target kind, grouping mode, charge policy, aggregation method, sample identifiers, protein rows, and missingness summary
- one per-protein row with pairwise-ratio counts, connected-component counts, fully-connected status, contributing peptides, and sample-specific LFQ values
- any requested summary, matrix, pairwise-ratio, or missingness TSV output paths
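The pairwise-ratio step of a MaxLFQ-like approach can be sketched as the median log2 ratio over peptides observed in both samples, retained only when enough peptides are shared. This covers just that one step for a single protein; the function name and input shape are hypothetical, and the later solving step that turns ratios into abundances is not shown.

```python
from itertools import combinations
from math import log2
from statistics import median


def pairwise_log_ratios(peptide_rows, minimum_shared_peptides=2):
    """Median log2 intensity ratio per sample pair for one protein.

    `peptide_rows` maps peptide -> {sample: intensity or None}.
    """
    samples = sorted({s for row in peptide_rows.values() for s in row})
    ratios = {}
    for a, b in combinations(samples, 2):
        # Only peptides quantified (non-missing, non-zero) in both samples.
        logs = [
            log2(row[a] / row[b])
            for row in peptide_rows.values()
            if row.get(a) and row.get(b)
        ]
        if len(logs) >= minimum_shared_peptides:
            ratios[(a, b)] = median(logs)
    return ratios
```

Sample pairs that fall below the shared-peptide minimum simply have no ratio, which is what can leave a protein with more than one connected component.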
protein-coverage emits one direct protein-coverage review packet over
FDR-filtered PSM evidence plus a supplied FASTA sequence set.
- the positional argument is the source PSM TSV path
- --fasta is required and supplies the protein sequences used for coverage mapping
- --threshold defaults to 0.05 and defines the accepted PSM evidence that feeds sequence coverage
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be reviewed without a separate conversion pass
- --summary-tsv-out writes one compact protein-coverage summary ledger
- --coverage-tsv-out writes one reviewer-facing per-protein coverage table
- --regions-tsv-out writes one covered-region ledger with explicit residue intervals
The protein-coverage JSON payload includes:
- the selected threshold and score orientation
- accepted and rejected source-row counts from the parsed PSM table
- one summary block with total observed proteins, proteins with or without sequence, unique/shared-peptide burden, unmatched-peptide burden, total covered regions, total residues, and total covered residues
- one protein row per sequence-backed protein with coverage fraction, covered residue count, covered regions, matched and unmatched peptides, unique/shared-peptide ledgers, score, q-value, target-decoy label, and contaminant flag
- one flattened region row per contiguous covered interval
- any requested summary, coverage-table, or region-ledger TSV output paths
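Coverage fraction and contiguous covered regions can be derived by locating each peptide in the protein sequence and merging the resulting intervals, as in this sketch. The function name and result keys are hypothetical, matching is exact substring search, and residue positions are reported 1-based and inclusive as an assumption.

```python
def protein_coverage(protein_sequence, peptides):
    """Return covered regions, covered residue count, fraction, and misses."""
    intervals, unmatched = [], []
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty strings rather than match everywhere
        start = protein_sequence.find(peptide)
        if start == -1:
            unmatched.append(peptide)
            continue
        # Record every occurrence, including overlapping repeats.
        while start != -1:
            intervals.append((start + 1, start + len(peptide)))
            start = protein_sequence.find(peptide, start + 1)

    # Merge overlapping or directly adjacent intervals into regions.
    regions = []
    for low, high in sorted(intervals):
        if regions and low <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], max(regions[-1][1], high))
        else:
            regions.append((low, high))

    covered = sum(high - low + 1 for low, high in regions)
    total = len(protein_sequence)
    return {
        "regions": regions,
        "covered_residues": covered,
        "coverage_fraction": covered / total if total else 0.0,
        "unmatched_peptides": unmatched,
    }
```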
protein-coverage-plot emits one plot-ready peptide-to-protein coverage packet
plus optional static coverage renderings.
- the positional argument is the source PSM TSV path
- --fasta is required and supplies the protein sequences used for positional mapping
- --threshold defaults to 0.05 and defines the accepted peptide evidence that feeds the plot surface
- --high-q-value and --medium-q-value define the confidence bands used for plotted peptide labels
- canonical-schema, optional modified-peptide, optional intensity, and decoy-policy column controls stay available so lab-local PSM tables can be plotted without a separate conversion pass
- when every parsed PSM already carries a q-value, the plot surface preserves those imported q-values for confidence labeling and threshold filtering; otherwise it falls back to owned target-decoy filtering
- --positions-tsv-out writes one reviewer-facing positional ledger
- --svg-out writes one static SVG coverage plot
- --html-out writes one static HTML wrapper around the same owned SVG view
The protein-coverage-plot JSON payload includes:
- the selected threshold, score orientation, and confidence-band cutoffs
- accepted and rejected source-row counts from the parsed PSM table
- one summary block with plotted-protein count, total positional rows, modified/shared/intensity positional burden, and unmatched peptide count
- one track per protein with coverage fraction, protein length, target-decoy label, contaminant flag, and ordered peptide-position rows
- one peptide-position row per matched sequence occurrence with start/end residues, canonical peptide, modified peptide when present, peptide confidence, peptide q-value, best score, optional intensity, charge states, spectrum ids, and protein-group ids
- explicit unmatched-peptide rows when one accepted peptide is assigned to one protein but cannot be located in the supplied sequence
- any requested positional-ledger, SVG, or HTML output paths
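The fallback from imported q-values to target-decoy filtering, and the banding of peptides by q-value, can be sketched as follows. This is a generic target-decoy estimate, not necessarily the estimator the package uses. The band labels, the example cutoffs, and the higher-is-better score orientation are all assumptions.

```python
def target_decoy_q_values(psms):
    """psms: list of (score, is_decoy); higher score is assumed to be better."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value at a rank is the smallest FDR at that rank or any worse rank.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_values)]


def confidence_band(q_value, high_q_value, medium_q_value):
    """Map one q-value onto a hypothetical high / medium / low label."""
    if q_value <= high_q_value:
        return "high"
    if q_value <= medium_q_value:
        return "medium"
    return "low"


scored = target_decoy_q_values(
    [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
)
# Example cutoffs only; the real defaults are not stated in this document.
labels = [confidence_band(q, 0.01, 0.05) for _, _, q in scored]
```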
protein-parsimony emits one direct review packet over a named parsimony
protein set and the ambiguity that remains after selection.
- the positional argument is the source PSM TSV path
- `--threshold` defaults to `0.05` and defines the accepted PSM evidence that feeds protein inference
- `--variant` selects the named parsimony policy used for the main selected set
- `--review-variant` is repeatable and defines which named policies are compared for unresolved ambiguity review
- canonical-schema and decoy-policy column controls stay available so lab-local PSM tables can be inferred without a separate conversion pass
- `--summary-tsv-out` writes one compact parsimony summary ledger
- `--protein-tsv-out` writes the selected-protein table
- `--ambiguity-tsv-out` writes unresolved shared-peptide and variant-difference rows
The protein-parsimony JSON payload includes:
- the selected threshold and score orientation
- accepted and rejected source-row counts from the parsed PSM table
- grouped-row count after FDR filtering
- one summary block with observed peptide count, explained peptide count, unexplained peptide count, selected protein count, shared selected-peptide count, variant-difference count, and unresolved ambiguity count
- one selected-protein row per chosen protein with source group, covered peptides, newly explained peptides, unresolved shared peptides, score, q-value, and target-decoy label
- explicit unresolved ambiguity rows for shared peptides that still map to more than one selected protein and for parsimony variants that diverge in ranking or membership
- any requested summary, protein, or ambiguity TSV output paths
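To make "newly explained peptides" and "unresolved shared peptides" concrete, here is a minimal greedy set-cover sketch. It is one common parsimony heuristic, offered only as an illustration. The named policies behind `--variant` are not described here, and the function names and alphabetical tie-break are assumptions.

```python
def greedy_parsimony(protein_to_peptides):
    """Select proteins until every observed peptide is explained."""
    unexplained = set().union(*protein_to_peptides.values())
    selected = []
    while unexplained:
        # Pick the protein explaining the most remaining peptides.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda protein: len(protein_to_peptides[protein] & unexplained),
        )
        newly_explained = protein_to_peptides[best] & unexplained
        if not newly_explained:
            break
        selected.append((best, sorted(newly_explained)))
        unexplained -= newly_explained
    return selected


def shared_selected_peptides(protein_to_peptides, selected):
    """Peptides that still map to more than one selected protein."""
    chosen = [protein for protein, _ in selected]
    peptides = set().union(*(protein_to_peptides[p] for p in chosen))
    return sorted(
        pep for pep in peptides
        if sum(pep in protein_to_peptides[p] for p in chosen) > 1
    )


evidence = {"P1": {"a", "b", "c"}, "P2": {"c", "d"}, "P3": {"d"}}
selected = greedy_parsimony(evidence)
shared = shared_selected_peptides(evidence, selected)
```

In this example `P3` is dropped because `P2` already explains its only peptide, while peptide `c` remains shared between the two selected proteins and would be reported as unresolved ambiguity.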
openms-import emits one governed review packet over native OpenMS idXML
identification evidence plus one exported feature table.
- the positional argument is the OpenMS `idXML` path
- `--feature-table` is required and supplies the exported feature-table path
- `--summary-tsv-out` writes the one-row import summary
- `--psm-tsv-out` writes reviewer-facing PSM rows
- `--protein-tsv-out` writes reviewer-facing protein rows
- `--feature-tsv-out` writes reviewer-facing feature rows
The openms-import JSON payload includes:
- one compact summary over accepted PSM rows, protein rows, accepted and rejected feature rows, q-value coverage, target or decoy burden, and feature sample coverage
- one explicit feature-parse summary with total, accepted, and rejected feature-table row counts
- reviewer-facing PSM rows with run identity, spectrum reference, peptide sequence, charge, score, q-value, precursor m/z, retention time, protein references, and target-decoy label
- reviewer-facing protein rows with run identity, protein reference, score, q-value, and target-decoy label
- reviewer-facing feature rows with feature identity, sample identity, peptide text, canonical peptide, intensity, protein references, charge, m/z, retention time, and missing reason
- any requested TSV export paths
modification-resolve checks one modification token against the built-in
registry and any optional custom registry supplied at runtime.
- `--residue` optionally asks for residue-compatibility review instead of name resolution alone.
- `--registry` loads a JSON modification registry so local or institution-specific definitions stay explicit.
- `--out` writes the resolution report as JSON.
The modification-resolve JSON payload includes:
- the original query token and normalized token
- whether the token resolved successfully
- builtin, custom-registry, or unknown source classification
- the resolved modification name and controlled identifier when known
- static or variable application class
- allowed site position and residue scope
- monoisotopic and average mass deltas
- optional residue query plus residue-allowed status
- reviewer-facing issues for unknown tokens or residue mismatches
The digestion-oriented CLI surfaces share one protease contract:
- built-in proteases include `trypsin`, `lysc`, `gluc`, `argc`, `chymotrypsin`, and `aspn`
- `--custom-protease` accepts explicit rule fragments such as `after=KR;block_next=P` or `before=D;block_previous=P`
- `--custom-protease-name` supplies the durable rule name recorded in outputs
- custom protease rules cannot be combined with a second built-in protease name
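A rule string such as `after=KR;block_next=P` can be read as "cut after K or R unless the next residue is P". The sketch below parses such fragments and applies them with a missed-cleavage allowance. The interpretation of the four keys and all function names are assumptions drawn from the two examples above, not the package's parser.

```python
def parse_rule(spec):
    """Parse 'key=value;key=value' rule fragments into a rule dict."""
    rule = {"after": "", "before": "", "block_next": "", "block_previous": ""}
    for fragment in spec.split(";"):
        if not fragment:
            continue
        key, _, value = fragment.partition("=")
        if key not in rule:
            raise ValueError(f"unknown rule fragment: {fragment!r}")
        rule[key] = value
    return rule


def cleavage_sites(sequence, rule):
    """Indices i where the chain is cut between sequence[i-1] and sequence[i]."""
    sites = []
    for i in range(1, len(sequence)):
        prev, nxt = sequence[i - 1], sequence[i]
        cut_after = prev in rule["after"] and nxt not in rule["block_next"]
        cut_before = nxt in rule["before"] and prev not in rule["block_previous"]
        if cut_after or cut_before:
            sites.append(i)
    return sites


def digest(sequence, rule, missed_cleavages=0):
    """Yield (start, end, peptide, missed) rows with 1-based inclusive coordinates."""
    bounds = [0] + cleavage_sites(sequence, rule) + [len(sequence)]
    rows = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            rows.append((bounds[i] + 1, bounds[j], sequence[bounds[i]:bounds[j]], j - i - 1))
    return rows


trypsin_like = parse_rule("after=KR;block_next=P")
peptides = digest("AKRPGKDER", trypsin_like, missed_cleavages=1)
```

With this input the `KR` site is cut, the `RP` site is blocked, and one missed cleavage adds the two spanning peptides `AKRPGK` and `RPGKDER`.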
spectrum-parse emits the full MGF parse contract plus a chunk-aware streaming
profile for larger file review.
- `--chunk-size` controls the chunk accounting used in the streaming profile.
- `--accepted-jsonl-out` writes one accepted spectrum object per line.
- `--rejected-json-out` writes the rejected-block ledger as JSON.
The spectrum-parse JSON payload includes:
- the full parse report with accepted spectra and rejected blocks
- the compact collection summary
- the streaming profile with chunk size, spectrum count, chunk count, and first or last accepted spectrum identifiers
- any accepted-spectrum JSONL or rejected-block JSON export paths
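The streaming profile fields listed above can be derived in a single pass without holding spectra in memory. The following sketch shows that accounting under assumed field names; it is not the package's implementation.

```python
def streaming_profile(spectrum_ids, chunk_size):
    """Single-pass chunk accounting over an iterable of accepted spectrum ids."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = 0
    first = last = None
    for spectrum_id in spectrum_ids:
        if first is None:
            first = spectrum_id
        last = spectrum_id
        count += 1
    return {
        "chunk_size": chunk_size,
        "spectrum_count": count,
        # Ceiling division: a partially filled final chunk still counts.
        "chunk_count": -(-count // chunk_size),
        "first_spectrum_id": first,
        "last_spectrum_id": last,
    }


# A generator stands in for a parser that yields accepted spectra lazily.
profile = streaming_profile((f"scan={n}" for n in range(1, 6)), chunk_size=2)
```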
spectrum-stats keeps the lighter review surface for one accepted collection:
- summary counts over accepted spectra and rejected blocks
- per-spectrum TIC or base-peak metrics
- a provenance manifest when `--provenance-out` is requested
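For readers unfamiliar with the per-spectrum metrics, TIC is the summed intensity of a spectrum and the base peak is its most intense peak. A minimal sketch, with hypothetical field names:

```python
def spectrum_metrics(peaks):
    """peaks: list of (mz, intensity) pairs for one accepted spectrum."""
    if not peaks:
        return {"tic": 0.0, "base_peak_mz": None, "base_peak_intensity": 0.0}
    base_mz, base_intensity = max(peaks, key=lambda peak: peak[1])
    return {
        "tic": sum(intensity for _, intensity in peaks),
        "base_peak_mz": base_mz,
        "base_peak_intensity": base_intensity,
    }


metrics = spectrum_metrics([(100.0, 10.0), (200.0, 50.0), (300.0, 40.0)])
```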
spectrum-summary emits reviewer-facing run summary tables over one MGF or
mzML input.
- `--kind` accepts `auto`, `mgf`, or `mzml`
- `--summary-tsv-out` writes the one-row run summary
- `--charge-tsv-out` writes the precursor-charge distribution
- `--precursor-tsv-out` writes the precursor-m/z distribution
- `--peak-count-tsv-out` writes the peak-count distribution
The spectrum-summary JSON payload includes:
- source kind
- ms-level policy
- total, rejected, MS1, MS2, and unknown-ms-level counts
- retention-time minimum and maximum when available
- charge, precursor-m/z, and peak-count distributions
- any requested TSV export paths
spectrum-qc emits raw-spectrum run-QC ledgers over one MGF or mzML input.
- `--kind` accepts `auto`, `mgf`, or `mzml`
- `--time-bin-seconds` sets the MS/MS-count retention-time bin width
- `--summary-tsv-out` writes the one-row run-QC summary
- `--msms-tsv-out` writes the MS/MS-count-over-time table
- `--tic-tsv-out` writes the TIC trace table
- `--bpc-tsv-out` writes the BPC trace table
- `--charge-tsv-out` writes the precursor-charge distribution
- `--precursor-intensity-tsv-out` writes the precursor-intensity distribution
- `--flagged-tsv-out` writes the empty and noisy spectrum table
- `--plot-out` writes one plot-ready JSON payload for downstream rendering
The spectrum-qc JSON payload includes:
- source kind and chromatogram source
- total, rejected, and MS/MS spectrum counts
- precursor-intensity observation count
- empty-spectrum count and noisy-spectrum count
- MS/MS-count-over-time bins
- TIC and BPC traces
- charge and precursor-intensity distributions
- flagged spectrum rows
- reviewer-facing diagnostics when retention-time or precursor-intensity evidence is incomplete
- any requested TSV or plot export paths
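The MS/MS-count-over-time bins can be pictured as a fixed-width histogram over retention time. The sketch below assumes retention times in seconds and half-open bins. Skipping spectra without a retention time mirrors the incomplete-evidence diagnostics only loosely and is an assumption.

```python
from collections import Counter


def msms_counts_over_time(spectra, time_bin_seconds):
    """spectra: iterable of (retention_time_seconds or None, ms_level)."""
    counts = Counter()
    for retention_time, ms_level in spectra:
        if ms_level != 2 or retention_time is None:
            # Only MS/MS spectra with a usable retention time are binned.
            continue
        counts[int(retention_time // time_bin_seconds)] += 1
    # Each row is (bin start, bin end, MS/MS count); bins are [start, end).
    return [
        (index * time_bin_seconds, (index + 1) * time_bin_seconds, counts[index])
        for index in sorted(counts)
    ]


bins = msms_counts_over_time(
    [(1.0, 2), (59.9, 2), (60.0, 2), (61.0, 1), (None, 2), (125.0, 2)],
    time_bin_seconds=60,
)
```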
spectrum-annotate emits one matched-fragment review object plus a plot-ready
payload for one accepted MGF spectrum.
- `--peptide` is required and accepts the owned peptide notation surface.
- `--spectrum-id` optionally selects one accepted spectrum by identifier.
- `--tolerance-da` and `--tolerance-ppm` select the fragment-match tolerance mode.
- `--tsv-out` writes the matched-ion evidence table.
- `--plot-out` writes the plot-ready JSON payload.
The spectrum-annotate JSON payload includes:
- the full annotation object
- explicit matched-ion rows
- matched-peak count
- explained-intensity fraction
- unmatched-peak count
- the selected tolerance unit plus tolerance value
- ambiguity warnings when one fragment matches multiple peaks or one peak matches multiple fragments
- one plot-ready payload with labeled peaks
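The difference between the two tolerance modes, and how matched-peak count and explained-intensity fraction fall out of matching, can be sketched as below. The function names and result keys are hypothetical. Scaling the ppm window by the theoretical m/z is a common convention assumed here.

```python
def within_tolerance(observed_mz, theoretical_mz, tolerance, unit):
    """Absolute window for 'da'; window scaled by theoretical m/z for 'ppm'."""
    delta = abs(observed_mz - theoretical_mz)
    if unit == "da":
        return delta <= tolerance
    if unit == "ppm":
        return delta <= theoretical_mz * tolerance * 1e-6
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def annotate(peaks, fragments, tolerance, unit):
    """peaks: [(mz, intensity)]; fragments: [(label, theoretical_mz)]."""
    rows = []
    matched_peaks = set()
    for label, theoretical_mz in fragments:
        for index, (peak_mz, intensity) in enumerate(peaks):
            if within_tolerance(peak_mz, theoretical_mz, tolerance, unit):
                rows.append((label, theoretical_mz, peak_mz, intensity))
                matched_peaks.add(index)
    total = sum(intensity for _, intensity in peaks)
    explained = sum(peaks[index][1] for index in matched_peaks)
    return {
        "matched_ions": rows,
        "matched_peak_count": len(matched_peaks),
        "unmatched_peak_count": len(peaks) - len(matched_peaks),
        "explained_intensity_fraction": explained / total if total else 0.0,
    }


peaks = [(175.119, 100.0), (300.0, 50.0), (500.25, 50.0)]
fragments = [("y1", 175.11895), ("b3", 500.30)]  # illustrative values
by_da = annotate(peaks, fragments, 0.02, "da")
by_ppm = annotate(peaks, fragments, 20.0, "ppm")
```

Because each fragment is tested against every peak, one fragment can match several peaks and one peak can match several fragments, which is the situation the ambiguity warnings describe.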
spectrum-similarity compares one accepted query spectrum against either one
selected reference spectrum or an accepted spectrum library from MGF or mzML.
- `--query-kind` and `--reference-kind` accept `auto`, `mgf`, or `mzml`.
- `--query-spectrum-id` optionally selects one accepted query spectrum by identifier.
- `--reference-spectrum-id` switches the command into explicit pairwise comparison against one selected reference spectrum.
- `--method` accepts `cosine` or `dot_product`.
- `--mode` accepts `raw`, `normalized`, `top_n`, or `transformed`.
- `--tolerance-da` enables direct fragment matching by mass tolerance.
- `--bin-width-da` enables coarse m/z-binned comparison instead of tolerance-based matching.
- `--max-matches` limits the ranked library output.
- `--tsv-out` writes the ranked candidate table.
The spectrum-similarity JSON payload includes:
- an optional pairwise comparison report when `--reference-spectrum-id` is supplied
- a ranked library report for the selected query spectrum
- the explicit preprocessing and matching parameters
- matched-peak count plus explained-intensity fractions
- reviewer-facing classifications such as `duplicate_like`, `similar`, `distinct`, or `insufficient_signal`
- any requested TSV export path
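As a rough picture of the binned comparison path, the sketch below computes a cosine score over fixed-width m/z bins. It is a textbook formulation with hypothetical function names. The package's preprocessing modes and classification thresholds are not reproduced here.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width_da):
    """Sum intensities into fixed-width m/z bins."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz // bin_width_da)] += intensity
    return vector


def cosine_similarity(query, reference, bin_width_da=1.0):
    """Cosine of the angle between two binned intensity vectors."""
    q = binned(query, bin_width_da)
    r = binned(reference, bin_width_da)
    dot = sum(q[index] * r[index] for index in q.keys() & r.keys())
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    norm_r = math.sqrt(sum(value * value for value in r.values()))
    if norm_q == 0 or norm_r == 0:
        # No usable signal on one side; a real report might flag this instead.
        return 0.0
    return dot / (norm_q * norm_r)


query = [(100.2, 3.0), (200.7, 4.0)]
reference = [(100.4, 3.0), (300.1, 4.0)]
score = cosine_similarity(query, reference)
```

Only the shared bin near m/z 100 contributes to the dot product, so the score is 9 / 25.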
spectral-library-import imports one practical MSP or library-shaped MGF file
into an explicit peptide-aware library contract.
- `--kind` accepts `auto`, `msp`, or `mgf`.
- `--precursor-mz` optionally runs candidate retrieval against the imported precursor index.
- `--tolerance-da` sets the precursor candidate window.
- `--peptide` optionally narrows candidate retrieval to one peptide query.
- `--summary-tsv-out` writes a compact one-row library summary.
- `--candidates-tsv-out` writes the precursor-compatible candidate table and requires `--precursor-mz`.
The spectral-library-import JSON payload includes:
- the full import report with accepted and rejected entries
- a compact summary over entry count, unique peptides, modified entries, decoy entries, and charge distribution
- compact index facts, including peptide lookup content and precursor-index size
- an optional candidate report over precursor and peptide-filtered entries
- any requested summary or candidate TSV export paths
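Candidate retrieval against a precursor index amounts to a window lookup on sorted precursor m/z values, optionally narrowed by peptide. The sketch below shows one way to do that with binary search. The entry layout and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_precursor_index(entries):
    """entries: iterable of (precursor_mz, peptide, charge); sorted by m/z."""
    return sorted(entries)


def candidates(index, precursor_mz, tolerance_da, peptide=None):
    """Entries whose precursor m/z lies within +/- tolerance_da of the query."""
    mzs = [entry[0] for entry in index]
    low = bisect_left(mzs, precursor_mz - tolerance_da)
    high = bisect_right(mzs, precursor_mz + tolerance_da)
    hits = index[low:high]
    if peptide is not None:
        hits = [entry for entry in hits if entry[1] == peptide]
    return hits


index = build_precursor_index([
    (650.10, "ANOTHERK", 2),
    (500.27, "PEPTIDEK", 2),
    (500.30, "SAMPLER", 2),
])
window_hits = candidates(index, 500.28, 0.05)
peptide_hits = candidates(index, 500.28, 0.05, peptide="SAMPLER")
```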
spectral-library-search ranks one selected query spectrum against one
practical MSP or MGF library.
- `--query-kind` accepts `auto`, `mgf`, or `mzml`.
- `--library-kind` accepts `auto`, `msp`, or `mgf`.
- `--query-spectrum-id` optionally selects one accepted query spectrum by identifier.
- `--precursor-tolerance-da` sets the precursor candidate window before any similarity scoring happens.
- `--tolerance-da` sets the fragment-matching tolerance for the similarity stage.
- `--bin-width-da` optionally switches the similarity stage onto coarse m/z binning.
- `--method` accepts `cosine` or `dot_product`.
- `--mode` accepts `raw`, `normalized`, `top_n`, or `transformed`.
- `--top-n` optionally limits preprocessing to the most intense peaks.
- `--max-matches` limits the ranked output table.
- `--tsv-out` writes the ranked match table with target-decoy label, score, explained-intensity fractions, and optional q-value.
The spectral-library-search JSON payload includes:
- the imported library report used for search
- a compact library summary that keeps decoy-entry count explicit
- one search report with precursor policy, similarity policy, candidate count, decoy-candidate count, and ranked matches
- the top-match identifier, peptide, score, and q-value when available
- a search strategy field that stays explicit as either `concatenated` or `no_decoy_advisory`
- any requested ranked-TSV export path
mzml-inspect reports one practical mzML review object without claiming full
vendor-native replacement.
- `--spectra-jsonl-out` writes accepted spectra as normalized JSONL.
- `--chromatograms-json-out` writes the extracted chromatogram report as JSON.
- the command reports decoding support and chromatogram presence explicitly instead of assuming every mzML file is equally supported
The mzml-inspect JSON payload includes:
- run metadata
- the compact accepted-spectrum summary
- binary decoding support with accepted and rejected spectrum counts
- extracted chromatogram traces, including TIC and BPC when present
- reviewer-facing diagnostics about practical scope and missing chromatograms
- any accepted-spectrum JSONL or chromatogram JSON export paths
digest and peptide-index both report:
- the resolved protease name
- the custom protease specification when one was supplied
- the digestion mode
- the missed-cleavage allowance
digest exports one theoretical peptide database under the selected digestion
policy.
- `--format` accepts `tsv`, `jsonl`, `parquet`, or `fasta`.
- `--out` writes the main peptide export.
- `--manifest-out` writes the digestion policy manifest.
- `--peptide-protein-table-out` optionally writes a peptide-to-protein TSV sidecar with one row per peptide occurrence.
The main peptide exports preserve:
- source accession and source identifier
- peptide sequence
- peptide length
- start and end coordinates
- missed-cleavage count
- protease and digestion mode
- cleavage type
- neutral mass
The peptide FASTA export writes the peptide sequence body and records source coordinates, missed-cleavage count, peptide length, neutral mass, and protease in the header.
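The neutral mass column can be reproduced for unmodified peptides by summing residue masses and adding one water. The sketch below uses standard monoisotopic residue masses. Whether the export reports monoisotopic rather than average mass is an assumption, and modifications are ignored.

```python
# Standard monoisotopic residue masses in daltons.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565


def neutral_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    try:
        residues = sum(MONOISOTOPIC_RESIDUE_MASS[residue] for residue in peptide)
    except KeyError as error:
        raise ValueError(f"unsupported residue: {error.args[0]!r}") from None
    return residues + WATER_MASS


mass = neutral_mass("PEPTIDE")  # approximately 799.36 Da
```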
The peptide-to-protein sidecar preserves:
- peptide sequence
- peptide length
- neutral mass
- source accession and source identifier
- source protein family and isoform
- start and end coordinates
- missed-cleavage count
- protease, digestion mode, and cleavage type
fasta-stats reports FASTA-wide review metrics such as duplicate accession
count, duplicate sequence count, target count, decoy count, contaminant count,
and sequence-length summary values.
summarize --kind fasta returns the higher-level FASTA summary, the parser-level
database composition, and the richer FASTA profile so operators can distinguish
structural file quality from biological database makeup and annotation burden.
For PSM evidence, the contaminant-review surface is:
psm-contaminants
psm-contaminants emits a separate contaminant-match report with:
- contaminant PSM count
- pure-contaminant versus mixed-reference PSM counts
- contaminant peptide count
- contaminant protein counts
- row-level entries listing contaminant and target protein references for each contaminant-carrying match
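The split between pure-contaminant and mixed-reference matches follows from each PSM's protein references. A minimal sketch, assuming contaminants are identified by membership in a known accession set. The accession names, labels, and report keys are hypothetical.

```python
def classify_psm(protein_refs, contaminant_accessions):
    """Label one PSM by how many of its protein references are contaminants."""
    hits = [ref for ref in protein_refs if ref in contaminant_accessions]
    if not hits:
        return "target_only"
    if len(hits) == len(protein_refs):
        return "pure_contaminant"
    return "mixed_reference"


def contaminant_report(psms, contaminant_accessions):
    """psms: iterable of (psm_id, peptide, protein_refs)."""
    rows = []
    for psm_id, peptide, protein_refs in psms:
        label = classify_psm(protein_refs, contaminant_accessions)
        if label == "target_only":
            continue
        rows.append({
            "psm_id": psm_id,
            "peptide": peptide,
            "class": label,
            "contaminant_refs": [r for r in protein_refs if r in contaminant_accessions],
            "target_refs": [r for r in protein_refs if r not in contaminant_accessions],
        })
    return {
        "contaminant_psm_count": len(rows),
        "pure_contaminant_psm_count": sum(r["class"] == "pure_contaminant" for r in rows),
        "mixed_reference_psm_count": sum(r["class"] == "mixed_reference" for r in rows),
        "contaminant_peptide_count": len({r["peptide"] for r in rows}),
        "contaminant_protein_count": len({ref for r in rows for ref in r["contaminant_refs"]}),
        "rows": rows,
    }


report = contaminant_report(
    [
        ("s1", "PEPK", ["P1"]),
        ("s2", "KERATINK", ["CON_K1"]),
        ("s3", "SHAREDR", ["P2", "CON_K1"]),
        ("s4", "ALBUMINK", ["CON_ALB"]),
    ],
    contaminant_accessions={"CON_K1", "CON_ALB"},
)
```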
summarize --kind psm includes the same contaminant report alongside the
standard PSM, peptide, and protein summaries.