File API (publish/v1)¶
Guide Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Snakemake"]
guide["Capstone docs"]
section["FILE_API"]
page["File API (publish/v1)"]
proof["Proof route"]
family --> program --> guide --> section --> page
page -.checks against.-> proof
flowchart LR
orient["Read the guide boundary"] --> inspect["Inspect the named files, targets, or artifacts"]
inspect --> run["Run the confirm, demo, selftest, or proof command"]
run --> compare["Compare output with the stated contract"]
compare --> review["Return to the course claim with evidence"]
This capstone exports a small, versioned set of artifacts under publish/v1/.
All JSON files:
- are UTF-8
- end with a newline
- are deterministic (sorted keys where applicable)
- include
schema_versionfor forward-compatible evolution
The per-sample intermediate files live under results/{sample}/ and are
documented here as well because they form the workflow contracts.
Use PUBLISH_REVIEW_GUIDE.md when the question is not just what the files are, but why
they deserve downstream trust.
Results per sample (results/{sample}/)¶
qc_raw.json / qc_trimmed.json¶
Produced by: python -m capstone.qc_fastq
Minimal fields:
schema_version(int)path(str)reads(int)bases(int)mean_read_length(float)gc_fraction(float)n_fraction(float)qual_mean_per_pos(list[float])length_hist(dict[int,int])
trim.json¶
Produced by: python -m capstone.trim_fastq
schema_versioninput:{path, reads, bases}output:{path, reads, bases}params: trimming parameters used for the runclipping: nested counts for adapter, quality, and poly-run clipping
dedup.json¶
Produced by: python -m capstone.dedup_fastq
schema_versionmode(deduporcopy)input:{path, reads, bases}output:{path, reads, bases}dedup_keyduplicates_dropped
kmer.json¶
Produced by: python -m capstone.kmer_profile
schema_versioninputksignature_sizeunique_kmerstotal_kmerssignature(list[int])top_kmers(list[{kmer,count}])
screen.json¶
Produced by: python -m capstone.screen_panel
schema_versionsample_kmer_jsonpanel_fastaksignature_sizescore_typescores(sorted list of panel hits)
Published API (publish/v1/)¶
discovered_samples.json¶
Produced by the checkpoint discover_samples.
schema_versionallow_paired_endraw_dir,glob,n_filessamplesmapping:{sample: {mode: SE|PE, reads: {SE|R1|R2: path}}}
summary.json and summary.tsv¶
Produced by: python -m capstone.summarize
schema_versionunits:{sample: {...}}(merged metrics from all upstream steps)- each unit includes
raw_qc,trim,trimmed_qc,dedup,kmer,screen, andhighlights
report/index.html¶
Produced by: python -m capstone.report
Static HTML report (no external JS/CSS).
provenance.json¶
Produced by rule provenance.
Records:
timestamp_utcpython_version,python_executable,platformsnakemake_versiongit_commit(if available)- the fully materialized config
manifest.json¶
Produced by: python -m capstone.manifest
schema_versionbase: absolute directory used to relativize published pathsfiles: ordered list of{path, sha256}records
Use RESULTS_BOUNDARY_GUIDE.md when the question is whether a surface belongs under
results/ as an internal contract or under publish/v1/ as a downstream-facing one.