Data System Overview¶
The data system in bijux-pollenomics is designed to keep different kinds of
evidence visible instead of merging everything into one vague export. You
should be able to tell whether you are looking at pollen context,
archaeological context, boundary framing, fieldwork documentation, public
review surfaces, or ancient DNA sample evidence.
The governing stance is pollenomics-first: the repository is built to explain
the broader pollen and environmental evidence system clearly, while keeping
animal ancient-DNA recovery visible as one important but still partial slice of
that larger product.
The Basic Shape¶
flowchart TB
sources["source datasets and papers"]
tracking["tracked source intake"]
normalization["normalized evidence files"]
publication["reports and atlas views"]
sources --> tracking
tracking --> normalization
normalization --> publication
That structure matters because the repository has to support two kinds of use. Sometimes you want the public answer first. Sometimes you want to inspect the governing evidence directly. The system needs to support both without forcing you to decode an internal file tree before understanding what the repository is doing.
What This Overview Should Clarify¶
- which evidence families exist and why they are kept separate
- what kind of question each family can answer
- which surfaces are evidence, which are framing, and which are publication
- why a map, report, or country bundle should never outrank its narrower governing files
Main Data Families¶
| Family | Role in the repository | Main location | Current publication posture |
|---|---|---|---|
| Pollen context | environmental and paleoecological context | data/landclim/, data/neotoma/ |
first-class pollenomics context |
| Archaeology context | broader settlement and environmental archaeology layers | data/sead/, data/raa/ |
contextual support layers |
| Boundary framing | country filtering and regional map framing | data/boundaries/ |
framing layer, not scientific evidence |
| Animal ancient DNA | sample-backed contextual evidence from papers and supplements | data/adna/ |
partial recovery program |
| Fieldwork | direct visit and observation records | docs/public/fieldwork/ |
narrow but explicit record surface |
What Each Family Contributes¶
- pollen context helps explain environmental setting, vegetation history, and broader landscape change
- archaeology context helps explain settlement and material activity around the same geographies
- boundary layers make filtering and regional framing readable, but they are not themselves scientific evidence
- human ancient DNA gives release-based historical population context
- animal ancient DNA provides sample-level domestication and movement clues when source recovery is strong enough
- fieldwork gives a narrow, explicit ground-level record rather than a claim of regional completeness
Main Repository Surfaces¶
data/keeps repository-owned source material, normalized records, and review artifacts.docs/report/keeps the generated country bundles, atlas assets, and public review surfaces.docs/public/pollenomics-data/explains how those tracked files fit together.data/source_family_contracts.jsonanddata/source_family_evidence_stage_matrix.jsonkeep the stage model explicit instead of forcing you to infer it from directory names.data/source_fact_ownership_registry.jsonnames the governing surface for recurring concepts such as project inventory, sample identity, and atlas candidates.
Why The Separation Matters¶
- a source page tells you what entered the repository and why
- an evidence page tells you what claim is currently being governed
- a review surface tells you what is blocked, thin, or refused
- a publication surface tells you what the repository is prepared to show publicly
If those roles blur together, the site becomes easy to browse but hard to trust.
Where To Go Next¶
- Data architecture handbook explains the raw -> normalized -> review -> publication model in one place.
- Pollenomics publication model explains how these families should publish together without pretending they are equally mature.
- Cross-domain evidence matrix keeps domain balance visible in evidence units instead of file counts.
- Output surface classes separates pollenomics context, contextual support, animal recovery, and scaffolding outputs.