Modeling Non-File Inputs and Stamps¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Make"]
section["Portability Hermeticity Failure Modes"]
page["Modeling Non-File Inputs and Stamps"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
By Module 05, you usually understand file prerequisites well enough to see one of the remaining problems clearly:
the build depends on more than files.
It may depend on:
- the compiler version
- the shell locale
- an environment flag
- a selected tool path
- a feature toggle passed in from CI
Those facts are real. If they change artifact meaning, pretending they do not exist does not make the build simpler. It makes it dishonest.
This page is about learning how to model those facts without poisoning the build with noise.
The sentence to keep¶
Ask this every time you discover an environmental fact:
if this value changed, would the declared artifact meaning change?
If the answer is yes, the build needs to model it somehow.
If the answer is no, it should not be turned into ceremonial metadata just because it is easy to collect.
That distinction matters.
Hermeticity is not the same thing as isolation fantasy¶
People sometimes hear "hermetic" and imagine a build that ignores the outside world completely.
That is rarely the practical goal in normal engineering. The useful goal is narrower:
the build should declare and control the external facts that matter to artifact meaning.
This module calls that "hermetic enough."
That means:
- important environmental inputs are named
- those inputs are either pinned or attested
- the resulting evidence converges
- the build does not inject fresh entropy just to look rigorous
The difference between pinning and attesting¶
There are two healthy ways to handle an external fact:
| Approach | When to use it | Example |
|---|---|---|
| pinning | the value should be fixed for correctness | export LC_ALL := C |
| attesting | the value may vary, but the build must record what it was | compiler version manifest |
Pin when you want the build to behave the same way everywhere.
Attest when you need to acknowledge a real dependency but cannot or should not force one global value.
The mistake is collecting attestation data that changes on every run even when artifact meaning did not.
Common non-file inputs worth thinking about¶
These are the usual suspects:
CC,CFLAGS,CPPFLAGS,LDFLAGS- locale variables such as
LC_ALL - the specific compiler or interpreter path
- feature toggles such as
MODE=debug - selected compression or archive tool identity
- repository state if it is intentionally injected into the artifact
Do not model every one by reflex. First decide whether it truly changes the artifact meaning or only changes operator convenience.
A small compiler-attestation example¶
Suppose a binary should rebuild if the compiler identity changes.
One honest pattern is:
TOOLCHAIN_STAMP := build/toolchain.stamp
$(TOOLCHAIN_STAMP): | build/
@$(CC) --version | sed -n '1p' > $@.tmp
@cmp -s $@.tmp $@ 2>/dev/null || mv $@.tmp $@
@rm -f $@.tmp
app: $(TOOLCHAIN_STAMP) main.o util.o
$(CC) $(CFLAGS) main.o util.o -o $@
This pattern matters because it converges:
- the stamp content changes only when the semantic fact changes
- the file is not rewritten pointlessly every run
- the application target now has an explicit edge to that build fact
That is a much healthier design than hiding the compiler identity entirely.
Why timestamps are often the wrong attestation¶
Beginners often try to "prove" rigor by writing timestamps into manifests or stamps:
That file changes every run, which means it destroys convergence and turns attestation into entropy.
The question is not "can I record something?" The question is "can I record the semantic fact in a stable way?"
A better shape is:
build/env.stamp:
@printf 'CC=%s\nLC_ALL=%s\n' '$(CC)' '$(LC_ALL)' > $@.tmp
@cmp -s $@.tmp $@ 2>/dev/null || mv $@.tmp $@
@rm -f $@.tmp
Now the file changes only when the declared inputs change.
Stamps are not junk files¶
Stamps can feel artificial at first. The healthier view is:
a stamp is a named graph node for a semantic fact that does not already have a natural file output.
If the build meaning depends on "which compiler version produced this artifact," then a stamp or manifest may be the cleanest way to name that dependency.
The stamp is not the workaround. The hidden dependency was the workaround.
What should not be modeled¶
This question matters just as much as what should be modeled.
Examples that often do not belong in artifact-driving stamps:
- the wall-clock time of the build
- the exact username of the operator
- the terminal width
- a random temporary directory that does not affect outputs
These facts may be interesting for logs or telemetry. That does not mean they belong in rebuild logic.
If you model too much, the build becomes noisy and unstable.
Attestation must not contaminate the artifact¶
One subtle failure mode is mixing evidence into the artifact itself in a way that ruins equivalence.
For example:
- embedding build time into a binary that should otherwise be reproducible
- writing a host-specific path into a generated header used by normal compilation
- appending a diagnostic signature directly into a shared package payload
Sometimes that is required by the product. Often it is just a convenience that destroys reproducibility.
A better pattern is to keep attestation adjacent to the artifact:
- a sidecar manifest
- a separate verification bundle
- a stamped contract file
That way the build can both prove its environment and preserve artifact equivalence.
A small feature-flag example¶
Suppose CI passes MODE=debug and local builds default to release.
That is a real non-file input. You can model it with a small mode manifest:
MODE_MANIFEST := build/mode.manifest
$(MODE_MANIFEST): | build/
@printf 'MODE=%s\n' '$(MODE)' > $@.tmp
@cmp -s $@.tmp $@ 2>/dev/null || mv $@.tmp $@
@rm -f $@.tmp
app: $(MODE_MANIFEST) main.o
$(CC) $(CFLAGS) main.o -o $@
This has two teaching benefits:
- the mode becomes a visible build fact
- the manifest gives you a stable place to inspect what the build believed
Failure signatures worth recognizing¶
"The build differs across machines, but the file graph looks identical"¶
That usually means a non-file input is real but unmodeled.
"The selftest never converges after we added provenance files"¶
That often means the provenance files record timestamps or other fresh entropy rather than stable semantic facts.
"We know the compiler matters, but we do not know which compiler produced this artifact"¶
That means attestation is missing or too informal.
"Every rebuild touches the same stamp even when nothing changed"¶
That is a stamp design bug. The graph fact is being rewritten instead of compared and published only on change.
A good design question¶
When you propose a stamp or manifest, ask three things:
- which semantic fact does it represent
- which targets depend on that fact
- can the file converge when the fact does not change
If you cannot answer all three, the stamp is not ready yet.
What to practice from this page¶
Take one non-file input in the capstone or your own build and classify it:
- does it change artifact meaning
- should it be pinned or attested
- what file should represent that fact
- which targets should depend on it
- how will you keep the representation convergent
If you can answer those cleanly, you are modeling the environment instead of merely observing it.
End-of-page checkpoint¶
Before leaving this lesson, make sure you can explain:
- why hermeticity is about declared external facts, not isolation fantasy
- when to pin an input and when to attest it
- why timestamps are usually poor build facts
- why stamps and manifests can improve graph truth
- why provenance should often live beside the artifact rather than inside it