Worked Example: Hardening an Inherited Build¶

Page Maps¶

graph LR
  family["Reproducible Research"]
  program["Deep Dive Make"]
  section["Portability Hermeticity Failure Modes"]
  page["Worked Example: Hardening an Inherited Build"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

The five core lessons in Module 05 are easiest to trust when they all appear in one build that is recognizably imperfect.

This example starts with a build that "mostly works":

the lead developer can run it locally
CI sometimes fails in ways nobody can reproduce quickly
parallel output is confusing
provenance files exist, but they break convergence
the team suspects Make itself is the main problem

That is exactly the kind of build many teams inherit.

The incident report¶

Assume you inherit a repository with these complaints:

CI fails on a machine with an older Make
local make -j8 seems fine, but a recursive sub-build behaves almost serially
the build records environment information, but make -q all still returns 1 after a successful run
the team says the build is "slow," but nobody can say whether Make or the compiler is responsible
someone proposes replacing Make entirely with a workflow tool

That is enough to begin.

The starting Makefile shape¶

Here is the simplified build:

SHELL := /bin/sh

MODE ?= release

include mk/generated-env.mk

.PHONY: all thirdparty

all: thirdparty app package

thirdparty:
    make -C vendor/lib all

api.h api.json: gen_api.py schema.yml
    python3 gen_api.py

app: api.h main.o
    $(CC) $(CFLAGS) main.o -o $@

package: app
    tar -czf app.tar.gz app

mk/generated-env.mk:
    @printf 'BUILD_TIME := %s\n' "$$(date +%s)" > $@

Every line here is plausible. That is what makes the example useful.

Step 1: stop speaking vaguely¶

The first repair is not in the Makefile. It is in the investigation style.

Run:

make all
make -q all; echo $?
/usr/bin/time -p make -n all >/dev/null
/usr/bin/time -p make all >/dev/null

This gives you four immediate facts:

whether the build converges
whether decision cost is large
whether full-build cost is much larger than decision cost
whether the current complaint is reproducible in a small evidence loop

That already improves the quality of the conversation.

Step 2: establish the portability contract¶

CI fails on an older Make. Do not patch the symptom first. Ask what the build requires.

Suppose gen_api.py really needs grouped targets in the final design. Then the build needs a contract near the top:

ifeq ($(origin MAKE_VERSION),undefined)
$(error GNU Make is required)
endif

MIN_GNU_MAKE_OK := $(if $(filter 4.3% 4.4% 4.5% 5.%,$(MAKE_VERSION)),yes,)
ifeq ($(MIN_GNU_MAKE_OK),)
$(error GNU Make 4.3 or newer required; found $(MAKE_VERSION))
endif

PYTHON ?= python3

Now the CI failure becomes a declared boundary rather than a random parser error.

This is the Core 1 lesson in action:

required tools and versions are stated explicitly
unsupported environments fail early
the build stops pretending that all Make implementations are equivalent

Step 3: repair the recursive boundary¶

The inherited build uses:

thirdparty:
    make -C vendor/lib all

That is the classic jobserver leak.

Repair it to:

.PHONY: thirdparty

thirdparty:
    +$(MAKE) -C vendor/lib all

Then decide whether recursion depth should be bounded:

ifeq ($(MAKELEVEL),2)
$(error recursive build depth exceeded)
endif

This does not magically make recursion good. It makes the boundary honest and inspectable.

This is Core 2:

$(MAKE) preserves recursive semantics
+ keeps dry-run honest
MAKELEVEL stops accidental depth growth

Step 4: make environment evidence converge¶

The inherited file:

mk/generated-env.mk:
    @printf 'BUILD_TIME := %s\n' "$$(date +%s)" > $@

is trying to attest environment state. In reality, it injects fresh entropy every run and prevents convergence.

The healthier question is:

which environmental facts actually change artifact meaning?

Suppose the important ones are MODE, CC, and LC_ALL. Then write a convergent manifest:

ENV_MANIFEST := build/env.manifest

$(ENV_MANIFEST): | build/
    @printf 'MODE=%s\nCC=%s\nLC_ALL=%s\n' '$(MODE)' '$(CC)' '$(LC_ALL)' > $@.tmp
    @cmp -s $@.tmp $@ 2>/dev/null || mv $@.tmp $@
    @rm -f $@.tmp

app: $(ENV_MANIFEST) api.h main.o
    $(CC) $(CFLAGS) main.o -o $@

Now the build records real facts without rewriting them unnecessarily.

This is Core 3:

environment facts become explicit
manifests converge
attestation sits beside the artifact instead of poisoning it

Step 5: identify whether Make is actually the performance problem¶

The team says the build is slow.

Compare:

/usr/bin/time -p make -n all >/dev/null
/usr/bin/time -p make all >/dev/null
make --trace all > build/trace.log
wc -l build/trace.log

Imagine the results are:

make -n all   -> 0.3s
make all      -> 18.7s
trace lines   -> 210

That tells a clearer story:

Make parse/decision overhead is modest
recipe execution dominates
trace volume is manageable

So the next conversation should be about compiler or packaging cost, not a speculative Make rewrite.

This is Core 4:

separate performance layers before proposing fixes
use measurement to reject the wrong explanation

Step 6: fix a real graph problem instead of blaming the tool¶

The build also has:

api.h api.json: gen_api.py schema.yml
    python3 gen_api.py

If this duplicates under -j, that is not a reason to migrate away from Make. It is a reason to model the generation honestly.

For example:

API_STAMP := build/api.stamp

$(API_STAMP): gen_api.py schema.yml | build/
    python3 gen_api.py
    touch $@

api.h api.json: $(API_STAMP)

This is still Make doing exactly what it should do: represent a generation event in a file or stamp graph.

Step 7: decide the real tool boundary¶

The remaining awkward part may be packaging:

package: app
    tar -czf app.tar.gz app

This is still fine inside Make because the concern is file-oriented and the output contract is simple.

But imagine the team now wants:

dependency resolution
multi-environment release promotion
long-running approval workflows
retry policies and remote execution state

That is the moment to ask whether Make should remain the orchestrator while another tool owns those richer workflow concerns.

This is Core 5:

classify the concern first
keep Make where it models artifacts well
move stateful workflow concerns only when the boundary is clear

The repaired sketch¶

The hardening pass leaves the build closer to this:

SHELL := /bin/sh
.SHELLFLAGS := -eu -c

ifeq ($(origin MAKE_VERSION),undefined)
$(error GNU Make is required)
endif

MIN_GNU_MAKE_OK := $(if $(filter 4.3% 4.4% 4.5% 5.%,$(MAKE_VERSION)),yes,)
ifeq ($(MIN_GNU_MAKE_OK),)
$(error GNU Make 4.3 or newer required; found $(MAKE_VERSION))
endif

MODE ?= release
PYTHON ?= python3
ENV_MANIFEST := build/env.manifest
API_STAMP := build/api.stamp

.PHONY: all thirdparty

all: thirdparty app package

thirdparty:
    +$(MAKE) -C vendor/lib all

$(ENV_MANIFEST): | build/
    @printf 'MODE=%s\nCC=%s\nLC_ALL=%s\n' '$(MODE)' '$(CC)' '$(LC_ALL)' > $@.tmp
    @cmp -s $@.tmp $@ 2>/dev/null || mv $@.tmp $@
    @rm -f $@.tmp

$(API_STAMP): gen_api.py schema.yml | build/
    python3 gen_api.py
    touch $@

api.h api.json: $(API_STAMP)

app: $(ENV_MANIFEST) api.h main.o
    $(CC) $(CFLAGS) main.o -o $@

package: app
    tar -czf app.tar.gz app

This version is not perfect because no real build ever is. But it is far more honest.

What each core contributed¶

flowchart TD
  symptoms["Inherited build symptoms"] --> contract["Core 1: portability contract"]
  contract --> recursion["Core 2: recursive boundary under jobserver"]
  recursion --> inputs["Core 3: convergent environment evidence"]
  inputs --> performance["Core 4: measurement before optimization"]
  performance --> boundary["Core 5: classify tool boundaries honestly"]
  boundary --> result["Declared and testable build contract"]

This is why the module is organized the way it is. Hardening is not one trick. It is a series of clarifications.

What you should say at the end¶

A strong summary sounds like this:

The build was not mainly broken because "Make is flaky." It had an undeclared version contract, a recursive boundary that leaked jobserver semantics, a non-convergent environment record, and an unmeasured performance complaint. We tightened the contract, modeled the environment honestly, repaired the recursive call, and used measurements to separate Make overhead from recipe cost. Only after that did we ask whether any concern belonged in another tool.

That is the kind of explanation Module 05 is trying to build.

What to practice after this example¶

Take one real inherited build and rewrite its story in the same order:

state the symptoms precisely
establish the current contract or the lack of one
inspect recursive boundaries
identify one non-file input worth modeling
produce one measurement before proposing a performance fix
decide whether any concern has crossed a tool boundary

If you can do that, Module 05 has started to change how you harden builds.