Measuring Parse, Recipe, and Evidence Cost¶
Page Maps¶
graph LR
family["Reproducible Research"]
program["Deep Dive Make"]
section["Performance Observability Incident Response"]
page["Measuring Parse, Recipe, and Evidence Cost"]
capstone["Capstone evidence"]
family --> program --> section --> page
page -.applies in.-> capstone
flowchart LR
orient["Orient on the page map"] --> read["Read the main claim and examples"]
read --> inspect["Inspect the related code, proof, or capstone surface"]
inspect --> verify["Run or review the verification path"]
verify --> apply["Apply the idea back to the module and capstone"]
One of the most common mistakes in build-performance work is using the word "slow" as if it meant one thing.
It rarely does.
A build can feel slow because:
- Make spends too much time parsing or expanding
- recipes do expensive real work
- diagnostics and trace volume are operationally heavy
- humans cannot tell which of those is happening, so they make the wrong changes
This page is about separating those costs before tuning begins.
The sentence to keep¶
When someone says "the build is slow," ask:
slow in which layer: parse and evaluation, recipe execution, or evidence and diagnostics?
That question prevents a lot of wasted work.
Not all build time lives in the same place¶
A Make-based system has at least three performance surfaces worth naming:
- parse and evaluation cost
- recipe execution cost
- evidence-surface cost
Each one creates a different kind of problem and needs a different kind of fix.
Treating them as one undifferentiated "performance problem" is how teams end up rewriting the wrong layer.
Parse and evaluation cost¶
This is the time Make spends:
- reading included files
- expanding variables
- running parse-time shell expressions
- constructing its internal view of the graph
Typical warning signs:
- make -n all already feels heavy
- the build has many $(shell ...) calls
- include files do expensive work at parse time
- rule generation or repeated expansions dominate before any real tool runs
This cost is architectural. It has more to do with how the build is described than with how expensive the compiler or tests are.
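One rough way to check these warning signs is to count how often parse-time constructs appear in the build description. The sketch below assumes the makefiles are plain text files passed as arguments. The helper name count_parse_constructs is hypothetical, and a high count is only a hint about where to look, not proof of cost.

```shell
# Count textual occurrences of $(shell ...) and $(eval ...) in the given makefiles.
# Hypothetical helper: it counts text matches only and does not measure time.
count_parse_constructs() {
  shell_calls=$(cat "$@" | grep -o '\$(shell' | wc -l | tr -d ' ')
  eval_calls=$(cat "$@" | grep -o '\$(eval' | wc -l | tr -d ' ')
  printf 'shell_calls=%s eval_calls=%s\n' "$shell_calls" "$eval_calls"
}
```

It could be called with the top-level Makefile plus any included fragments. Pairing the counts with a dry-run timing says more than either number alone.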
Recipe execution cost¶
This is the time spent in the actual external work:
- compilation
- linking
- testing
- packaging
- code generation
Typical warning signs:
- make -n all is cheap, but make all is expensive
- one tool invocation dominates the wall clock
- adding cores or changing tool flags matters more than changing Make structure
This is often where teams incorrectly blame Make for costs that belong to the underlying tools.
Evidence-surface cost¶
This is the cost of making the build understandable:
- trace volume
- dump size
- amount of output humans must sift through during incidents
This cost matters because a build can be semantically correct and still be operationally painful if its evidence is too noisy to use under pressure.
Examples:
- --trace output is enormous
- one incident requires scrolling through thousands of low-value lines
- diagnostic targets dump unstable or redundant information no one can use
This is not purely cosmetic. Observability quality affects how quickly the team can debug the build.
A simple measurement loop¶
Start with a minimal loop:
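# Dry run: parse and decision work only; recipes are printed, not executed.
/usr/bin/time -p make -n all >/dev/null
# Full build: adds real recipe cost.
/usr/bin/time -p make all >/dev/null
# Trace volume: run against a tree that still needs rebuilding, or the trace is nearly empty.
mkdir -p build
make --trace all > build/trace.log
wc -l build/trace.log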
This gives you:
- a dry-run timing signal for parse and decision work
- a full-build timing signal that includes real recipe cost
- a rough trace-volume signal
That is not a full profiler. It is enough to stop guessing blindly.
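To compare the two timings in a script rather than by eye, the wall-clock figure can be pulled out of the time -p report. This is a minimal sketch that assumes the POSIX "real / user / sys" output format. The helper name real_seconds is hypothetical.

```shell
# Read `time -p` output on stdin and print only the wall-clock ("real") seconds.
# Assumes the POSIX format: lines such as "real 1.25", "user 0.80", "sys 0.10".
real_seconds() {
  awk '$1 == "real" { print $2 }'
}

# Example use (not run here): time -p reports on stderr, so route stderr into
# the pipe and discard the build's stdout.
#   /usr/bin/time -p make -n all 2>&1 >/dev/null | real_seconds
```

Keeping the parsing separate from the measurement makes it easy to apply the same helper to both the dry run and the full build.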
Why make -n is such a useful lens¶
make -n does not execute recipes, but it still performs parse and graph work.
That means:
- if make -n all is already expensive, your first suspect is not the compiler
- if make -n all is cheap and make all is expensive, your first suspect is probably not Make itself
This is one of the simplest and most useful distinctions in the module.
It lets you say:
this complaint is about build description overhead
or:
this complaint is about recipe work
Those are very different diagnoses.
Trace volume is a real operational metric¶
Some engineers treat trace volume as secondary because it does not always change wall-clock time much. That misses the point.
A build whose evidence is too large or too noisy can still be expensive in practice because:
- incidents take longer to diagnose
- maintainers avoid using the evidence surfaces
- real signals get buried under routine noise
That is why a simple metric like trace line count, such as the wc -l build/trace.log figure from the measurement loop, can still be useful.
The goal is not "fewest lines wins." The goal is to notice whether the build is producing an evidence surface that another engineer can actually work with.
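A line count alone does not show whether the volume is signal or repetition. The sketch below assumes a trace has already been captured to a file. It reports the total and the most frequently repeated lines, which is one rough proxy for routine noise. The helper name trace_summary and the default of five lines are assumptions.

```shell
# Summarize a captured trace log: total line count plus the most repeated lines.
# Usage: trace_summary build/trace.log [how_many_top_lines]
trace_summary() {
  log=$1
  top=${2:-5}   # hypothetical default: show the five most repeated lines
  printf 'total lines: %s\n' "$(wc -l < "$log" | tr -d ' ')"
  printf 'most repeated lines:\n'
  sort "$log" | uniq -c | sort -rn | head -n "$top"
}
```

If a handful of lines account for most of the log, those are candidates for quieter recipes or narrower diagnostic targets.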
A small comparison example¶
Imagine two measurements.
Case A¶
Suppose the dry run takes almost as long as the full build. This suggests the build description and decision process are consuming most of the cost.
Case B¶
Suppose the dry run finishes quickly while the full build takes many times longer. This suggests the performance problem mostly lives in recipes, not in Make's own structure.
This is why measurement separation matters so much. It changes what a rational next move looks like.
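The comparison can be reduced to one ratio: dry-run time divided by full-build time. The sketch below is illustrative only. The helper name classify_layer and the 0.5 cutoff are assumptions chosen for the example, not thresholds taken from this module.

```shell
# Given dry-run seconds and full-build seconds, print the dry-run share of the
# total and a first suspect. The 0.5 default cutoff is a hypothetical value.
classify_layer() {
  awk -v dry="$1" -v full="$2" -v cutoff="${3:-0.5}" 'BEGIN {
    if (full <= 0) { print "share=unknown layer=unknown"; exit }
    share = dry / full
    layer = (share >= cutoff) ? "parse-evaluation" : "recipe"
    printf "share=%.2f layer=%s\n", share, layer
  }'
}
```

The output is a starting hypothesis to confirm with a follow-up experiment, not a verdict.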
Parse cost often comes from habits, not obvious bugs¶
A build may have parse overhead because of design habits like:
- repeated $(shell find ...)
- overuse of eval
- broad, unsorted discovery
- too many layers doing similar work at parse time
These are not dramatic failures. They are accumulations.
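One habit that accumulates quietly is assigning a $(shell ...) result with plain = instead of :=. A recursively expanded variable re-runs the shell command every time it is expanded, while := runs it once at assignment. The sketch below flags such lines. The helper name find_recursive_shell_vars is hypothetical, and the pattern only catches simple single-line assignments.

```shell
# List lines of the form "NAME = ... $(shell ...)" (recursive assignment).
# Lines using :=, ::=, ?=, += or != do not match, because the variable name
# must be followed directly by "=" (optionally with spaces in between).
find_recursive_shell_vars() {
  grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_.-]*[[:space:]]*=.*\$\(shell' "$@"
}
```

Each hit is worth reviewing rather than changing automatically, since some variables are deliberately re-evaluated.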
That is why Module 09 frames performance work as architecture plus operations, not just micro-optimizations.
Recipe cost often needs tool-level thinking¶
When recipe time dominates, the right next question is often not:
how do we optimize the Makefiles?
It is often:
which tool invocation is doing the expensive work, and is that work justified?
This might point to:
- compilation flags
- test scope
- packaging compression level
- repeated code generation
Make still matters because it orchestrates those steps, but the cost may not belong to its own layer.
Evidence cost should stay proportional to the incident value¶
A build should expose enough evidence to make incidents explainable. It should not emit so much routine noise that the team stops using its own observability surfaces.
That means observability design is part of performance design:
- bounded diagnostic targets
- clear trace usage
- no unstable debug prints inside semantic outputs
This is the part of performance work many teams ignore until an incident forces them to care.
Failure signatures worth recognizing¶
"make -n all is already slow"¶
That usually points to parse, expansion, or discovery cost.
"make all is slow, but dry-run is cheap"¶
That usually points to real tool or recipe cost rather than Make structure.
"We technically have trace output, but nobody can use it under pressure"¶
That means evidence-surface cost is too high.
"We optimized something and saw no measurable difference"¶
That usually means the change targeted the wrong layer.
A review question that improves measurement work¶
Before anyone proposes a build-performance change, ask:
- which layer appears expensive
- what measurement supports that claim
- what command was used
- what a contrasting measurement would have looked like
- whether the proposed change actually targets that layer
If those answers are weak, the tuning proposal is probably weak too.
What to practice from this page¶
Choose one build route and produce a short measurement note:
- dry-run time
- full-build time
- trace line count
- your best guess about which layer dominates
- one next experiment to validate that guess
If you can do that clearly, you have already made the performance discussion more concrete.
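A consistent layout makes such notes easier to compare across build routes. The sketch below prints the five items from the list above plus the route name. The helper name measurement_note and the field labels are assumptions, and the values are whatever you measured yourself.

```shell
# Print a short measurement note from six arguments:
# route, dry-run seconds, full-build seconds, trace line count,
# suspected dominant layer, and the next experiment.
measurement_note() {
  route=$1
  dry=$2
  full=$3
  trace_lines=$4
  guess=$5
  next=$6
  cat <<EOF
route: $route
dry-run seconds: $dry
full-build seconds: $full
trace lines: $trace_lines
suspected layer: $guess
next experiment: $next
EOF
}
```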
End-of-page checkpoint¶
Before leaving this lesson, make sure you can explain:
- why "slow build" is not a sufficient diagnosis
- what parse and evaluation cost means
- what recipe cost means
- why evidence-surface cost is operationally real
- how a small measurement loop can change the next engineering decision