Measuring Parse, Recipe, and Evidence Cost¶

Page Maps¶

graph LR
  family["Reproducible Research"]
  program["Deep Dive Make"]
  section["Performance Observability Incident Response"]
  page["Measuring Parse, Recipe, and Evidence Cost"]
  capstone["Capstone evidence"]

  family --> program --> section --> page
  page -.applies in.-> capstone

flowchart LR
  orient["Orient on the page map"] --> read["Read the main claim and examples"]
  read --> inspect["Inspect the related code, proof, or capstone surface"]
  inspect --> verify["Run or review the verification path"]
  verify --> apply["Apply the idea back to the module and capstone"]

One of the most common mistakes in build-performance work is using the word "slow" as if it meant one thing.

It rarely does.

A build can feel slow because:

Make spends too much time parsing or expanding
recipes do expensive real work
diagnostics and trace volume are operationally heavy
humans cannot tell which of those is happening, so they make the wrong changes

This page is about separating those costs before tuning begins.

The sentence to keep¶

When someone says "the build is slow," ask:

slow in which layer: parse and evaluation, recipe execution, or evidence and diagnostics?

That question prevents a lot of wasted work.

Not all build time lives in the same place¶

A Make-based system has at least three performance surfaces worth naming:

parse and evaluation cost
recipe execution cost
evidence-surface cost

Each one creates a different kind of problem and needs a different kind of fix.

Treating them as one undifferentiated "performance problem" is how teams end up rewriting the wrong layer.

Parse and evaluation cost¶

This is the time Make spends:

reading included files
expanding variables
running parse-time shell expressions
constructing its internal view of the graph

Typical warning signs:

make -n all already feels heavy
the build has many $(shell ...) calls
include files do expensive work at parse time
rule generation or repeated expansions dominate before any real tool runs

This cost is architectural. It has more to do with how the build is described than with how expensive the compiler or tests are.

Recipe execution cost¶

This is the time spent in the actual external work:

compilation
linking
testing
packaging
code generation

Typical warning signs:

make -n all is cheap, but make all is expensive
one tool invocation dominates the wall clock
adding cores or changing tool flags matters more than changing Make structure

This is often where teams incorrectly blame Make for costs that belong to the underlying tools.

Evidence-surface cost¶

This is the cost of making the build understandable:

trace volume
dump size
amount of output humans must sift through during incidents

This cost matters because a build can be semantically correct and still be operationally painful if its evidence is too noisy to use under pressure.

Examples:

--trace output is enormous
one incident requires scrolling through thousands of low-value lines
diagnostic targets dump unstable or redundant information no one can use

This is not purely cosmetic. Observability quality affects how quickly the team can debug the build.

A simple measurement loop¶

Start with a minimal loop:

/usr/bin/time -p make -n all >/dev/null
/usr/bin/time -p make all >/dev/null
make --trace all > build/trace.log
wc -l build/trace.log

This gives you:

a dry-run timing signal for parse and decision work
a full-build timing signal that includes real recipe cost
a rough trace-volume signal

That is not a full profiler. It is enough to stop guessing blindly.

Why `make -n` is such a useful lens¶

make -n does not execute recipes, but it still performs parse and graph work.

That means:

if make -n all is already expensive, your first suspect is not the compiler
if make -n all is cheap and make all is expensive, your first suspect is probably not Make itself

This is one of the simplest and most useful distinctions in the module.

It lets you say:

this complaint is about build description overhead

or:

this complaint is about recipe work

Those are very different diagnoses.

Trace volume is a real operational metric¶

Some engineers treat trace volume as secondary because it does not always change wall-clock time much. That misses the point.

A build whose evidence is too large or too noisy can still be expensive in practice because:

incidents take longer to diagnose
maintainers avoid using the evidence surfaces
real signals get buried under routine noise

That is why a simple metric like trace line count can still be useful:

make --trace all > build/trace.log
wc -l build/trace.log

The goal is not "fewest lines wins." The goal is to notice whether the build is producing an evidence surface that another engineer can actually work with.

A small comparison example¶

Imagine two measurements.

Case A¶

make -n all   -> 2.9s
make all      -> 3.2s
trace lines   -> 240

This suggests the build description and decision process are consuming most of the cost.

Case B¶

make -n all   -> 0.2s
make all      -> 18.0s
trace lines   -> 180

This suggests the performance problem mostly lives in recipes, not in Make's own structure.

This is why measurement separation matters so much. It changes what a rational next move looks like.

Parse cost often comes from habits, not obvious bugs¶

A build may have parse overhead because of design habits like:

repeated $(shell find ...)
overuse of eval
broad, unsorted discovery
too many layers doing similar work at parse time

These are not dramatic failures. They are accumulations.

That is why Module 09 frames performance work as architecture plus operations, not just micro-optimizations.

Recipe cost often needs tool-level thinking¶

When recipe time dominates, the right next question is often not:

how do we optimize the Makefiles?

It is often:

which tool invocation is doing the expensive work, and is that work justified?

This might point to:

compilation flags
test scope
packaging compression level
repeated code generation

Make still matters because it orchestrates those steps, but the cost may not belong to its own layer.

Evidence cost should stay proportional to the incident value¶

A build should expose enough evidence to make incidents explainable. It should not emit so much routine noise that the team stops using its own observability surfaces.

That means observability design is part of performance design:

bounded diagnostic targets
clear trace usage
no unstable debug prints inside semantic outputs

This is the part of performance work many teams ignore until an incident forces them to care.

Failure signatures worth recognizing¶

"`make -n all` is already slow"¶

That usually points to parse, expansion, or discovery cost.

"`make all` is slow, but dry-run is cheap"¶

That usually points to real tool or recipe cost rather than Make structure.

"We technically have trace output, but nobody can use it under pressure"¶

That means evidence-surface cost is too high.

"We optimized something and saw no measurable difference"¶

That usually means the change targeted the wrong layer.

A review question that improves measurement work¶

Before anyone proposes a build-performance change, ask:

which layer appears expensive
what measurement supports that claim
what command was used
what a contrasting measurement would have looked like
whether the proposed change actually targets that layer

If those answers are weak, the tuning proposal is probably weak too.

What to practice from this page¶

Choose one build route and produce a short measurement note:

dry-run time
full-build time
trace line count
your best guess about which layer dominates
one next experiment to validate that guess

If you can do that clearly, you have already improved the quality of performance discussion a lot.

End-of-page checkpoint¶

Before leaving this lesson, make sure you can explain:

why "slow build" is not a sufficient diagnosis
what parse and evaluation cost means
what recipe cost means
why evidence-surface cost is operationally real
how a small measurement loop can change the next engineering decision