Drop-in caching that doesn't break determinism

Neul Labs · May 26, 2026

Tags: caching, correctness, determinism

The first promise rninja makes is “drop-in for ninja.” The second is “faster, because caching.” Those two promises are in tension, because the easy way to make builds fast is to skip work, and the easy way to skip work is to be optimistic about what counts as “the same.” Optimism is how caches break builds.

This post is the design we settled on to keep both promises at once. The short version: hash everything that can affect the output, remember nothing that doesn’t, and fall back to running the action whenever you are not sure.

What “deterministic” has to mean

Before talking about the cache, it is worth nailing down what we are protecting. A build is deterministic if a given set of inputs (sources, toolchain, environment, build description) always produces the same set of outputs (artifacts that pass the same tests, byte-for-byte identical when possible). Most well-run codebases approximate this, and the ones that don’t usually have known sources of nondeterminism (timestamps embedded in binaries, __DATE__, random IDs in generated code).

A cache changes the equation. With a cache, the executor’s promise becomes: “for a given set of inputs, I will produce outputs that are indistinguishable from what running the action would have produced.” If the cache returns something the action would not have produced — or, worse, returns the right output but stores the wrong dependency information — then the next build will be subtly wrong in a way nobody can debug.

So the design constraint is not “cache aggressively.” It is “cache aggressively, and never return something different from running the action.”

What goes into the hash

The cache key has to cover everything that can change the output. In practice, that is:

The full byte content of every source input. Hashing by mtime is not safe; mtime can lie, and a clean checkout that produces the same source will have a different mtime than the previous checkout did.
The command line that will be executed, including all arguments after variable expansion. Not the template — the rendered command.
The compiler binary itself. Two builds with different versions of gcc must miss, even if the command line is identical. We hash the executable.
The environment variables the action observes. ninja exposes a subset of the environment to commands; that subset has to be part of the key. Variables outside that subset (USER, TERM, PATH after resolution, HOME) are not.
The depfile contents from previous runs, where applicable. If a compile previously depended on a header, the cache key has to include the content of that header — otherwise editing the header doesn’t invalidate the cache.

Every one of these is required. Drop any one and you can construct a case where the cache returns the wrong answer. Add anything that does not affect the output, like wall clock time, hostname, or PID, and the cache misses when it should hit; with wall clock time or PID in the key, it misses on every build. The art is in the exact set.

rninja uses blake3 for the hash function. blake3 is fast enough that hashing inputs is not the bottleneck even on a multi-gigabyte source tree; it is also collision-resistant in a way that matters when you are using the hash as a content address.
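To make the list above concrete, here is a minimal sketch of a key function. It is not rninja’s code: the names action_key, hash_file, and feed are hypothetical, and hashlib.blake2b stands in for BLAKE3 because BLAKE3 is not in the Python standard library.

```python
import hashlib
from pathlib import Path


def hash_file(path):
    # Content hash of a file. blake2b is a stand-in for BLAKE3 (assumption).
    return hashlib.blake2b(Path(path).read_bytes(), digest_size=32).hexdigest()


def action_key(command, tool, sources, env, headers=()):
    """Combine every output-affecting input into one digest (hypothetical helper)."""
    h = hashlib.blake2b(digest_size=32)

    def feed(label, value):
        # Length-prefix each field so adjacent fields cannot run together.
        for part in (label, value):
            raw = part.encode()
            h.update(len(raw).to_bytes(8, "little"))
            h.update(raw)

    feed("command", command)                  # the rendered command, not the template
    feed("tool", hash_file(tool))             # the compiler binary itself
    for src in sorted(map(str, sources)):     # file content, never mtime
        feed("source:" + src, hash_file(src))
    for name in sorted(env):                  # only the variables the action observes
        feed("env:" + name, env[name])
    for hdr in sorted(map(str, headers)):     # headers reported by a previous depfile
        feed("header:" + hdr, hash_file(hdr))
    return h.hexdigest()
```

Sorting the inputs keeps the key independent of the order in which the caller happens to list them. Nothing in the function reads a clock, a hostname, or a PID.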

The depfile problem

The hardest part of this is the depfile loop, because the dependencies of an action are not known until the action runs. C compilers emit them as a side effect of compilation. The cache, however, needs to decide whether to skip the action before running it.

rninja handles this in two passes. The first time an action runs on a codebase, there is nothing to validate against, so it executes normally. The compiler emits its depfile. rninja stores both the artifact and the set of header files that the depfile reports, each with a hash of its content. The next time the same action key shows up, rninja first checks the stored depfile: does the current content of every reported header still match the stored hash? If yes, the cache hit is valid. If any header has changed, the cache misses and the action runs again, producing a new depfile, which the cache stores alongside the new artifact.

This costs an additional hash check per cached action, but it is fast and it is honest. It also makes the cache work correctly across header edits, which is the case people care most about getting right.
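The two passes can be sketched as follows. This is an illustration under simplifying assumptions, not rninja’s implementation: parse_depfile, record_headers, and hit_is_valid are hypothetical names, the depfile parser handles only the plain Makefile-style "target: dep dep" form (no escaped spaces, no colons in paths), and blake2b again stands in for BLAKE3.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    # blake2b is a stand-in for BLAKE3 (assumption).
    return hashlib.blake2b(Path(path).read_bytes(), digest_size=32).hexdigest()


def parse_depfile(text):
    """Return the prerequisites of a simple Makefile-style depfile."""
    # Join backslash-continued lines, then take everything after the first colon.
    _, _, deps = text.replace("\\\n", " ").partition(":")
    return deps.split()


def record_headers(depfile_text):
    """Pass one, after a real run: remember each reported file with its content hash."""
    return {dep: file_hash(dep) for dep in parse_depfile(depfile_text)}


def hit_is_valid(stored_headers):
    """Pass two, before reusing an entry: every recorded file must still match."""
    for path, old_hash in stored_headers.items():
        if not Path(path).is_file() or file_hash(path) != old_hash:
            return False
    return True
```

A missing header is treated the same as a changed one: the entry is not reused and the action runs.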

What the cache does not remember

The cache stores artifacts and the information needed to confirm that a cached artifact is still valid. It stores nothing about the circumstances in which the artifact was produced. In particular:

The cache does not store the time the action ran. We don’t pretend the cached version “completed at” any particular time.
The cache does not store stdout or stderr. Cached actions don’t produce a fresh log; the executor emits a cache-hit line in the build output to make this visible.
The cache does not store the user, the host, or the CI job that produced the artifact. None of those affect the artifact’s correctness.

These omissions matter because the cache is shared between machines in the remote-cache configuration. Anything machine-specific in the stored entry would either invalidate it on different machines or, worse, leak machine-specific state across them.
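One way to picture an entry with those properties is a record that has room for validation data and nothing else. The CacheEntry class and its field names below are hypothetical, chosen for illustration; the actual stored format is not described here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical entry layout: only what is needed to validate a hit."""
    artifact_hash: str                                  # content address of the stored blob
    header_hashes: dict = field(default_factory=dict)   # header path -> content hash

    # Deliberately absent: run time, stdout/stderr, user, host, CI job.

    def serialize(self):
        # Sorted keys, so equal entries produce equal bytes on any machine.
        return json.dumps(asdict(self), sort_keys=True).encode()
```

Because nothing machine-specific is in the record, two machines that build the same action from the same inputs write interchangeable entries.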

When to invalidate, when to fall back

Even with all of the above, the executor can encounter situations where it cannot be sure the cache is safe. The rule is: on any doubt, fall back to running the action. That sounds defensive, and it is. It is the only way to keep the determinism promise.

Concretely, rninja falls back when:

The stored depfile references a header that no longer exists on disk. We assume the dependency graph has changed and re-run.
The action’s command line uses a variable rninja cannot fully resolve at hash time (rare, but possible with complex generator output). We compute a conservative key and treat a miss as authoritative.
The cached blob fails its content-address check on retrieval. The store may have been corrupted; we re-run.
The user passed -d explain. This flag exists to make ninja chatty about why it is rebuilding; rninja extends it to log cache decisions, and in that mode we re-run anything where the decision was non-trivial so the user can see the action actually happening.

The cost of these fallbacks is that cache hit rate drops in pathological situations. That is the right trade. A cache that is sometimes too pessimistic is a slow build. A cache that is sometimes too optimistic is a wrong build.
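A sketch of the "on any doubt, run the action" rule, covering the missing-header and corrupted-blob cases from the list above. The function try_cache, the dictionary layout of an entry, and the in-memory blob store are all assumptions made for the example; blake2b stands in for BLAKE3.

```python
import hashlib
from pathlib import Path


def content_address(blob):
    # blake2b is a stand-in for BLAKE3 (assumption).
    return hashlib.blake2b(blob, digest_size=32).hexdigest()


def try_cache(entry, blobs):
    """Return the cached artifact bytes, or None, which means: run the action."""
    if entry is None:
        return None                       # nothing stored for this key
    for path, expected in entry["headers"].items():
        header = Path(path)
        if not header.is_file():
            return None                   # header vanished: the graph has changed
        if content_address(header.read_bytes()) != expected:
            return None                   # header edited since the entry was stored
    blob = blobs.get(entry["artifact"])
    if blob is None or content_address(blob) != entry["artifact"]:
        return None                       # blob missing or corrupted in the store
    return blob
```

Every uncertain path returns None. There is no branch in which the function returns bytes it has not just verified against their content address.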

Determinism the cache cannot fix

There is one class of nondeterminism the cache cannot help with: when the action itself is nondeterministic given identical inputs. The classic case is a compiler that embeds the current timestamp in the output. Two runs of that compiler over the same source produce two different byte sequences.

The cache copes with this by losing efficiency, not correctness. The first run produces an artifact with timestamp T1; the cache stores it. The second run, with the same inputs, would produce an artifact with timestamp T2. The cache returns the T1 artifact, which is byte-for-byte different from what a fresh run would produce. As long as the artifact’s behavior is identical (which it should be, if the only difference is an embedded timestamp), the build is still correct.

When the difference is behavioral (random seeds in generated code, hash-order iteration over sets in Python generators), the cache will hide a bug that running the action would have surfaced. The fix is to make the action deterministic upstream. rninja’s docs flag this explicitly. We do not pretend the cache makes nondeterminism safe; it just makes deterministic builds faster.
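A simple way to find such actions before they reach a cache is to run them more than once and compare output hashes. The sketch below is a standalone check, not an rninja feature; output_hash and is_reproducible are hypothetical names, and the action is assumed to write a single output file into its working directory.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def output_hash(argv, output_name):
    """Run argv in a fresh directory and hash the one file it writes."""
    with tempfile.TemporaryDirectory() as work:
        subprocess.run(argv, cwd=work, check=True)
        data = (Path(work) / output_name).read_bytes()
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def is_reproducible(argv, output_name, runs=3):
    """True if every run produced byte-identical output."""
    return len({output_hash(argv, output_name) for _ in range(runs)}) == 1
```

A False result proves the action is nondeterministic. A True result only means no difference showed up in the runs that were tried, so it is evidence, not proof.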

The compatibility suite

The check on all of this is a suite of tests that exercises stock ninja and rninja side-by-side over the same build.ninja files and asserts the produced artifacts match. We run this on real-world build graphs (CMake-generated test projects, in-tree examples) and we run it after every change to the cache. The test that we never let break: every artifact produced by rninja must be replaceable with the artifact produced by stock ninja, and the next build over either result must take the same code path.
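The artifact comparison at the heart of such a suite can be sketched in a few lines. This is an illustration, not the suite itself: tree_hashes and artifact_mismatches are hypothetical names, and the two directories are assumed to contain only artifacts (a real build directory also holds ninja’s own bookkeeping files, which a caller would need to exclude).

```python
import hashlib
from pathlib import Path


def tree_hashes(root):
    """Map each file under root (by relative path) to its content hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.blake2b(
            p.read_bytes(), digest_size=32
        ).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def artifact_mismatches(ninja_out, rninja_out):
    """Relative paths that are missing on one side or differ in content."""
    a, b = tree_hashes(ninja_out), tree_hashes(rninja_out)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list is the passing condition: both executors produced the same set of files with the same bytes.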

That is what “drop-in” means in practice. Not “looks like ninja from the outside” — most caches manage that. “Produces results indistinguishable from ninja” — that is the contract worth keeping.