
Encode syntax contexts by deterministic index in metadata #157409

Open: xmakro wants to merge 1 commit into rust-lang:main from xmakro:fix/parallel-hygiene-determinism

xmakro (Contributor) commented Jun 4, 2026

Under the parallel frontend, compiling the same crate twice can produce different `.rmeta` files because of how `SyntaxContext` ids are allocated. The ids come from a global counter that the parallel frontend hits concurrently, so the raw id values vary between builds. Those raw ids were serialized directly into the metadata, so the output depended on thread scheduling.

This PR encodes each `SyntaxContext` with an index assigned by order of first appearance during the encoding traversal, rather than with its raw id. The encoding traversal order is deterministic, so the indices are stable across builds. The root context stays index 0 (and keeps its existing decode-side special case); other contexts are numbered from 1. Both fix-point rounds in `HygieneEncodeContext::encode` are now processed in index order, so contexts first discovered while encoding another context's data (its parent) also get deterministic indices. The decoding side already treats the serialized id as an opaque per-crate key, so it doesn't need any changes.
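
The numbering scheme described above can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: `RawId`, `CtxtData`, `IndexAssigner`, `encode_all`, and `encode_crate` are hypothetical names, a context's data is reduced to its parent link, and a single worklist loop stands in for the fix-point rounds in `HygieneEncodeContext::encode`.

```rust
use std::collections::HashMap;

/// Stand-in for a raw `SyntaxContext` id, whose value may differ between builds.
type RawId = u32;
/// The root context is assumed to have raw id 0 in this sketch.
const ROOT: RawId = 0;

/// Hypothetical per-context data; only the parent link matters here.
struct CtxtData {
    parent: RawId,
}

/// Hands out indices in order of first appearance (root = 0, others from 1).
#[derive(Default)]
struct IndexAssigner {
    index_of: HashMap<RawId, u32>,
    /// Raw ids in order of first appearance; position `i` holds index `i + 1`.
    order: Vec<RawId>,
}

impl IndexAssigner {
    fn index(&mut self, raw: RawId) -> u32 {
        if raw == ROOT {
            return 0;
        }
        if let Some(&i) = self.index_of.get(&raw) {
            return i;
        }
        self.order.push(raw);
        let i = self.order.len() as u32;
        self.index_of.insert(raw, i);
        i
    }
}

/// Encodes the data of every reachable context in index order.
/// Contexts first seen here (parents) are appended to the worklist and
/// therefore numbered in the same order on every run.
fn encode_all(assigner: &mut IndexAssigner, table: &HashMap<RawId, CtxtData>) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    let mut next = 0;
    while next < assigner.order.len() {
        let raw = assigner.order[next];
        next += 1; // `next` now equals the index of `raw`
        let parent_index = assigner.index(table[&raw].parent);
        out.push((next as u32, parent_index));
    }
    out
}

/// Encodes the contexts referenced by spans (in traversal order), then their data.
fn encode_crate(
    spans: &[RawId],
    table: &HashMap<RawId, CtxtData>,
) -> (Vec<u32>, Vec<(u32, u32)>) {
    let mut assigner = IndexAssigner::default();
    let span_indices: Vec<u32> = spans.iter().map(|&raw| assigner.index(raw)).collect();
    let ctxt_records = encode_all(&mut assigner, table);
    (span_indices, ctxt_records)
}
```

In this sketch, two builds that traverse the same spans in the same order produce identical output even if the raw ids differ (for example 17, 5, 9 in one build and 42, 8, 23 in another), because only the order of first appearance is written out.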

I compiled several crates with derive impls and crate macros at `-Zthreads=16 -Copt-level=3 -Ccodegen-units=1`. Before this PR, 10 runs produced several distinct `.rmeta` hashes; with this PR, all runs produce a single `.rmeta` hash.
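
A check like the one above could be automated with a small helper along these lines. This is only a sketch of one way to compare outputs, not the procedure actually used for the numbers above: `count_distinct_artifacts` is a hypothetical helper, and it compares file contents byte for byte instead of hashing them.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Returns how many distinct byte contents occur among the given files.
/// A result of 1 means every run produced an identical artifact.
fn count_distinct_artifacts<P: AsRef<Path>>(paths: &[P]) -> io::Result<usize> {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    for path in paths {
        // Read the whole artifact; `.rmeta` files are small enough for this sketch.
        seen.insert(fs::read(path)?);
    }
    Ok(seen.len())
}
```

The caller would copy the `.rmeta` from each run to a separate path and pass all of the paths in; any result greater than 1 indicates a nondeterministic build.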

Addresses #129094.

r? @petrochenkov
cc @Zoxc

Under the parallel front-end, `SyntaxContext` ids are allocated by a
global counter that is hit concurrently, so the same crate can end up
with different raw ids across builds. Those raw ids were written
directly into the crate metadata (and the incremental on-disk cache),
which made the encoded output depend on thread scheduling and broke
reproducible builds. A small crate with a few derives is enough to
produce differing `.rmeta` between otherwise identical compilations.

Encode each `SyntaxContext` using an index assigned by order of first
appearance during the (deterministic) encoding traversal instead of its
raw id. The root context keeps index 0 and its existing special case on
the decoding side, so non-root contexts are numbered from 1. The two
fix-point rounds in `HygieneEncodeContext::encode` are processed in
index order so that contexts discovered while encoding a context's data
(its parent) are themselves numbered deterministically.

The decoding side already treats the on-disk id as an opaque key, so no
decoding changes are required.
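
To illustrate why an opaque key needs no decoder changes, here is a minimal sketch of a decoder that remaps on-disk keys to fresh local contexts. It is an assumption-laden model and not rustc's decoder: `LocalCtxt`, `CtxtDecoder`, and `first_free_local` are hypothetical names, and the only behavior modeled is "key 0 is the root, and each other key maps to one local context on first sight".

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a context id in the decoding session.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LocalCtxt(u32);

const LOCAL_ROOT: LocalCtxt = LocalCtxt(0);

/// Maps on-disk keys from one crate to contexts allocated in this session.
struct CtxtDecoder {
    remapped: HashMap<u32, LocalCtxt>,
    next_local: u32,
}

impl CtxtDecoder {
    fn new(first_free_local: u32) -> Self {
        CtxtDecoder { remapped: HashMap::new(), next_local: first_free_local }
    }

    /// The key is only compared for equality, never interpreted as a raw id.
    fn decode(&mut self, on_disk: u32) -> LocalCtxt {
        if on_disk == 0 {
            return LOCAL_ROOT; // root keeps its special case
        }
        if let Some(&ctxt) = self.remapped.get(&on_disk) {
            return ctxt;
        }
        let ctxt = LocalCtxt(self.next_local);
        self.next_local += 1;
        self.remapped.insert(on_disk, ctxt);
        ctxt
    }
}
```

Under this model, a stream keyed by raw ids and the same stream keyed by first-appearance indices decode to the same local contexts, since only the equality structure of the keys matters.
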

rustbot added the labels S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) on Jun 4, 2026
