Encode syntax contexts by deterministic index in metadata #157409
Open
xmakro wants to merge 1 commit into
Under the parallel front-end, `SyntaxContext` ids are allocated by a global counter that is hit concurrently, so the same crate can end up with different raw ids across builds. Those raw ids were written directly into the crate metadata (and the incremental on-disk cache), which made the encoded output depend on thread scheduling and broke reproducible builds. A small crate with a few derives is enough to produce differing `.rmeta` between otherwise identical compilations. Encode each `SyntaxContext` using an index assigned by order of first appearance during the (deterministic) encoding traversal instead of its raw id. The root context keeps index 0 and its existing special case on the decoding side, so non-root contexts are numbered from 1. The two fix-point rounds in `HygieneEncodeContext::encode` are processed in index order so that contexts discovered while encoding a context's data (its parent) are themselves numbered deterministically. The decoding side already treats the on-disk id as an opaque key, so no decoding changes are required.
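The following is a minimal sketch of first-appearance numbering. It uses a simplified model in which a context's only encoded data is its parent. `IndexTable`, `encode`, `ROOT` and the raw id values are hypothetical and do not mirror rustc's `HygieneEncodeContext` API. Two builds with the same hygiene structure but different raw ids produce identical encoded output:

```rust
use std::collections::HashMap;

/// Raw id of the root context (hypothetical); it always encodes as index 0.
const ROOT: u32 = 0;

/// Assigns indices by order of first appearance; non-root contexts start at 1.
#[derive(Default)]
struct IndexTable {
    index_of_raw: HashMap<u32, u32>,
    raw_in_index_order: Vec<u32>, // element i has index i + 1
}

impl IndexTable {
    fn index(&mut self, raw: u32) -> u32 {
        if raw == ROOT {
            return 0;
        }
        if let Some(&idx) = self.index_of_raw.get(&raw) {
            return idx;
        }
        let idx = self.raw_in_index_order.len() as u32 + 1;
        self.index_of_raw.insert(raw, idx);
        self.raw_in_index_order.push(raw);
        idx
    }
}

/// Encodes the contexts referenced by spans (`uses`, in deterministic
/// traversal order), then the data (here: only the parent) of every context
/// reached. Contexts are visited in index order, so parents discovered while
/// encoding are appended and numbered deterministically too.
fn encode(uses: &[u32], parent: &HashMap<u32, u32>) -> (Vec<u32>, Vec<(u32, u32)>) {
    let mut table = IndexTable::default();
    let span_refs: Vec<u32> = uses.iter().map(|&raw| table.index(raw)).collect();
    let mut ctxt_data = Vec::new();
    let mut next = 0;
    // Fix-point loop: runs until no newly discovered context is left unvisited.
    while next < table.raw_in_index_order.len() {
        let raw = table.raw_in_index_order[next];
        next += 1;
        let idx = table.index(raw);
        let parent_idx = table.index(parent[&raw]);
        ctxt_data.push((idx, parent_idx));
    }
    (span_refs, ctxt_data)
}

fn main() {
    // Build 1: raw ids A=1, B=2, C=3 (C's parent is B, B's is A, A's is root).
    let build1 = encode(&[3, 1, 3], &HashMap::from([(1, ROOT), (2, 1), (3, 2)]));
    // Build 2: same structure, but scheduling handed out A=7, B=5, C=6.
    let build2 = encode(&[6, 7, 6], &HashMap::from([(7, ROOT), (5, 7), (6, 5)]));
    assert_eq!(build1, build2);
    println!("{:?}", build1); // prints ([1, 2, 1], [(1, 3), (2, 0), (3, 2)])
}
```

Visiting contexts in index order is what keeps the second round stable. Suppose the loop instead walked contexts in an order derived from their raw ids, such as a hash set keyed by raw id. The indices given to newly discovered parents would then depend on raw id values again.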
Under the parallel frontend, compiling the same crate twice can produce different `.rmeta` files because of `SyntaxContext` id allocation. The ids come from a global counter that the parallel frontend hits concurrently, so the raw id values vary between builds. Those raw ids were serialized directly into the metadata, so the output depended on thread scheduling.

This PR encodes each `SyntaxContext` with an index assigned by order of first appearance during the encoding traversal rather than its raw id. The encoding traversal order is deterministic, so the indices are stable across builds. The root context stays index 0 (and keeps its existing decode-side special case); other contexts are numbered from 1. Both fix-point rounds in `HygieneEncodeContext::encode` are now processed in index order, so contexts first discovered while encoding another context's data (its parent) also get deterministic indices. The decoding side already treats the serialized id as an opaque per-crate key, so it doesn't need any changes.

I compiled several crates with derive impls and crate macros at `-Zthreads=16 -Copt-level=3 -Ccodegen-units=1`. Before this PR, 10 runs produced several distinct `.rmeta` hashes; after it, all 10 runs produced a single `.rmeta` hash.

Addresses #129094.
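The scheduling dependence of raw ids can be reproduced in miniature. The following is a hypothetical sketch, not rustc code. Each named expansion runs on its own thread and takes a raw id from a shared counter. Every build therefore hands out the same set of ids, but not necessarily to the same expansions. `run_build` and the expansion names are made up for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Simulates one build: each expansion runs on its own thread and takes a
/// raw context id from a shared counter (0 is reserved for the root).
fn run_build(expansions: &[&'static str]) -> Vec<(&'static str, u32)> {
    let counter = Arc::new(AtomicU32::new(1));
    let handles: Vec<_> = expansions
        .iter()
        .map(|&name| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || (name, counter.fetch_add(1, Ordering::Relaxed)))
        })
        .collect();
    let mut ids: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    ids.sort(); // sort by expansion name so runs can be compared entry by entry
    ids
}

fn main() {
    let expansions = ["derive(Clone)", "derive(Debug)", "my_macro!"];
    // The set of raw ids is always {1, 2, 3}, but which expansion receives
    // which id may change from run to run.
    for run in 0..3 {
        println!("run {run}: {:?}", run_build(&expansions));
    }
}
```

Because the race for the counter decides which raw id each expansion receives, stable output requires an encoding that depends only on the deterministic traversal order, not on raw id values.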
r? @petrochenkov
cc @Zoxc