Optionally store source maps as VLQ-encoded (1/2): Type widening, consumer support by robhogan · Pull Request #1742 · react/metro

robhogan · 2026-06-23T14:54:44Z

Summary:

This stack

Decoded tuple arrays are the single largest contributor to Metro's dev-server heap on large bundles (for example, ~10 million retained small arrays on the FBiOS entry bundle). Storing the same data as a compact VLQ string instead removes most of that footprint.

On that ~16K-module bundle, this reduces heap usage by ~51% and RSS by ~48%.

The emitted whole-bundle source map is unchanged. When a module's map is stored as VLQ, `fromRawMappings` decodes it back to tuples just-in-time, with request-scoped caching. The trade-off is therefore extra decode + re-encode CPU whenever a `.map` or `/symbolicate` request is actually made.
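The just-in-time path amounts to Base64 VLQ decoding (as defined by Source Map v3) plus a cache whose lifetime is a single request. The sketch below is not Metro's `fromRawMappings`. The tuple shape, the 1-based line convention, the single-source-per-module assumption, and the `decodeToTuples`/`RequestDecodeCache` names are all assumptions made for illustration.

```typescript
// Assumed tuple shape: [genLine, genCol] or [genLine, genCol, srcLine, srcCol, name?].
// Lines are assumed 1-based and columns 0-based.
type SegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

// Mirrors the VlqMap shape described in this PR.
interface VlqMap {
  mappings: string;
  names: ReadonlyArray<string>;
}

const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const CHAR_TO_DIGIT = new Map<string, number>();
for (let i = 0; i < BASE64.length; i++) CHAR_TO_DIGIT.set(BASE64[i], i);

// Decodes one Base64 VLQ value starting at `pos`; returns [value, nextPos].
function decodeVlq(s: string, pos: number): [number, number] {
  let result = 0;
  let shift = 0;
  let digit: number;
  do {
    const d = CHAR_TO_DIGIT.get(s[pos]);
    if (d === undefined) throw new Error(`Invalid VLQ character at ${pos}`);
    pos++;
    digit = d;
    result += (digit & 31) * 2 ** shift; // low 5 bits carry data
    shift += 5;
  } while (digit & 32); // 0x20 is the continuation flag
  // The lowest bit of the accumulated value is the sign.
  const magnitude = Math.floor(result / 2);
  return [result % 2 === 1 ? -magnitude : magnitude, pos];
}

function decodeToTuples(map: VlqMap): SegmentTuple[] {
  const s = map.mappings;
  const tuples: SegmentTuple[] = [];
  let genLine = 1;
  let genCol = 0; // delta within a line; reset at every ';'
  let srcLine = 0; // source line/column and name index are deltas across the whole string
  let srcCol = 0;
  let nameIdx = 0;
  let pos = 0;
  while (pos < s.length) {
    if (s[pos] === ";") { genLine++; genCol = 0; pos++; continue; }
    if (s[pos] === ",") { pos++; continue; }
    const fields: number[] = [];
    while (pos < s.length && s[pos] !== "," && s[pos] !== ";") {
      const [value, next] = decodeVlq(s, pos);
      fields.push(value);
      pos = next;
    }
    genCol += fields[0];
    if (fields.length === 1) {
      tuples.push([genLine, genCol]);
      continue;
    }
    // fields[1] is the source index; one source per module is assumed, so it is ignored.
    srcLine += fields[2];
    srcCol += fields[3];
    if (fields.length >= 5) {
      nameIdx += fields[4];
      tuples.push([genLine, genCol, srcLine + 1, srcCol, map.names[nameIdx]]);
    } else {
      tuples.push([genLine, genCol, srcLine + 1, srcCol]);
    }
  }
  return tuples;
}

// One instance per .map or /symbolicate request; dropping it frees the decoded tuples.
class RequestDecodeCache {
  private readonly decoded = new Map<VlqMap, SegmentTuple[]>();

  get(map: VlqMap): SegmentTuple[] {
    let tuples = this.decoded.get(map);
    if (tuples === undefined) {
      tuples = decodeToTuples(map);
      this.decoded.set(map, tuples);
    }
    return tuples;
  }
}
```

For example, `new RequestDecodeCache().get({ mappings: "AAAA,IAAIA;AACA", names: ["foo"] })` yields `[[1, 0, 1, 0], [1, 4, 1, 4, "foo"], [2, 0, 2, 4]]`. Because the cache is scoped to one request, repeated lookups of the same module during that request are cheap, and no decoded tuples outlive it.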

A plain `string` is used for `mappings` for now, since VLQ is ASCII by design. A `Uint8Array` would be marginally more efficient and potentially transferable to/from worker threads, but would require more invasive changes to cache (de)serialisation. I benchmarked this, and the gain doesn't justify the complexity right now.

This diff

Adds a `VlqMap` type (`{mappings: string, names: ReadonlyArray<string>}`) as an alternative to the current `Array<MetroSourceMapSegmentTuple>` for storing per-module source maps in `Module` graph nodes (as well as in transform results and cache artifacts).
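Consumers of the widened type have to distinguish the two representations. The following is a minimal TypeScript sketch. It assumes the tuple union shown for `MetroSourceMapSegmentTuple` (Metro's own definition is Flow and may differ), and the `StoredModuleMap`, `isVlqMap` and `describeStoredMap` names are hypothetical.

```typescript
// Assumed shape of Metro's decoded segment tuple: a generated position,
// optionally followed by a source position and a name.
type MetroSourceMapSegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

// The compact alternative described in this diff.
type VlqMap = {
  mappings: string;
  names: ReadonlyArray<string>;
};

// Hypothetical name for the widened per-module field.
type StoredModuleMap = Array<MetroSourceMapSegmentTuple> | VlqMap;

// Arrays are always decoded tuples; anything else is the VLQ object.
function isVlqMap(map: StoredModuleMap): map is VlqMap {
  return !Array.isArray(map);
}

function describeStoredMap(map: StoredModuleMap): string {
  if (isVlqMap(map)) {
    // One ASCII string plus a names table, regardless of segment count.
    return `vlq: ${map.mappings.length} chars, ${map.names.length} names`;
  }
  // One small heap-allocated array per segment.
  return `tuples: ${map.length} segments`;
}
```

Using `Array.isArray` as the discriminator keeps the check free and unambiguous, since the VLQ form is always a plain object.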

Adds the ability to store, thread, decode and (flat-)emit VLQ maps. Nothing actually produces them yet, so these code paths are unused except by tests; the opt-in producer flag lands in the next diff.
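Flat emission means turning every module's segments back into a single bundle-wide `mappings` string. Each module's generated lines are shifted by its offset in the bundle, and all deltas are recomputed across module boundaries. This is why the flat path pays a re-encode cost. Below is a minimal sketch of that re-encode step. It assumes the same 1-based tuple convention, and the `PlacedModule`, `encodeVlq` and `encodeFlatMappings` names are hypothetical rather than Metro's API.

```typescript
type SegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

// Hypothetical per-module input: decoded tuples plus placement in the bundle.
interface PlacedModule {
  lineOffset: number; // generated lines preceding this module in the bundle
  sourceIndex: number; // index of this module's file in the bundle's `sources`
  tuples: ReadonlyArray<SegmentTuple>;
}

const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes one signed integer as Base64 VLQ (sign in the lowest bit, 5 data bits per char).
function encodeVlq(value: number): string {
  let vlq = value < 0 ? -value * 2 + 1 : value * 2;
  let out = "";
  do {
    let digit = vlq % 32;
    vlq = Math.floor(vlq / 32);
    if (vlq > 0) digit += 32; // continuation flag
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// Assumes modules arrive in bundle order, each with tuples sorted by position.
function encodeFlatMappings(modules: ReadonlyArray<PlacedModule>): {
  mappings: string;
  names: string[];
} {
  const names: string[] = [];
  const nameIndex = new Map<string, number>();
  let out = "";
  let line = 1;
  let firstOnLine = true;
  // Previous values for delta encoding; only the generated column resets per line.
  let prevCol = 0, prevSource = 0, prevSrcLine = 0, prevSrcCol = 0, prevName = 0;
  for (const mod of modules) {
    for (const t of mod.tuples) {
      const genLine = t[0] + mod.lineOffset;
      while (line < genLine) {
        out += ";";
        line++;
        prevCol = 0;
        firstOnLine = true;
      }
      if (!firstOnLine) out += ",";
      firstOnLine = false;
      out += encodeVlq(t[1] - prevCol);
      prevCol = t[1];
      if (t.length === 2) continue;
      out += encodeVlq(mod.sourceIndex - prevSource);
      prevSource = mod.sourceIndex;
      out += encodeVlq(t[2] - 1 - prevSrcLine); // VLQ source lines are 0-based
      prevSrcLine = t[2] - 1;
      out += encodeVlq(t[3] - prevSrcCol);
      prevSrcCol = t[3];
      if (t.length === 5) {
        let idx = nameIndex.get(t[4]);
        if (idx === undefined) {
          idx = names.length;
          names.push(t[4]);
          nameIndex.set(t[4], idx);
        }
        out += encodeVlq(idx - prevName);
        prevName = idx;
      }
    }
  }
  return { mappings: out, names };
}
```

The deltas for source index, source position and name run across the whole bundle. As a result, a module's stored VLQ string cannot simply be concatenated into a flat map; it has to be decoded and re-encoded in context.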

Follow up

After this mini-stack, we'll add an opt-in for emitting index source maps that directly reuse the per-module VLQ, eliminating the trade-off mentioned above.
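Source Map v3 defines an index map: a top-level `sections` array where each entry carries a 0-based `offset` (`line`, `column`) and an embedded regular map. Each module's deltas therefore restart inside its own section, which allows stored VLQ to be passed through untouched. The sketch below assumes each stored `mappings` string refers to source index 0 of a single-entry `sources` array. The `ModuleEntry` type and `buildIndexMap` are hypothetical and do not describe the follow-up's actual implementation.

```typescript
interface VlqMap {
  mappings: string;
  names: ReadonlyArray<string>;
}

// Hypothetical per-module input for this sketch.
interface ModuleEntry {
  path: string;
  lineOffset: number; // 0-based generated line where the module starts
  map: VlqMap;
}

interface BasicSourceMap {
  version: 3;
  sources: string[];
  names: string[];
  mappings: string;
}

type Section = { offset: { line: number; column: number }; map: BasicSourceMap };

interface IndexSourceMap {
  version: 3;
  sections: Section[];
}

function buildIndexMap(modules: ReadonlyArray<ModuleEntry>): IndexSourceMap {
  return {
    version: 3,
    sections: modules.map((m): Section => ({
      offset: { line: m.lineOffset, column: 0 },
      map: {
        version: 3,
        sources: [m.path],
        names: [...m.map.names],
        mappings: m.map.mappings, // reused verbatim: no decode, no re-encode
      },
    })),
  };
}
```

The price is a slightly larger output (per-section `sources`/`names` and wrapper objects) in exchange for skipping the decode + re-encode pass entirely.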

Reviewed By: huntie, javache

Differential Revision: D107973884

Summary: Scripts and findings for profiling Metro's memory and CPU during bundling, and an end-to-end benchmark of the compact VLQ source-map work stacked on top.

**Methodology:**
- Start Metro with `NODE_ARGS="--expose-gc --inspect=9230" DEV=1 js1 run --prefetch=false`
- WildeBundle URL: `GET http://localhost:8081/xplat/js/RKJSModules/EntryPoints/WildeBundle.bundle?platform=ios&dev=true&app=com.facebook.Wilde`
- RSS profiling via /proc, heap snapshots via Chrome DevTools Protocol
- Graph freed via DELETE to the bundle URL (same as fill-http-cache)

**Scripts added:**
- `fb-metro-cli/memory-investigation/heap-profile.js`: automated CDP-based profiler that captures 3 heap snapshots (baseline, post-build, post-delete) and compares them
- `fb-metro-cli/memory-investigation/heap-compare.js`: standalone snapshot comparator with a streaming parser for multi-GB .heapsnapshot files
- `fb-metro-cli/memory-investigation/heap-injector.js`: optional in-process module exposing /memory, /gc, /snapshot HTTP endpoints
- `metro/scripts/profile-memory.sh`: quick RSS-only profiling via /proc
- `fb-metro-cli/memory-investigation/compact-bench-measure.js`: one measurement cycle; builds WildeBundle, then requests WildeBundle.map, recording memory (RSS/heap), build CPU and .map serialize CPU via CDP
- `fb-metro-cli/memory-investigation/run-compact-bench.sh`: orchestrator that starts a fresh Metro per repeat across three configs (base / compact_flat / compact_indexed), with a cold or warm cache
- `fb-metro-cli/memory-investigation/compact-bench-stats.js`: Welch t-test analysis between any two configs
- `fb-metro-cli/memory-investigation/README.md`, `compact-sourcemaps-benchmark-results.md`: full writeup of methodology and results

**Baseline results (WildeBundle, June 2025):**
- Startup: 819 MB RSS / 426 MB heap used
- Post-build: 2,338 MB RSS / 1,549 MB heap used (+1,122 MB heap)
- Post-delete: 507 MB heap used (DELETE frees 93% of build growth)
- Arrays dominate: 10M Array objects + backing stores = 858 MB (77% of growth)
- Source maps stored as decoded number-tuple arrays are the primary consumer: ~678 MB, 60% of build growth (9,866,476 tuples across 16,562 modules)

**Compact source maps: end-to-end benchmark (n=3, WildeBundle):**

Three configs: `base` (decoded tuples), `compact_flat` (VLQ storage, flat .map), `compact_indexed` (VLQ storage, indexed passthrough .map).
- Memory (both compact configs): heap −51% cold / −53% warm; RSS −48% (heap 1654→810 MB cold; all Welch p < 1e-5).
- Build CPU: unchanged cold; ~20% faster warm with compact storage.
- Serialize CPU (`.map` request): `compact_flat` +18% vs base (decode + re-encode), `compact_indexed` −49% vs base (passthrough).

Flat .map is byte-identical to base; indexed .map is +3.4% larger. Bundle output is byte-identical across all configs. Full tables in `compact-sourcemaps-benchmark-results.md`.

Differential Revision: D107879392
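The Welch comparisons above, produced by `compact-bench-stats.js`, reduce to a t statistic computed with unpooled variances and Welch–Satterthwaite degrees of freedom. The following is a minimal sketch of that statistic, not the script itself. Converting `t` and `df` into a p-value additionally requires the Student-t CDF, which is omitted here. The function names are hypothetical.

```typescript
interface WelchResult {
  t: number;
  df: number;
}

function mean(xs: ReadonlyArray<number>): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function sampleVariance(xs: ReadonlyArray<number>): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's unequal-variance t statistic for samples a and b.
// Assumes at least one sample has non-zero variance.
function welch(a: ReadonlyArray<number>, b: ReadonlyArray<number>): WelchResult {
  if (a.length < 2 || b.length < 2) throw new Error("need at least 2 samples per group");
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  // Welch–Satterthwaite approximation for the degrees of freedom.
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```

For example, `welch([1, 2, 3], [4, 5, 6])` gives t ≈ -3.67 with df = 4. Welch's form avoids assuming equal variances across configs, which matters when one config (such as threads mode, or warm vs cold) is noisier than the other.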

… CPU ~2.5%

Summary: Metro's transform worker currently returns source maps from Babel's transform result via `result.rawMappings.map(toSegmentTuple)`. This *used* to be (as the name suggests) Babel's own source map representation, and was therefore free to access. However, since babel/babel#14497 (`@babel/generator` since `v7.17.10`), `rawMappings` is a getter that provides the old structure for backwards compatibility. Accessing `result.rawMappings` forces `@babel/generator` to run a second decode (`allMappings`) that allocates a flat array of ~4-5 objects per segment.

The better alternative now is `result.decodedMap`, which is eagerly computed and free to access. To accommodate its different structure, we introduce `tuplesFromBabelDecodedMap` (decoded source lines are 0-based, so they are incremented by 1; name indices are resolved against `decodedMap.names`). Transformer output is byte-identical to `result.rawMappings.map(toSegmentTuple)`; it is simply produced more efficiently.

## Microbenchmark

- Real `@babel/generator` 7.29.1 over 133 modules / ~30.6K segments, `--expose-gc`, taking the median of 11 repeats to discount GC outliers, etc.

| Path | CPU (ms/pass) | Transient heap | Notes |
|---|---|---|---|
| New: `generate()` + `decodedMap` | 19.2 | 13.9 MB | eager, already computed; free |
| Old: `generate()` + `rawMappings` | 28.8 | 19.5 MB | triggers `allMappings` decode |
| **Saving** | **−9.6 ms (−33%)** | **−5.6 MB (−29%)** | per pass over 30.6K segments |

## E2E benchmark - large bundle, cold build (*AI driven benchmarks and analysis, real numbers*)

- Interleaved, paired A/B: each of 12 rounds runs one cold build per cell, {baseline, this diff} x {child-process workers, worker threads}.
- Fresh Metro per build, transform cache wiped (cold), `maxWorkers=16`
- "Transform CPU" = total user+sys CPU across the whole worker process tree
- "tree RSS" = whole-tree resident set (captures workers in both modes)
- "graph heap" = main-isolate heapUsed post-build (the retained module graph).
- base/this-diff columns are medians; Δ is the paired mean with a 95% CI (Student-t, 11 df)
- "n.s." (not significant) = CI includes 0.

Child-process workers (Metro default; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 625 | 612 | **-16.6 (-2.6%) [-24.7, -8.5]** |
| build wall (s) | 65.9 | 65.6 | -0.5 (-0.7%) n.s. |
| transient tree RSS (GB) | 15.8 | 16.0 | +0.06, n.s. |
| post-build tree RSS (GB) | 15.1 | 15.1 | +0.08, n.s. |
| graph heap, main isolate (GB) | 1.59 | 1.59 | ~0, n.s. |

Worker threads (`unstable_workerThreads`; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 664 | 653 | -18.6 (-2.8%) [-37.5, +0.3] |
| build wall (s) | 59.8 | 59.5 | -1.2 (-1.9%) n.s. |
| transient RSS (GB) | 13.2 | 12.7 | -0.46 (-3.5%) [-0.81, -0.11] |
| post-build RSS (GB) | 12.3 | 11.9 | -0.45 (-3.7%) [-0.80, -0.10] |
| graph heap, main isolate (GB) | 1.60 | 1.60 | ~0, n.s. |

Takeaways:

- **Transform CPU drops ~2.6-2.8%, equally in both worker modes.** The point estimates (-16.6 s child-process, -18.6 s threads) agree to within 2 s and their CIs overlap almost entirely, so there is no real asymmetry. This is exactly what the mechanism predicts: the optimization runs *inside* the worker (consuming `decodedMap` instead of forcing the `rawMappings`/`allMappings` decode), so the saving is identical whether the worker is a child process or a thread. (An earlier small-n pass suggested a child-process-only win; that was sampling noise. Threads-mode CPU is just noisier, SD 30 s vs 13 s, which only widens its CI without moving the point estimate.)
- Build wall time is ~1-2% lower in both modes but within noise: the CPU saving is spread across 16 workers, so it barely moves the critical path.
- Main-isolate post-build heap (the retained graph of stored tuples) is unchanged in every config: no memory regression, byte-identical output.
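The decoded-map conversion described in the summary is small. The following sketch shows what a `tuplesFromBabelDecodedMap`-style function does. It assumes the decoded-map shape used by `@jridgewell/gen-mapping` (one array per generated line, holding absolute, 0-based segments `[genCol, sourceIndex, srcLine, srcCol, nameIndex?]`) and assumes Metro tuples use 1-based lines. Only the fields read here are typed, and `tuplesFromDecodedMapSketch` is a hypothetical stand-in, not Metro's implementation.

```typescript
// Assumed decoded segment shapes, all values absolute and 0-based.
type DecodedSegment =
  | [number]
  | [number, number, number, number]
  | [number, number, number, number, number];

interface DecodedMapLike {
  names: ReadonlyArray<string>;
  mappings: ReadonlyArray<ReadonlyArray<DecodedSegment>>; // one array per generated line
}

// Assumed Metro tuple shape, with 1-based lines and 0-based columns.
type SegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

function tuplesFromDecodedMapSketch(decoded: DecodedMapLike): SegmentTuple[] {
  const tuples: SegmentTuple[] = [];
  decoded.mappings.forEach((segments, lineIndex) => {
    const genLine = lineIndex + 1; // the array index is the 0-based generated line
    for (const seg of segments) {
      if (seg.length === 1) {
        tuples.push([genLine, seg[0]]);
      } else if (seg.length === 4) {
        tuples.push([genLine, seg[0], seg[2] + 1, seg[3]]); // source line 0-based -> 1-based
      } else {
        // Name indices are resolved against decoded.names.
        tuples.push([genLine, seg[0], seg[2] + 1, seg[3], decoded.names[seg[4]]]);
      }
    }
  });
  return tuples;
}
```

The source index (`seg[1]`) is dropped because a per-module map has a single source. Since decoded segments are already absolute, no delta bookkeeping is needed, unlike the VLQ path.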
Changelog:
```
- **[Performance]**: Use Babel's `decodedMap` for ~2.5% faster transforms
```

Reviewed By: huntie, GijsWeterings

Differential Revision: D108506323
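The Δ columns in the E2E tables are described as a paired mean with a 95% Student-t interval on 11 degrees of freedom (12 rounds). The following sketches that computation. The critical value 2.201 is the standard two-sided 95% quantile for 11 df. `pairedDeltaCI` is a hypothetical name and not the tooling used for this diff.

```typescript
interface PairedCI {
  mean: number;
  low: number;
  high: number;
}

// Two-sided 95% Student-t critical value for 11 degrees of freedom (n = 12 pairs).
// Callers with a different number of pairs must pass the matching value.
const T_CRIT_95_DF11 = 2.201;

function pairedDeltaCI(
  baseline: ReadonlyArray<number>,
  candidate: ReadonlyArray<number>,
  tCrit: number = T_CRIT_95_DF11,
): PairedCI {
  if (baseline.length !== candidate.length || baseline.length < 2) {
    throw new Error("need equal-length samples with at least 2 pairs");
  }
  // Pair round-by-round so per-round noise shared by both builds cancels out.
  const diffs = candidate.map((c, i) => c - baseline[i]);
  const n = diffs.length;
  const mean = diffs.reduce((a, b) => a + b, 0) / n;
  const variance = diffs.reduce((acc, d) => acc + (d - mean) ** 2, 0) / (n - 1);
  const half = tCrit * Math.sqrt(variance / n);
  return { mean, low: mean - half, high: mean + half };
}
```

An interval that contains 0 corresponds to what the tables mark as "n.s.". Pairing by round is what makes a ~2.6% CPU change detectable despite much larger build-to-build variation.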


meta-codesync · 2026-06-23T14:55:07Z

@robhogan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107973884.

robhogan added 3 commits June 23, 2026 07:54

meta-cla Bot added the CLA Signed label Jun 23, 2026

meta-codesync Bot added the meta-exported label Jun 23, 2026

robhogan wants to merge 3 commits into main from export-D107973884