Optionally store source maps as VLQ encoded (2/2): Transformer output, `unstable_compactSourceMaps` (#1743) by robhogan · Pull Request #1743 · react/metro

robhogan · 2026-06-23T14:54:46Z

Summary:

This stack

Decoded tuple arrays are the single largest contributor to Metro's dev-server heap on large bundles (for example, ~10 million retained small arrays on the FBiOS entry bundle). Storing the same data as a compact VLQ string instead removes most of that footprint.

On that ~16K-module bundle, this reduces heap usage by ~51% and RSS by ~48%.

The emitted whole-bundle source map is unchanged. When a module's map is stored as VLQ, `fromRawMappings` decodes it back to tuples just-in-time, with request-scoped caching. The trade-off is therefore extra decode + re-encode CPU whenever a `.map` is requested or a `/symbolicate` request is made.
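The sketch below shows what the just-in-time decode involves: turning one module's VLQ `mappings` back into tuples, with a cache that lives for a single request. It is a minimal sketch under several assumptions. It assumes the standard Source Map v3 `mappings` encoding, one source per module, and Metro-style tuples with 1-based lines and 0-based columns. `tuplesFromVlqMap` and `createRequestDecodeCache` are hypothetical names, not the internals of `fromRawMappings`.

```typescript
// Assumed shapes (illustrative; not copied from Metro's Flow types).
type MetroSourceMapSegmentTuple =
  | [number, number] // [generatedLine, generatedColumn]
  | [number, number, number, number] // + [originalLine, originalColumn]
  | [number, number, number, number, string]; // + name
type VlqMap = { mappings: string; names: ReadonlyArray<string> };

const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const CHAR_TO_DIGIT = new Map<string, number>();
for (let i = 0; i < BASE64.length; i++) CHAR_TO_DIGIT.set(BASE64[i], i);

// Decodes one Base64 VLQ value at `pos`; returns [value, positionAfterValue].
function decodeVlq(str: string, pos: number): [number, number] {
  let result = 0;
  let shift = 0;
  let digit = 0;
  do {
    const d = CHAR_TO_DIGIT.get(str[pos]);
    if (d === undefined) throw new Error(`invalid VLQ character at index ${pos}`);
    digit = d;
    pos++;
    result |= (digit & 31) << shift; // low 5 bits carry data
    shift += 5;
  } while (digit & 32); // 6th bit is the continuation flag
  const magnitude = result >>> 1; // lowest bit is the sign
  return [result & 1 ? -magnitude : magnitude, pos];
}

function tuplesFromVlqMap(map: VlqMap): MetroSourceMapSegmentTuple[] {
  const { mappings, names } = map;
  const tuples: MetroSourceMapSegmentTuple[] = [];
  let genLine = 1; // 1-based generated line (assumption)
  // Running sums: VLQ fields are deltas. Only the generated column resets per line.
  let genCol = 0;
  let origLine = 0; // 0-based in VLQ
  let origCol = 0;
  let nameIdx = 0;
  let pos = 0;
  while (pos < mappings.length) {
    const ch = mappings[pos];
    if (ch === ";") { genLine++; genCol = 0; pos++; continue; }
    if (ch === ",") { pos++; continue; }
    const fields: number[] = [];
    while (pos < mappings.length && mappings[pos] !== "," && mappings[pos] !== ";") {
      const [value, next] = decodeVlq(mappings, pos);
      fields.push(value);
      pos = next;
    }
    genCol += fields[0];
    if (fields.length === 1) {
      tuples.push([genLine, genCol]);
      continue;
    }
    // fields[1] is the source-index delta; ignored because a per-module map has one source.
    origLine += fields[2];
    origCol += fields[3];
    if (fields.length >= 5) {
      nameIdx += fields[4];
      tuples.push([genLine, genCol, origLine + 1, origCol, names[nameIdx]]);
    } else {
      tuples.push([genLine, genCol, origLine + 1, origCol]);
    }
  }
  return tuples;
}

// One cache per request: create it when a .map or /symbolicate request starts and drop
// it afterwards, so decoded tuples are not retained between requests.
function createRequestDecodeCache(): (map: VlqMap) => MetroSourceMapSegmentTuple[] {
  const cache = new WeakMap<VlqMap, MetroSourceMapSegmentTuple[]>();
  return (map) => {
    let tuples = cache.get(map);
    if (tuples === undefined) {
      tuples = tuplesFromVlqMap(map);
      cache.set(map, tuples);
    }
    return tuples;
  };
}
```

Keeping the cache request-scoped, rather than global, matters here. A long-lived cache of decoded tuples would rebuild the very heap footprint that storing VLQ is meant to remove.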

A plain `string` is used for `mappings` for now, since VLQ is ASCII by design. A `Uint8Array` would be marginally more efficient and potentially transferable to and from worker threads, but it would require more invasive changes to cache (de)serialisation. I benchmarked this, and the gain doesn't justify the complexity right now.

This diff

Adds `unstable_compactSourceMaps` (default `false`). When enabled, the transform worker stores each module's source map as a compact VLQ string (`VlqMap`) instead of a decoded `Array<MetroSourceMapSegmentTuple>`.

Each module's map originates from one of three sources, so we encode the VLQ in the cheapest way available for each (all byte-identical to the decoded-tuple output):

- transformJS, not minifying (the dominant path, since Hermes targets don't minify): encode the `VlqMap` with `vlqMapFromBabelDecodedMap` straight from `result.decodedMap` (which `babel/generator` computes eagerly while generating), never materialising tuples.
- transformJS, minifying: the minifier returns its own map (not Babel's), so we re-encode the resulting tuples with `vlqMapFromTuples`.
- transformJSON: builds tuples directly (no Babel generate), so it likewise re-encodes with `vlqMapFromTuples`.
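The two slower paths re-encode tuples into VLQ. The following is a minimal sketch of that re-encoding. The name `vlqMapFromTuples` comes from the PR, but this body is illustrative, not Metro's implementation. It assumes tuples sorted by generated position, 1-based lines, 0-based columns, and a single source per module, so the source-index field is always a zero delta.

```typescript
type MetroSourceMapSegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];
type VlqMap = { mappings: string; names: ReadonlyArray<string> };

const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64 VLQ: sign in the lowest bit, then 5 data bits per digit, bit 6 = "more digits follow".
function encodeVlq(value: number): string {
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = vlq & 31;
    vlq >>>= 5;
    if (vlq > 0) digit |= 32;
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

function vlqMapFromTuples(tuples: ReadonlyArray<MetroSourceMapSegmentTuple>): VlqMap {
  const names: string[] = [];
  const nameToIndex = new Map<string, number>();
  let mappings = "";
  let currentLine = 1; // 1-based generated line (assumption)
  let lineHasSegment = false;
  // Previous values for delta encoding; only the generated column resets per line.
  let prevGenCol = 0;
  let prevOrigLine = 0;
  let prevOrigCol = 0;
  let prevName = 0;

  for (const tuple of tuples) {
    const [genLine, genCol] = tuple;
    while (currentLine < genLine) {
      mappings += ";";
      currentLine++;
      prevGenCol = 0;
      lineHasSegment = false;
    }
    if (lineHasSegment) mappings += ",";
    lineHasSegment = true;
    mappings += encodeVlq(genCol - prevGenCol);
    prevGenCol = genCol;
    if (tuple.length !== 2) {
      const origLine = tuple[2] - 1; // 1-based tuple line -> 0-based VLQ line
      const origCol = tuple[3];
      mappings += encodeVlq(0); // source-index delta: one source per module (assumption)
      mappings += encodeVlq(origLine - prevOrigLine);
      mappings += encodeVlq(origCol - prevOrigCol);
      prevOrigLine = origLine;
      prevOrigCol = origCol;
      if (tuple.length === 5) {
        let idx = nameToIndex.get(tuple[4]);
        if (idx === undefined) {
          idx = names.length;
          names.push(tuple[4]);
          nameToIndex.set(tuple[4], idx);
        }
        mappings += encodeVlq(idx - prevName);
        prevName = idx;
      }
    }
  }
  return { mappings, names };
}
```

For example, `vlqMapFromTuples([[1, 0, 1, 0, "foo"], [1, 2, 1, 2], [2, 0, 2, 2], [2, 1]])` yields `mappings` of `"AAAAA,EAAE;AACA,C"` with `names` of `["foo"]`.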

`countLines` is split out of `countLinesAndTerminateMap` so that the decoded-map fast path can compute the terminating mapping without first building and terminating a tuple array.
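A minimal sketch of why the split helps, under stated assumptions. It assumes the "terminating mapping" is a generated-only `[line, column]` segment just past the module's last character. It also assumes line breaks follow the ECMAScript line-terminator set (`\r\n`, `\r`, `\n`, U+2028, U+2029). `terminatingMapping` and `needsTerminator` are hypothetical helpers, and this `countLines` is not Metro's code.

```typescript
type GeneratedOnlySegment = [number, number]; // [1-based line, 0-based column]

// Counts lines and measures the last line, without touching any source map.
function countLines(code: string): { lineCount: number; lastLineLength: number } {
  const newline = /\r\n?|\n|\u2028|\u2029/g; // fresh regex per call: `g` keeps state in lastIndex
  let lineCount = 1;
  let lastLineStart = 0;
  let match: RegExpExecArray | null;
  while ((match = newline.exec(code)) !== null) {
    lineCount++;
    lastLineStart = match.index + match[0].length;
  }
  return { lineCount, lastLineLength: code.length - lastLineStart };
}

// The position just past the module's last character.
function terminatingMapping(code: string): GeneratedOnlySegment {
  const { lineCount, lastLineLength } = countLines(code);
  return [lineCount, lastLineLength];
}

// A decoded-map fast path only needs to check the last existing segment;
// no tuple array has to be built or copied to decide whether to append one.
function needsTerminator(
  lastSegment: ReadonlyArray<number> | undefined,
  terminator: GeneratedOnlySegment,
): boolean {
  return (
    lastSegment === undefined ||
    lastSegment[0] !== terminator[0] ||
    lastSegment[1] !== terminator[1]
  );
}
```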

Benchmarks

Cold cache (n=3, means)

| Metric | base | compact |
|---|---|---|
| Heap used | 1653.7 MB | 809.7 MB (−51.0%) |
| RSS | 1854.2 MB | 955.2 MB (−48.5%) |
| Heap growth (build) | 1606.5 MB | 761.2 MB (−52.6%) |
| Build CPU (`.bundle`) | 23.05 s | 22.42 s (n.s.) |
| Serialize CPU (`.map`) | 11.99 s | 14.19 s (+18.4%) |

Warm cache (n=3, means)

| Metric | base | compact |
|---|---|---|
| Heap used | 1552 MB | 731 MB (−52.9%) |
| RSS | 1775 MB | 923 MB (−48.0%) |
| Build CPU (`.bundle`) | 10.92 s | 8.86 s (−18.9%) |
| Serialize CPU (`.map`) | 11.87 s | 13.89 s (+17.0%) |
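The percentages in the benchmark tables are relative changes against `base`, i.e. (compact − base) / base. The snippet below is a quick sketch that recomputes the warm-cache column from the rounded means. `relativeChangePct` is a hypothetical helper, not part of the benchmark scripts.

```typescript
// Relative change of `compact` against `base`, as a signed percentage rounded to 1 decimal.
function relativeChangePct(base: number, compact: number): string {
  const pct = ((compact - base) / base) * 100;
  return (pct > 0 ? "+" : "") + pct.toFixed(1) + "%";
}

// Warm-cache means copied from the table.
const warm: Array<[string, number, number]> = [
  ["Heap used (MB)", 1552, 731],
  ["RSS (MB)", 1775, 923],
  ["Build CPU .bundle (s)", 10.92, 8.86],
  ["Serialize CPU .map (s)", 11.87, 13.89],
];

for (const [metric, base, compact] of warm) {
  console.log(`${metric}: ${relativeChangePct(base, compact)}`);
}
```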

Why behind a flag?

1) The `map` structure is exposed to custom serialisers, so changing it is semver-breaking. Landing this as an experimental opt-in in a non-breaking release lets integrators experiment with it.
2) This is a trade-off of retained memory against the CPU required to emit a flat source map or symbolicate errors. The trade-off largely goes away with indexed maps (coming next), but that is a semver-breaking change to output.

Changelog:

 - **[Experimental]**: Add `unstable_compactSourceMaps` to use a more memory-efficient source map format.

Differential Revision: D109216060

meta-codesync · 2026-06-23T14:55:11Z

@robhogan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109216060.

Summary: Scripts and findings for profiling Metro's memory and CPU during bundling, and an end-to-end benchmark of the compact VLQ source-map work stacked on top.

**Methodology:**

- Start Metro with `NODE_ARGS="--expose-gc --inspect=9230" DEV=1 js1 run --prefetch=false`
- WildeBundle URL: `GET http://localhost:8081/xplat/js/RKJSModules/EntryPoints/WildeBundle.bundle?platform=ios&dev=true&app=com.facebook.Wilde`
- RSS profiling via /proc; heap snapshots via the Chrome DevTools Protocol (CDP)
- Graph freed via DELETE to the bundle URL (same as fill-http-cache)

**Scripts added:**

- `fb-metro-cli/memory-investigation/heap-profile.js`: automated CDP-based profiler; captures 3 heap snapshots (baseline, post-build, post-delete) and compares them
- `fb-metro-cli/memory-investigation/heap-compare.js`: standalone snapshot comparator with a streaming parser for multi-GB .heapsnapshot files
- `fb-metro-cli/memory-investigation/heap-injector.js`: optional in-process module exposing /memory, /gc, /snapshot HTTP endpoints
- `metro/scripts/profile-memory.sh`: quick RSS-only profiling via /proc
- `fb-metro-cli/memory-investigation/compact-bench-measure.js`: one measurement cycle; builds WildeBundle, then requests WildeBundle.map, recording memory (RSS/heap), build CPU and .map serialize CPU via CDP
- `fb-metro-cli/memory-investigation/run-compact-bench.sh`: orchestrator; fresh Metro per repeat across three configs (base / compact_flat / compact_indexed), cold or warm cache
- `fb-metro-cli/memory-investigation/compact-bench-stats.js`: Welch t-test analysis between any two configs
- `fb-metro-cli/memory-investigation/README.md`, `compact-sourcemaps-benchmark-results.md`: full writeup of methodology and results

**Baseline results (WildeBundle, June 2025):**

- Startup: 819 MB RSS / 426 MB heap used
- Post-build: 2,338 MB RSS / 1,549 MB heap used (+1,122 MB heap)
- Post-delete: 507 MB heap used (DELETE frees 93% of build growth)
- Arrays dominate: 10M Array objects + backing stores = 858 MB (77% of growth)
- Source maps stored as decoded number-tuple arrays are the primary consumer: ~678 MB, 60% of build growth (9,866,476 tuples across 16,562 modules)

**Compact source maps: end-to-end benchmark (n=3, WildeBundle):**

Three configs: `base` (decoded tuples), `compact_flat` (VLQ storage, flat .map), `compact_indexed` (VLQ storage, indexed passthrough .map).

- Memory (both compact configs): heap −51% cold / −53% warm; RSS −48% (1654→810 MB heap cold; all Welch p < 1e-5).
- Build CPU: unchanged cold; ~20% faster warm with compact storage.
- Serialize CPU (`.map` request): `compact_flat` +18% vs base (decode + re-encode); `compact_indexed` −49% vs base (passthrough).

Flat .map is byte-identical to base; indexed .map is 3.4% larger. Bundle output is byte-identical across all configs. Full tables are in `compact-sourcemaps-benchmark-results.md`.

Differential Revision: D107879392
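`compact-bench-stats.js` is described as running a Welch t-test between two configs. The following is a minimal TypeScript sketch of that statistic and of the Welch–Satterthwaite degrees of freedom, assuming unbiased (n − 1) sample variances. It returns only t and df, because the p-value additionally needs the Student-t CDF, which is omitted. `welch` is a hypothetical name, not the script's API.

```typescript
interface WelchResult {
  t: number; // Welch's t statistic
  df: number; // Welch–Satterthwaite degrees of freedom
}

function mean(xs: ReadonlyArray<number>): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs: ReadonlyArray<number>): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// Compares two configs' samples without assuming equal variances.
function welch(a: ReadonlyArray<number>, b: ReadonlyArray<number>): WelchResult {
  if (a.length < 2 || b.length < 2) throw new Error("need at least 2 samples per config");
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```

With n=3 per config, as in these benchmarks, df can be as small as 2. Small p-values therefore require the difference between means to be large relative to the run-to-run spread.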

Summary: The transform worker built its source-map tuples via `result.rawMappings.map(toSegmentTuple)`. Accessing `result.rawMappings` forces `babel/generator` to run a second decode (`allMappings`) that allocates a flat array of ~4-5 objects per segment. This happens even though Babel *already* computed an equivalent decoded map eagerly during generation (`result.decodedMap`, in the jridgewell/gen-mapping decoded format), which Metro was discarding.

This swaps the source to `result.decodedMap` via a new `tuplesFromBabelDecodedMap` (decoded source lines are 0-based, so +1; name indices are resolved against `decodedMap.names`). Output is byte-identical to `result.rawMappings.map(toSegmentTuple)`, and it eliminates the redundant `allMappings` decode for *every* build (not just compact source maps). This is a standalone, unconditional improvement, so it sits first in the stack, ahead of the compact-source-map work that builds on it.

- `metro-source-map`: add `BabelDecodedMap` type + `tuplesFromBabelDecodedMap`.
- `metro-transform-worker`: source tuples from `result.decodedMap`.
- `babel_v7.x.x` libdef: add `decodedMap` to `GeneratorResult`.

Microbenchmark (real `babel/generator` 7.29.1, 133 modules / ~30.6K segments, `--expose-gc`, median of 11):

- `generate()` alone: 20.2 ms
- `generate()` + access `decodedMap`: 19.2 ms (~0 delta; it's a sunk, eager cost)
- `generate()` + access `rawMappings`: 28.8 ms (+8.6 ms), with ~40% more heap (19.5 vs 13.9 MB)

So consuming `decodedMap` avoids the `rawMappings`/`allMappings` decode entirely. (`decodedMap` is eager in 7.29.1. Even if a future Babel makes it lazy, it allocates arrays of numbers rather than `rawMappings`' nested objects, so its cost stays ≤ that of `rawMappings`.)

## E2E benchmark: cold WildeBundle (this diff vs baseline = parent)

Interleaved, paired A/B: each of 12 rounds runs one cold build per cell ({baseline, this diff} × {child-process workers, worker threads}). Slow machine drift is therefore shared within each round and cancels in the per-round delta. Each build uses a fresh Metro, a wiped transform cache (cold), `maxWorkers=16`, and the default path (no compact source maps).

- "Transform CPU" = total user+sys CPU across the whole worker process tree.
- "Tree RSS" = whole-tree resident set (captures workers in both modes).
- "Graph heap" = main-isolate heapUsed post-build (the retained module graph).
- Baseline/this-diff columns are medians; Δ is the paired mean with a 95% CI (Student-t, 11 df); "n.s." = CI includes 0.

Child-process workers (Metro default; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 625 | 612 | **-16.6 (-2.6%) [-24.7, -8.5]** |
| build wall (s) | 65.9 | 65.6 | -0.5 (-0.7%) n.s. |
| transient tree RSS (GB) | 15.8 | 16.0 | +0.06, n.s. |
| post-build tree RSS (GB) | 15.1 | 15.1 | +0.08, n.s. |
| graph heap, main isolate (GB) | 1.59 | 1.59 | ~0, n.s. |

Worker threads (`unstable_workerThreads`; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 664 | 653 | -18.6 (-2.8%) [-37.5, +0.3] |
| build wall (s) | 59.8 | 59.5 | -1.2 (-1.9%) n.s. |
| transient RSS (GB) | 13.2 | 12.7 | -0.46 (-3.5%) [-0.81, -0.11] |
| post-build RSS (GB) | 12.3 | 11.9 | -0.45 (-3.7%) [-0.80, -0.10] |
| graph heap, main isolate (GB) | 1.60 | 1.60 | ~0, n.s. |

Takeaways:

- **Transform CPU drops ~2.6-2.8%, equally in both worker modes.** The point estimates (-16.6 s child-process, -18.6 s threads) agree to within 2 s and their CIs overlap almost entirely, so there is no real asymmetry. This is exactly what the mechanism predicts. The optimization runs *inside* the worker (consuming `decodedMap` instead of forcing the `rawMappings`/`allMappings` decode), so the saving is identical whether the worker is a child process or a thread. (An earlier small-n pass suggested a child-process-only win; that was sampling noise. Threads-mode CPU is just noisier, SD 30 s vs 13 s, which only widens its CI without moving the point estimate.)
- Build wall time is ~1-2% lower in both modes but within noise: the CPU saving is spread across 16 workers, so it moves the critical path little.
- Main-isolate post-build heap (the retained graph of stored tuples) is unchanged in every config: no memory regression, and output is byte-identical.
- Transient/post-build tree RSS shows a ~0.5 GB (~3.5%) reduction that is resolvable only in the lower-variance threads configuration. The noisier child-process configuration (RSS ~16 GB, CI half-width ~0.3 GB) cannot corroborate it, so treat it as suggestive, not established.

Harness: `memory-investigation/run-worker-bench-ab.sh` (interleaved A/B) + `worker-bench-measure.js` + `worker-bench-stats.js` (paired CIs), in the base diff of this stack. Worker-threads mode under `js1 run` is GK-gated (`metro_worker_threads`); it was benched via a local `FORCE_WORKER_THREADS` override (not committed).

Reviewed By: huntie, GijsWeterings

Differential Revision: D108506323
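The commit above describes a `tuplesFromBabelDecodedMap` conversion. The following is a minimal sketch of that conversion under stated assumptions. It assumes the `@jridgewell/gen-mapping` decoded format: one array per generated line, and each segment is `[genCol]`, `[genCol, sourceIndex, sourceLine, sourceColumn]`, or the latter plus `nameIndex`, all absolute and 0-based. It also assumes Metro tuples with 1-based lines and 0-based columns. `BabelDecodedMapLike` and `tuplesFromDecodedMap` are hypothetical stand-ins, not `metro-source-map`'s code.

```typescript
type DecodedSegment =
  | [number]
  | [number, number, number, number]
  | [number, number, number, number, number];

// Only the fields this sketch reads (assumed shape).
interface BabelDecodedMapLike {
  mappings: ReadonlyArray<ReadonlyArray<DecodedSegment>>;
  names: ReadonlyArray<string>;
}

type MetroSourceMapSegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

function tuplesFromDecodedMap(map: BabelDecodedMapLike): MetroSourceMapSegmentTuple[] {
  const tuples: MetroSourceMapSegmentTuple[] = [];
  map.mappings.forEach((line, lineIndex) => {
    const genLine = lineIndex + 1; // decoded generated lines are array indices (0-based)
    for (const seg of line) {
      if (seg.length === 1) {
        tuples.push([genLine, seg[0]]);
      } else if (seg.length === 4) {
        // seg[1] (source index) is ignored: one source per module (assumption).
        tuples.push([genLine, seg[0], seg[2] + 1, seg[3]]); // source line 0-based -> 1-based
      } else {
        tuples.push([genLine, seg[0], seg[2] + 1, seg[3], map.names[seg[4]]]); // resolve name index
      }
    }
  });
  return tuples;
}
```

Because the decoded format is already absolute, with no VLQ deltas, this is a single linear pass. It creates one small array per segment and no intermediate per-segment objects.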

…sumer support (#1742)

Summary:

## This diff

Adds a `VlqMap` type (`{mappings: string, names: ReadonlyArray<string>}`) as an alternative to the current `Array<MetroSourceMapSegmentTuple>` for storing per-module source maps in `Module` graph nodes (and in transform results and cache artifacts).

Adds the ability to store, thread, decode and (flat-)emit VLQ maps. **Nothing actually produces them yet**, so these code paths are unused except by tests. The opt-in producer flag lands in the next diff.

## Follow up

After this mini-stack, we'll add an opt-in for emitting index source maps, directly re-using per-module VLQ and eliminating the decode + re-encode trade-off.

Reviewed By: huntie, javache

Differential Revision: D107973884
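The following is a minimal TypeScript rendering of the `VlqMap` shape quoted in this commit. It adds a type guard that a consumer, such as a custom serialiser, might use to branch between the two stored forms. `StoredModuleMap`, `isVlqMap` and `segmentCount` are hypothetical names, and Metro's own Flow types may differ.

```typescript
type MetroSourceMapSegmentTuple =
  | [number, number]
  | [number, number, number, number]
  | [number, number, number, number, string];

// Shape quoted from the commit message.
type VlqMap = { mappings: string; names: ReadonlyArray<string> };

// Hypothetical: what a Module node's per-module map could hold after this diff.
type StoredModuleMap = Array<MetroSourceMapSegmentTuple> | VlqMap;

function isVlqMap(map: StoredModuleMap): map is VlqMap {
  return !Array.isArray(map);
}

// Example consumer: count segments without decoding the VLQ.
// Every non-empty field between ',' and ';' separators is one segment.
function segmentCount(map: StoredModuleMap): number {
  if (!isVlqMap(map)) return map.length;
  return map.mappings.split(/[;,]/).filter((s) => s.length > 0).length;
}
```

Branching on `Array.isArray` keeps existing tuple-based consumers working unchanged. Only code paths that see a `VlqMap` need to decode it.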


meta-cla Bot added the CLA Signed label Jun 23, 2026

meta-codesync Bot added the meta-exported label Jun 23, 2026

robhogan added 4 commits June 24, 2026 09:01

meta-codesync Bot force-pushed the export-D109216060 branch from d51004e to b3f9840 June 24, 2026 16:02

meta-codesync Bot changed the title from "Optionally store source maps as VLQ encoded (2/2): Transformer output, unstable_compactSourceMaps" to "Optionally store source maps as VLQ encoded (2/2): Transformer output, unstable_compactSourceMaps (#1743)" Jun 24, 2026

robhogan wants to merge 4 commits into `main` from `export-D109216060`
