BEAM-native JS engine and compiler #5

Open
dannote wants to merge 628 commits into master from beam-vm-interpreter

Conversation


@dannote dannote commented Apr 15, 2026

Adds a second QuickJS execution backend on the BEAM.

What’s in here

  • QuickJS bytecode decoder in Elixir
  • interpreter for QuickJS bytecode on the BEAM
  • hybrid compiler from QuickJS bytecode to BEAM modules
  • raw BEAM disassembly for the :beam backend via QuickBEAM.disasm/2
  • mode: :beam support in the public API
  • require(), module loading, dynamic import, globals, handlers, and interop for the VM path
  • stack traces, source positions, and Error.captureStackTrace

Runtime coverage

  • Object, Array, Function, String, Number, Boolean
  • Math, JSON, Date, RegExp
  • Map, Set, WeakMap, WeakSet, Symbol
  • Promise, async/await, generators, async generators
  • Proxy, Reflect
  • TypedArray, ArrayBuffer, BigInt
  • classes, inheritance, super, private fields, private methods, private accessors, static private members, brand checks

Validation

  • QUICKBEAM_BUILD=1 MIX_ENV=test mix test
  • MIX_ENV=test QUICKBEAM_BUILD=1 mix test test/vm/js_engine_test.exs --include js_engine --seed 0
  • mix compile --warnings-as-errors
  • mix format --check-formatted
  • mix credo --strict
  • mix dialyzer
  • mix ex_dna
  • zlint lib/quickbeam/*.zig lib/quickbeam/napi/*.zig
  • bunx oxlint -c oxlint.json --type-aware --type-check priv/ts/
  • bunx jscpd lib/quickbeam/*.zig priv/ts/*.ts --min-tokens 50 --threshold 0

Current local result:

  • 2363 tests, 0 failures, 1 skipped, 54 excluded

@dannote dannote force-pushed the beam-vm-interpreter branch from 0eb3475 to 7c1c574 on April 15, 2026 14:06
@dannote dannote changed the title from "BEAM-native JS interpreter (Phase 0-1)" to "BEAM-native JS interpreter" on Apr 16, 2026
@dannote dannote marked this pull request as ready for review April 16, 2026 08:41
@dannote dannote force-pushed the beam-vm-interpreter branch 2 times, most recently from 75fdba5 to 527d5b9 on April 20, 2026 08:45
@dannote dannote changed the title from "BEAM-native JS interpreter" to "BEAM-native JS engine and compiler" on Apr 21, 2026
dannote added 24 commits April 22, 2026 22:57
Change shape representation from {:shape, shape_id, vals, proto} to
{:shape, shape_id, offsets, vals, proto}. The offsets map is inlined
from the shape table, eliminating the Process.get + Map.fetch! chain
on every property read.

Get.get shape hit: 775ns → 648ns (16% faster).
~380us saved per render at 3000 property reads.
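The layout change above can be modeled in a few lines. This is a minimal Python sketch, not the project's Elixir code: the old layout keeps only a shape id on the object, forcing a shape-table fetch on every read, while the new layout carries the offsets map inside the object record so a property read is a single map probe plus a tuple index.

```python
# Hypothetical model of the two shape layouts (names are illustrative).
SHAPE_TABLE = {0: {"x": 0, "y": 1}}  # shape_id -> {key: offset}

def make_obj_old(shape_id, vals, proto=None):
    # Old: {:shape, shape_id, vals, proto} — offsets live only in the table.
    return ("shape", shape_id, vals, proto)

def get_old(obj, key):
    _, shape_id, vals, _ = obj
    offsets = SHAPE_TABLE[shape_id]   # extra table lookup on EVERY read
    return vals[offsets[key]]

def make_obj_new(shape_id, vals, proto=None):
    # New: {:shape, shape_id, offsets, vals, proto} — offsets inlined once.
    return ("shape", shape_id, SHAPE_TABLE[shape_id], vals, proto)

def get_new(obj, key):
    _, _, offsets, vals, _ = obj      # offsets travel with the object
    return vals[offsets[key]]

o = make_obj_new(0, (10, 20))
assert get_new(o, "y") == 20
assert get_old(make_obj_old(0, (10, 20)), "x") == 10
```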
…s.lookup

fprof showed Shapes.get_shape (54K calls, 108ms) as the #1 bottleneck.
It was still called from Put.put via Shapes.lookup on the hot path.

Replace Shapes.lookup(shape_id, key) with Map.fetch(offsets, key)
in shape_put, Store.put_obj_key, and put/3 for length.

Get.get: 648ns → 406ns (37% faster)
Preact render: 6.95ms → 6.55ms (5.8% faster)
…calls

transition/2 now returns {shape_id, offsets, offset} instead of
{shape_id, offset}. Callers no longer need a separate get_shape
call to fetch the new shape's offsets after transition.

Eliminates ~13K redundant shape table lookups per render.
Transition cache now stores {child_id, child_offsets} instead of
just child_id. Eliminates get_shape(child_id) on every cache hit.

get_shape calls per render: 27,930 → 14,819 (verified via fprof).
Preact render: 6.55ms → 6.2ms.
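The enriched transition cache can be sketched as follows — an illustrative Python model under the assumption that shape ids are integers and offsets are plain maps; it shows why caching the child's offsets alongside its id makes a cache hit free of any shape-table fetch.

```python
# Illustrative transition cache: (parent_id, key) -> (child_id, child_offsets).
shapes = {0: {}}        # shape_id -> offsets map (root shape is empty)
transitions = {}
next_id = 1

def transition(parent_id, key):
    """Return (child_id, child_offsets, offset) for adding `key`."""
    global next_id
    hit = transitions.get((parent_id, key))
    if hit:                                   # cache hit: no get_shape needed
        child_id, child_offsets = hit
        return child_id, child_offsets, child_offsets[key]
    child_offsets = dict(shapes[parent_id])   # derive the child shape once
    child_offsets[key] = len(child_offsets)
    child_id, next_id = next_id, next_id + 1
    shapes[child_id] = child_offsets
    transitions[(parent_id, key)] = (child_id, child_offsets)
    return child_id, child_offsets, child_offsets[key]

sid, offs, off = transition(0, "x")
sid2, offs2, off2 = transition(sid, "y")
assert (off, off2) == (0, 1)
assert transition(0, "x") == (sid, offs, 0)   # hit returns cached offsets
```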
The grow path (adding a property to a shape-backed object) was doing:
Tuple.to_list → ++ List.duplicate → ++ [val] → List.to_tuple (1693ns)

Now uses :erlang.append_element for the common case where offset ==
tuple_size (sequential property addition): 217ns — 8× faster.

Preact render: 6.1ms → 5.4ms (12% faster).
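A hedged Python sketch of the grow path, mirroring the :erlang.append_element fast case in Python terms: when the target offset equals the current tuple size (sequential property addition), a single append suffices; otherwise fall back to an in-place rebuild or padded growth.

```python
# Illustrative model of the tuple grow path (not the project's code).
def put_val(vals, offset, val):
    n = len(vals)
    if offset == n:                    # common case: sequential append
        return vals + (val,)           # stands in for :erlang.append_element
    if offset < n:                     # in-place update of an existing slot
        return vals[:offset] + (val,) + vals[offset + 1:]
    # sparse growth: pad with placeholders up to offset, then place val
    return vals + (None,) * (offset - n) + (val,)

assert put_val((1, 2), 2, 3) == (1, 2, 3)
assert put_val((1, 2, 3), 1, 9) == (1, 9, 3)
assert put_val((), 2, 7) == (None, None, 7)
```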
Heap.frozen? now checks a global :qb_has_frozen flag before doing
per-object Process.get. In Preact SSR (where nothing is frozen),
this eliminates 13,552 process dictionary lookups per render.

Preact render: 5.37ms → 5.25ms.
Detect 'object; (push_val; define_field)* ...' patterns during lowering
and batch them into a single Heap.wrap(%{k1 => v1, k2 => v2, ...}).

Eliminates ~10K individual Put.put calls per Preact render (881 VNodes
× ~12 batched fields each). Each Put.put was doing: Process.get + shape
transition + put_val + Process.put.

Supported value opcodes: integer literals, null, undefined, booleans,
get_arg, get_loc, push_atom_value, empty string.
Falls back to individual define_field for values that can't be lowered
at compile time (function calls, computed values).

Preact render: 5.25ms → 4.4ms (16% faster).
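The batching pass above can be sketched as a small peephole over an opcode list. This is a conceptual Python model — the opcode tuples are illustrative, not QuickJS's real encoding: a run of (push_val, define_field) pairs after an object opcode collapses into one batched wrap of a ready-made map, with any remaining opcodes left for the normal path.

```python
# Hypothetical lowering: batch 'object; (push_val; define_field)*' runs.
def lower_object_literal(ops):
    """ops: [('object',), ('push_val', v), ('define_field', k), ...]"""
    assert ops[0] == ("object",)
    fields, rest = {}, ops[1:]
    while (len(rest) >= 2
           and rest[0][0] == "push_val"
           and rest[1][0] == "define_field"):
        fields[rest[1][1]] = rest[0][1]   # fold the pair into the literal
        rest = rest[2:]
    # One batched allocation replaces a per-field put; leftovers fall back.
    return ("wrap", fields), rest

batched, rest = lower_object_literal(
    [("object",), ("push_val", 1), ("define_field", "a"),
     ("push_val", 2), ("define_field", "b"), ("ret",)]
)
assert batched == ("wrap", {"a": 1, "b": 2})
assert rest == [("ret",)]
```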
Preact's h() function has 5 args, called 881 times per render.
The generic Enum.take path was 14.5ms (8,504 calls). Direct
pattern matching eliminates the list traversal overhead.
Replaces Enum.with_index + Enum.reduce with direct :maps.from_list
for shape-to-map reconstruction.
Replace make_ref() + {:qb_obj, ref} tuple keys with monotonic integer
counter + raw integer keys in the process dictionary.

From the OTP JIT source (erl_process_dict.c), the hash function for
small integers is just 'unsigned_val(Term)' — essentially free. Tuple
keys go through the full erts_internal_hash which is much more
expensive. EQ comparison is also a single pointer compare for integers
vs deep tuple comparison for {:qb_obj, ref}.

Measured: Process.put with raw integer keys is 2× faster than tuple
keys. Process.get is 1.35× faster.

Preact render: 4.3ms → 3.55ms (17% faster).
Total session: 6.95ms → 3.55ms (49% faster).
- Replace List.zip with manual keys_vals_to_map recursion (1.6× faster)
- Replace PD counter with :erlang.unique_integer (2.6× faster)

Preact render: 3.55ms → 3.4ms.
Two changes:
1. inline_get_var_ref emits get_capture(ctx, {type, var_idx}) instead
   of get_var_ref(ctx, integer_idx). The capture key is resolved at
   compile time from closure_vars, eliminating Enum.at list traversal.

2. current_var_ref caches closure_vars as a tuple of capture keys per
   function (keyed on byte_code binary). elem(tuple, idx) is O(1) vs
   Enum.at(list, idx) which is O(n).

Before: 1,609 Enum.at calls at 20.6ms total (12.8us each).
After: 0 Enum.at calls. get_capture is 3.0us per call.
Shape IDs are contiguous integers 0..N. Storing shapes in a tuple
and using elem(table, id) eliminates Map.fetch! overhead entirely.
put_shape appends via :erlang.append_element for new shapes and
uses put_elem for updates.

Also eliminates the separate :qb_shape_next_id counter — the next
ID is simply tuple_size(table).

5,565 get_shape calls/render × ~80ns savings = ~157us per render.
Preact render: 3.50ms → 3.34ms.
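A minimal Python sketch of the contiguous-id shape table described above (illustrative only): because shape ids are exactly 0..N, the table can be a tuple indexed positionally, and the next id is simply the current length — no separate counter needed.

```python
# Hypothetical tuple-backed shape table with contiguous integer ids.
def put_shape(table, shape_id, shape):
    if shape_id == len(table):         # new shape: append at the end
        return table + (shape,)
    return table[:shape_id] + (shape,) + table[shape_id + 1:]  # update

def get_shape(table, shape_id):
    return table[shape_id]             # O(1) positional access, no Map.fetch!

table = ()
table = put_shape(table, len(table), {"x": 0})          # next id == size
table = put_shape(table, len(table), {"x": 0, "y": 1})
assert get_shape(table, 1) == {"x": 0, "y": 1}
assert len(table) == 2                 # the counter is the tuple size itself
```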
The compiler now emits Heap.wrap_keyed(keys_tuple, vals_tuple) instead
of Heap.wrap(%{k1 => v1, ...}) for batched object literals.

wrap_keyed uses the keys tuple (a compile-time constant) as a cache key
to look up the pre-resolved shape. On cache hit, it skips:
- :erlang.phash2 of the key set
- Shapes.from_map shape resolution
- Map construction from keys/values

This eliminates ~103ns per object creation at 2,399 objects/render.

Preact render: 3.43ms → 3.25ms (5.2% faster).
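The wrap_keyed cache can be modeled briefly — a hedged Python sketch, with names invented for illustration: since the keys tuple of a batched object literal is a compile-time constant, it can directly key a cache of pre-resolved shapes, so a cache hit skips hashing the key set and re-deriving the shape.

```python
# Illustrative shape cache keyed by the compile-time-constant keys tuple.
shape_cache = {}      # keys_tuple -> (shape_id, offsets)
next_shape_id = 0

def wrap_keyed(keys, vals):
    global next_shape_id
    hit = shape_cache.get(keys)
    if hit is None:                            # first sight of this literal
        offsets = {k: i for i, k in enumerate(keys)}
        hit = (next_shape_id, offsets)
        next_shape_id += 1
        shape_cache[keys] = hit
    shape_id, offsets = hit                    # hit: no hashing, no rebuild
    return ("shape", shape_id, offsets, vals, None)

a = wrap_keyed(("x", "y"), (1, 2))
b = wrap_keyed(("x", "y"), (3, 4))
assert a[1] == b[1]       # same literal shape resolves to the same shape id
```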
Instead of emitting RuntimeHelpers.get_var(ctx, name) which traverses:
  get_var → fetch_ctx_var → context_globals → GlobalEnv.fetch → Map.fetch

Emit :erlang.map_get(:globals, ctx) at compile time to extract the
globals map, then call get_global(globals, name) which does a single
Map.fetch.

Eliminates 3 function calls per global variable access (2,644 calls
per Preact render).

Preact render: 3.25ms → 3.18ms.
The op_eq helper now has:
1. {Same, Same} → true (identity check, covers 79% of comparisons)
2. Number guards → == (existing)
3. Binary guards → == (string comparison without Values.eq dispatch)
4. Fallback → Values.eq (handles null/undefined cross-equality)

Values.eq calls: 5,175 → 1,075 per render.
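The tiered dispatch can be sketched like this — a Python model of JS loose equality under the stated ordering, with UNDEFINED/NULL sentinels standing in for the VM's value representation: identity first, then same-type numeric and string comparison, then the slow path that handles the null/undefined cross-case.

```python
# Illustrative tiered loose-equality helper (JS `==` semantics, simplified).
UNDEFINED, NULL = object(), object()

def op_eq(a, b):
    if a is b:                                     # 1. identity fast path
        return True
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b                              # 2. numeric comparison
    if isinstance(a, str) and isinstance(b, str):
        return a == b                              # 3. string comparison
    # 4. slow path: in JS, null == undefined (and nothing else crosses)
    return {a, b} == {UNDEFINED, NULL}

assert op_eq(1, 1.0)
assert op_eq("ab", "ab")
assert op_eq(NULL, UNDEFINED)
assert not op_eq(0, NULL)
```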
Skip resetting home_object and super fields in the fast path for
closures with need_home_object: false. These closures by definition
don't use home objects, so the fields can inherit from the parent
context safely.

Saves 2 map update operations per function call (2,010 calls/render).
put_field skips normalize_key (key is known binary at compile time),
frozen? check (not needed for just-created objects), __proto__ check
(key is known not to be __proto__), and Heap.get_obj_raw delegation
(calls Process.get directly).

The compiler emits put_field for define_field opcodes where the key
is a resolved string literal.

Eliminates ~3 function calls and 2 guards per property write for
4,298 define_field ops per Preact render.
Generate op_get_field/2 as a local function in each compiled BEAM
module. The fast path does:
  1. Pattern match {:obj, Id}
  2. erlang:get(Id) — direct BIF call, no delegation chain
  3. Pattern match {:shape, _, Offsets, Vals, _}
  4. maps:find(Key, Offsets) — direct BIF
  5. element(Off+1, Vals) — direct tuple access
Fallback to Get.get for non-shape objects, prototype chain, etc.

This eliminates 4 cross-module function calls on the hot path:
  Get.get → get_own → Heap.get_obj_raw → Store.get_obj_raw → Process.get
The JIT can optimize local calls much better than cross-module calls
(no export table indirection, better branch prediction).

Preact render: 3.23ms → 3.09ms (143us, 4.4% faster).
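A minimal Python sketch of that generated fast path (names are illustrative; the real code is generated BEAM, and the heap dict here stands in for the process dictionary): match the shaped record, probe the offsets map, index the values tuple, and only fall back to the generic path on a miss.

```python
# Hypothetical model of the per-module op_get_field fast path.
HEAP = {}               # object id -> record (stand-in for the process dict)

def generic_get(obj_id, key):
    # Placeholder for the slow path: prototype chain, non-shape objects, etc.
    return "undefined-or-proto-chain"

def op_get_field(obj_id, key):
    rec = HEAP.get(obj_id)                       # direct heap fetch, no delegation
    if rec is not None and rec[0] == "shape":    # pattern match shaped record
        _, _shape_id, offsets, vals, _proto = rec
        off = offsets.get(key)                   # one map probe
        if off is not None:
            return vals[off]                     # one tuple index — done
    return generic_get(obj_id, key)              # fallback: Get.get equivalent

HEAP[7] = ("shape", 0, {"type": 0, "props": 1}, ("div", {}), None)
assert op_get_field(7, "type") == "div"
assert op_get_field(7, "missing") == "undefined-or-proto-chain"
```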
When the compiler creates an object via wrap_keyed (batched object
literal), it records the offsets map as part of the stack type info
({:shaped_object, offsets}). For subsequent get_field on the same
variable, if the offset is known at compile time, emit direct
element(Off+1, Vals) bypassing maps:find entirely.

Also propagates shape info through define_field ops — each property
addition extends the known offsets map.

This enables V8-style monomorphic inline caching for same-block
property access patterns.
These functions previously called Heap.get_obj which triggers to_map
reconstruction (149ns per call for 5-key shapes). Now they check for
shape-backed objects first and use the shape's keys list or offsets
map directly, avoiding the full map reconstruction.

- enumerable_keys: uses Shapes.keys(shape_id) instead of to_map
- enumerable_string_props: uses Shapes.to_map only for the shape
  case (without proto overhead), bypasses get_obj delegation
- length_of: reads 'length' directly from shape offsets

Eliminates ~1000+ to_map reconstructions per render.
Preact render: 3.17ms → 3.03ms (137us, 4.3% faster).
Eliminate 2 delegation calls (Heap.put_obj_raw → Store.put_obj_raw)
and 1 function call (Shapes.put_val) for the common case where offset
is within the existing tuple size. For the transition path, inline
:erlang.append_element for the sequential-append case.

3,857 shape_put calls per render.
Generate op_truthy/1 and op_typeof/1 as local functions in each
compiled BEAM module. These are pure pattern-matching functions that
benefit from local call dispatch:

- op_truthy: handles nil/undefined/false/0/0.0/empty-string fast paths
  inline, eliminating 1,867 cross-module calls per render
- op_typeof: handles undefined/null/boolean/number/string inline,
  falls back to Values.typeof for complex types (883 calls)

Also wire branch_condition to use op_truthy instead of Values.truthy?.
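The op_truthy fast paths follow JS ToBoolean, which a short Python sketch can pin down (UNDEFINED/NULL sentinels are illustrative stand-ins for the VM's values): undefined, null, false, 0, 0.0, NaN, and the empty string are falsy; every other value, including all objects, is truthy.

```python
import math

# Illustrative model of the op_truthy fast paths (JS ToBoolean semantics).
UNDEFINED, NULL = object(), object()

def op_truthy(v):
    if v is UNDEFINED or v is NULL or v is False:
        return False
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        # 0, 0.0, and NaN are falsy; every other number is truthy
        return v != 0 and not (isinstance(v, float) and math.isnan(v))
    if isinstance(v, str):
        return v != ""                 # only the empty string is falsy
    return True                        # objects, arrays, functions: truthy

assert not op_truthy(UNDEFINED)
assert not op_truthy(0.0)
assert not op_truthy(float("nan"))
assert not op_truthy("")
assert op_truthy("0") and op_truthy([]) and op_truthy(True)
```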
Runs test262 test suites through NIF, compiler, and interpreter modes
and compares pass rates. Requires test262 to be checked out at
../quickjs/test262/.

Current results (subset):
  QuickJS NIF: 99.88% (79,827 tests, 98 errors)
  BEAM compiler: 80-100% depending on category
  BEAM interpreter: 80-100% (identical to compiler)

Compiler-specific failures: BigInt literal handling (pre-existing)
dannote added 30 commits April 25, 2026 12:02
- lib/quickbeam/vm/compiler/diagnostics.ex: check/1, explain/1, helper_call_counts/1
- bench/compiler_vs_interpreter.exs: compiler vs interpreter on six JS patterns
- Complete lowering tuple refactor (list→tuple for O(1) instruction access)
- Apply analysis tuple refactor to cfg.ex, stack.ex, types.ex
- Fix bench/compiler_vs_interpreter.exs Heap.reset issue
- Targeted benchmarks show compiler ~= interpreter on all patterns
- String concat: 8.42µs (compiler), 9.17µs (interpreter)
- Numeric loop: 61.33µs, 59.17µs
- Array loop: 201.83µs, 207.50µs
- Object field: 615.29µs, 614.04µs
- Function call: 1668.50µs, 1687.06µs
- Closure: 2385.31µs, 2408.00µs
…teger types

Compiler now emits direct BEAM arithmetic/bitwise ops when type analysis
proves operands are integers, bypassing Values.* runtime dispatch.

Specialized: mod→rem, band→band, bor→bor, bxor→bxor, shl→bsl, sar→bsr.
Variables, Objects, Functions, Iterators, Coercion — RuntimeHelpers
is now a thin defdelegate facade. Public API unchanged.
Generated BEAM code now has guard-based integer AND float fast paths
for add, mul, neg, lt/lte/gt/gte, div (with b!=0 guard), mod (with b!=0 guard).
BEAM JIT can optimize these to native arithmetic without bouncing through
Values.* module calls.

Fixed latent crash: op_div and op_mod guards now prevent Erlang crash
on division/mod by zero (JS returns Infinity/NaN, not crash).
Type analysis now tracks {:shaped_object, offsets, value_map} for object
literals with constant values. get_field_call inlines the constant value
directly when the key is known and the value is pure, bypassing heap access.

object_field benchmark: 615µs → 5.58µs (110× faster)
numeric_loop benchmark: 60µs → 5.92µs (10× faster, from post_inc inline)
Result: {"status":"keep","failing_tests":32,"passing_tests":0}
…RL, URLSearchParams, atob/btoa, setTimeout, Headers, AbortController, performance, Blob, crypto, fetch/Request/Response)
…coder, URL, atob/btoa, setTimeout, Headers, AbortController, performance, Blob, crypto, fetch/Request/Response as native builtins.

Result: {"status":"keep","failing_tests":0,"passing_tests":32}
- TextEncoding: TextEncoder/TextDecoder
- URL: URL/URLSearchParams
- Encoding: atob/btoa
- Timers: setTimeout/clearTimeout/setInterval/clearInterval
- Headers: Headers (shared build_from_map/1 used by Fetch)
- Abort: AbortController
- Performance: performance object using build_object macro
- Blob: Blob
- Crypto: crypto object using build_object macro
- Fetch: fetch/Request/Response

web_apis.ex is now a thin aggregator that merges all bindings/0.
32/32 beam_web_apis tests passing.
Result: {"status":"keep","failing_tests":54,"passing_tests":38}
- TextEncoder: add encodeInto, fix WTF-8 lone surrogate handling
- TextDecoder: fix UTF-8 decoding, fatal mode, BOM stripping, ArrayBuffer support
- atob/btoa: type coercion, whitespace stripping, proper base64 handling
- crypto.getRandomValues: fix zero-length bug, add >65536 TypeError
- performance.now: return positive milliseconds relative to origin
- queueMicrotask: TypeError for non-function, silently discard errors
- structuredClone: deep clone objects, arrays, Date, RegExp, Map, Set, ArrayBuffer
- timers: implement timer macro queue with actual callback execution
- Promise constructor: properly call executor with resolve/reject
- String spread: fix [..str] operator to iterate codepoints
- top-level await: wrap code in async IIFE for eval_beam
- Map constructor: fix initialization from array-of-arrays (qb_arr elements)
- instanceof: add auto_proto for Date, RegExp, Map, Set, ArrayBuffer
- get_prototype_raw: check type-specialized methods before following proto chain
Result: {"status":"keep","failing_tests":0,"passing_tests":92}
- Timers: raw Process.get/put → Heap.Caches wrappers
- encoding.ex: delete duplicate coerce_to_string, use Values.stringify
- url.ex: raw throw → JSThrow.type_error!
- structured_clone.ex: deep_clone(:undefined) returns :undefined not nil
- Fix credo issues (number underscores, alias ordering, Map.new)
Result: {"status":"keep","failing_tests":171,"passing_tests":140}