Change shape representation from {:shape, shape_id, vals, proto} to
{:shape, shape_id, offsets, vals, proto}. The offsets map is inlined
from the shape table, eliminating the Process.get + Map.fetch! chain
on every property read.
Get.get shape hit: 775ns → 648ns (16% faster).
~380us saved per render at 3000 property reads.
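A minimal sketch of what a property read looks like against the new five-element tuple. The function name and sample data here are illustrative only; the real accessor lives in the engine's Get path.

```elixir
# Hypothetical read_prop/2 over the new representation:
# obj = {:shape, shape_id, offsets, vals, proto}
# The offsets map travels inside the object, so no shape-table
# lookup (Process.get + Map.fetch!) is needed per read.
defmodule ShapeReadSketch do
  def read_prop({:shape, _id, offsets, vals, _proto}, key) do
    case Map.fetch(offsets, key) do
      {:ok, off} -> {:ok, elem(vals, off)}
      :error -> :miss
    end
  end
end

obj = {:shape, 7, %{"x" => 0, "y" => 1}, {10, 20}, nil}
ShapeReadSketch.read_prop(obj, "y")  # => {:ok, 20}
```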
…s.lookup
fprof showed Shapes.get_shape (54K calls, 108ms) as the #1 bottleneck. It
was still called from Put.put via Shapes.lookup on the hot path. Replace
Shapes.lookup(shape_id, key) with Map.fetch(offsets, key) in shape_put,
Store.put_obj_key, and put/3 for length.
Get.get: 648ns → 406ns (37% faster).
Preact render: 6.95ms → 6.55ms (5.8% faster).
…calls
transition/2 now returns {shape_id, offsets, offset} instead of
{shape_id, offset}. Callers no longer need a separate get_shape
call to fetch the new shape's offsets after transition.
Eliminates ~13K redundant shape table lookups per render.
Transition cache now stores {child_id, child_offsets} instead of
just child_id. Eliminates get_shape(child_id) on every cache hit.
get_shape calls per render: 27,930 → 14,819 (verified via fprof).
Preact render: 6.55ms → 6.2ms.
The grow path (adding a property to a shape-backed object) was doing:
Tuple.to_list → ++ List.duplicate → ++ [val] → List.to_tuple (1693ns).
Now uses :erlang.append_element for the common case where
offset == tuple_size (sequential property addition): 217ns — 8× faster.
Preact render: 6.1ms → 5.4ms (12% faster).
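The two grow strategies can be compared directly; the values below are made up, but both paths produce the same tuple.

```elixir
old_vals = {1, 2, 3}
new_val = 4

# Old path: tuple -> list -> append -> tuple (multiple traversals + copies)
slow = List.to_tuple(Tuple.to_list(old_vals) ++ [new_val])

# New path: a single BIF, valid when offset == tuple_size(old_vals),
# i.e. the property is appended at the next sequential slot.
fast = :erlang.append_element(old_vals, new_val)

slow == fast  # => true; both are {1, 2, 3, 4}
```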
Heap.frozen? now checks a global :qb_has_frozen flag before doing per-object Process.get. In Preact SSR (where nothing is frozen), this eliminates 13,552 process dictionary lookups per render. Preact render: 5.37ms → 5.25ms.
Detect 'object; (push_val; define_field)* ...' patterns during lowering
and batch them into a single Heap.wrap(%{k1 => v1, k2 => v2, ...}).
Eliminates ~10K individual Put.put calls per Preact render (881 VNodes
× ~12 batched fields each). Each Put.put was doing: Process.get + shape
transition + put_val + Process.put.
Supported value opcodes: integer literals, null, undefined, booleans,
get_arg, get_loc, push_atom_value, empty string.
Falls back to individual define_field for values that can't be lowered
at compile time (function calls, computed values).
Preact render: 5.25ms → 4.4ms (16% faster).
Preact's h() function has 5 args, called 881 times per render. The generic Enum.take path was 14.5ms (8,504 calls). Direct pattern matching eliminates the list traversal overhead.
Replaces Enum.with_index + Enum.reduce with direct :maps.from_list for shape-to-map reconstruction.
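A small sketch of that reconstruction, with made-up keys and values: pairing keys with values and handing the pairs to a single BIF replaces the per-entry reduce.

```elixir
keys = ["a", "b", "c"]
vals = {1, 2, 3}

# Before: Enum.with_index + Enum.reduce, building the map entry by entry.
# After: one zip pass into :maps.from_list/1.
map = :maps.from_list(Enum.zip(keys, Tuple.to_list(vals)))
map  # => %{"a" => 1, "b" => 2, "c" => 3}
```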
Replace make_ref() + {:qb_obj, ref} tuple keys with monotonic integer
counter + raw integer keys in the process dictionary.
From the OTP JIT source (erl_process_dict.c), the hash function for
small integers is just 'unsigned_val(Term)' — essentially free. Tuple
keys go through the full erts_internal_hash which is much more
expensive. EQ comparison is also a single pointer compare for integers
vs deep tuple comparison for {:qb_obj, ref}.
Measured: Process.put with raw integer keys is 2× faster than tuple
keys. Process.get is 1.35× faster.
Preact render: 4.3ms → 3.55ms (17% faster).
Total session: 6.95ms → 3.55ms (49% faster).
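An illustrative comparison of the two key shapes (payloads are made up; the commit used a monotonic PD counter, shown here via :erlang.unique_integer/1 for brevity):

```elixir
# Before: reference-based tuple keys — deep hash + deep compare per access
ref = make_ref()
Process.put({:qb_obj, ref}, %{kind: :old})

# After: raw integer keys — hashed and compared as immediates
id = :erlang.unique_integer([:positive])
Process.put(id, %{kind: :new})

Process.get(id)  # => %{kind: :new}
```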
- Replace List.zip with manual keys_vals_to_map recursion (1.6× faster)
- Replace PD counter with :erlang.unique_integer (2.6× faster)
Preact render: 3.55ms → 3.4ms.
Two changes:
1. inline_get_var_ref emits get_capture(ctx, {type, var_idx}) instead
of get_var_ref(ctx, integer_idx). The capture key is resolved at
compile time from closure_vars, eliminating Enum.at list traversal.
2. current_var_ref caches closure_vars as a tuple of capture keys per
function (keyed on byte_code binary). elem(tuple, idx) is O(1) vs
Enum.at(list, idx) which is O(n).
Before: 1,609 Enum.at calls at 20.6ms total (12.8us each).
After: 0 Enum.at calls. get_capture is 3.0us per call.
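The tuple-vs-list access difference in one sketch (the closure_vars contents are made up; the cached tuple is what current_var_ref keys on the byte_code binary):

```elixir
closure_vars = [:this_val, :arg_x, :arg_y, :captured_z]
cached = List.to_tuple(closure_vars)

Enum.at(closure_vars, 3)  # O(n): walks the list to index 3
elem(cached, 3)           # O(1): direct tuple access => :captured_z
```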
Shape IDs are contiguous integers 0..N. Storing shapes in a tuple and
using elem(table, id) eliminates Map.fetch! overhead entirely. put_shape
appends via :erlang.append_element for new shapes and uses put_elem for
updates. Also eliminates the separate :qb_shape_next_id counter — the
next ID is simply tuple_size(table).
5,565 get_shape calls/render × ~80ns savings = ~157us per render.
Preact render: 3.50ms → 3.34ms.
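The tuple-backed table in miniature (shape values here are placeholder atoms; the real table stores shape records):

```elixir
table = {:shape_0, :shape_1}           # shape IDs are tuple indices
elem(table, 1)                         # => :shape_1, no Map.fetch!
next_id = tuple_size(table)            # => 2, replaces :qb_shape_next_id
table = :erlang.append_element(table, :shape_2)
elem(table, next_id)                   # => :shape_2
```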
The compiler now emits Heap.wrap_keyed(keys_tuple, vals_tuple) instead
of Heap.wrap(%{k1 => v1, ...}) for batched object literals.
wrap_keyed uses the keys tuple (a compile-time constant) as a cache key
to look up the pre-resolved shape. On cache hit, it skips:
- :erlang.phash2 of the key set
- Shapes.from_map shape resolution
- Map construction from keys/values
This eliminates ~103ns per object creation at 2,399 objects/render.
Preact render: 3.43ms → 3.25ms (5.2% faster).
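A sketch of the caching idea: because the keys tuple is a compile-time constant, it can key a process-dictionary shape cache directly. The cache key name and offsets computation below are hypothetical, not the project's internals.

```elixir
keys = {"type", "props", "key"}

offsets =
  case Process.get({:wrap_keyed_cache, keys}) do
    nil ->
      # cache miss: resolve the shape once and memoize it
      computed = keys |> Tuple.to_list() |> Enum.with_index() |> Map.new()
      Process.put({:wrap_keyed_cache, keys}, computed)
      computed

    cached ->
      # cache hit: no phash2, no Shapes.from_map, no map construction
      cached
  end

offsets  # => %{"type" => 0, "props" => 1, "key" => 2}
```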
Instead of emitting RuntimeHelpers.get_var(ctx, name), which traverses
get_var → fetch_ctx_var → context_globals → GlobalEnv.fetch → Map.fetch,
emit :erlang.map_get(:globals, ctx) at compile time to extract the
globals map, then call get_global(globals, name), which does a single
Map.fetch.
Eliminates 3 function calls per global variable access (2,644 calls per
Preact render).
Preact render: 3.25ms → 3.18ms.
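The emitted shape, with a made-up ctx (the :globals field and value are illustrative): :erlang.map_get/2 pulls the globals map out in one BIF call, leaving a single Map.fetch per access.

```elixir
ctx = %{globals: %{"console" => :console_ref}}

globals = :erlang.map_get(:globals, ctx)   # one BIF, no helper hops
Map.fetch(globals, "console")              # => {:ok, :console_ref}
```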
The op_eq helper now has:
1. {Same, Same} → true (identity check, covers 79% of comparisons)
2. Number guards → == (existing)
3. Binary guards → == (string comparison without Values.eq dispatch)
4. Fallback → Values.eq (handles null/undefined cross-equality)
Values.eq calls: 5,175 → 1,075 per render.
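A minimal sketch of that clause order. The Values.eq fallback is stubbed here to just the null/undefined cross-equality case; it is not the project's Values module.

```elixir
defmodule OpEqSketch do
  def op_eq(a, a), do: true                                   # 1. identity
  def op_eq(a, b) when is_number(a) and is_number(b), do: a == b  # 2. numbers
  def op_eq(a, b) when is_binary(a) and is_binary(b), do: a == b  # 3. strings
  def op_eq(a, b), do: values_eq(a, b)                        # 4. fallback

  # stand-in for Values.eq: JS `null == undefined` is true
  defp values_eq(:null, :undefined), do: true
  defp values_eq(:undefined, :null), do: true
  defp values_eq(_, _), do: false
end

OpEqSketch.op_eq("x", "x")           # => true  (identity clause)
OpEqSketch.op_eq(1, 1.0)             # => true  (number guard; identity is =:= so 1 !== 1.0)
OpEqSketch.op_eq(:null, :undefined)  # => true  (fallback)
```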
Skip resetting home_object and super fields in the fast path for closures with need_home_object: false. These closures by definition don't use home objects, so the fields can inherit from the parent context safely. Saves 2 map update operations per function call (2,010 calls/render).
put_field skips normalize_key (key is known binary at compile time), the
frozen? check (not needed for just-created objects), the __proto__ check
(key is known not to be __proto__), and the Heap.get_obj_raw delegation
(calls Process.get directly). The compiler emits put_field for
define_field opcodes where the key is a resolved string literal.
Eliminates ~3 function calls and 2 guards per property write for 4,298
define_field ops per Preact render.
Generate op_get_field/2 as a local function in each compiled BEAM
module. The fast path does:
1. Pattern match {:obj, Id}
2. erlang:get(Id) — direct BIF call, no delegation chain
3. Pattern match {:shape, _, Offsets, Vals, _}
4. maps:find(Key, Offsets) — direct BIF
5. element(Off+1, Vals) — direct tuple access
Fallback to Get.get for non-shape objects, prototype chain, etc.
This eliminates 4 cross-module function calls on the hot path:
Get.get → get_own → Heap.get_obj_raw → Store.get_obj_raw → Process.get
The JIT can optimize local calls much better than cross-module calls
(no export table indirection, better branch prediction).
Preact render: 3.23ms → 3.09ms (143us, 4.4% faster).
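The five steps above as a runnable sketch. The fallback to Get.get is stubbed with :undefined, and the heap entry is seeded by hand; both are illustrative.

```elixir
defmodule GetFieldSketch do
  # Fast path for the generated local op_get_field/2:
  # direct BIF calls, no cross-module delegation chain.
  def op_get_field({:obj, id}, key) do
    case Process.get(id) do                       # step 2: erlang:get(Id)
      {:shape, _sid, offsets, vals, _proto} ->    # step 3: match shape tuple
        case :maps.find(key, offsets) do          # step 4: direct BIF
          {:ok, off} -> elem(vals, off)           # step 5: element(Off+1, Vals)
          :error -> :undefined                    # real code: Get.get fallback
        end

      _other ->
        :undefined                                # real code: Get.get fallback
    end
  end
end

Process.put(42, {:shape, 1, %{"x" => 0}, {"hello"}, nil})
GetFieldSketch.op_get_field({:obj, 42}, "x")  # => "hello"
```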
When the compiler creates an object via wrap_keyed (batched object
literal), it records the offsets map as part of the stack type info
({:shaped_object, offsets}). For subsequent get_field on the same
variable, if the offset is known at compile time, emit direct
element(Off+1, Vals) bypassing maps:find entirely.
Also propagates shape info through define_field ops — each property
addition extends the known offsets map.
This enables V8-style monomorphic inline caching for same-block
property access patterns.
These functions previously called Heap.get_obj, which triggers to_map
reconstruction (149ns per call for 5-key shapes). Now they check for
shape-backed objects first and use the shape's keys list or offsets map
directly, avoiding the full map reconstruction.
- enumerable_keys: uses Shapes.keys(shape_id) instead of to_map
- enumerable_string_props: uses Shapes.to_map only for the shape case
  (without proto overhead), bypasses get_obj delegation
- length_of: reads 'length' directly from shape offsets
Eliminates ~1000+ to_map reconstructions per render.
Preact render: 3.17ms → 3.03ms (137us, 4.3% faster).
Eliminate 2 delegation calls (Heap.put_obj_raw → Store.put_obj_raw) and 1 function call (Shapes.put_val) for the common case where offset is within the existing tuple size. For the transition path, inline :erlang.append_element for the sequential-append case. 3,857 shape_put calls per render.
Generate op_truthy/1 and op_typeof/1 as local functions in each compiled
BEAM module. These are pure pattern-matching functions that benefit from
local call dispatch:
- op_truthy: handles nil/undefined/false/0/0.0/empty-string fast paths
  inline, eliminating 1,867 cross-module calls per render
- op_typeof: handles undefined/null/boolean/number/string inline, falls
  back to Values.typeof for complex types (883 calls)
Also wires branch_condition to use op_truthy instead of Values.truthy?.
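A sketch of the op_truthy fast paths. The catch-all here returns true; the real generated code falls back to Values.truthy? for complex values.

```elixir
defmodule TruthySketch do
  def op_truthy(nil), do: false
  def op_truthy(:undefined), do: false
  def op_truthy(false), do: false
  def op_truthy(0), do: false
  def op_truthy(n) when is_float(n) and n == 0.0, do: false
  def op_truthy(""), do: false
  def op_truthy(_), do: true  # real code: Values.truthy? fallback
end

TruthySketch.op_truthy(0)     # => false
TruthySketch.op_truthy("ok")  # => true
```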
Runs test262 test suites through NIF, compiler, and interpreter modes and
compares pass rates. Requires test262 to be checked out at
../quickjs/test262/.
Current results (subset):
- QuickJS NIF: 99.88% (79,827 tests, 98 errors)
- BEAM compiler: 80-100% depending on category
- BEAM interpreter: 80-100% (identical to compiler)
Compiler-specific failures: BigInt literal handling (pre-existing)
- lib/quickbeam/vm/compiler/diagnostics.ex: check/1, explain/1,
  helper_call_counts/1
- bench/compiler_vs_interpreter.exs: compiler vs interpreter on six JS
  patterns
- Complete lowering tuple refactor (list→tuple for O(1) instruction
  access)
- Apply analysis tuple refactor to cfg.ex, stack.ex, types.ex
- Fix bench/compiler_vs_interpreter.exs Heap.reset issue
- Targeted benchmarks show compiler ~= interpreter on all patterns
- String concat: 8.42µs (compiler), 9.17µs (interpreter)
- Numeric loop: 61.33µs, 59.17µs
- Array loop: 201.83µs, 207.50µs
- Object field: 615.29µs, 614.04µs
- Function call: 1668.50µs, 1687.06µs
- Closure: 2385.31µs, 2408.00µs
…teger types
Compiler now emits direct BEAM arithmetic/bitwise ops when type analysis
proves operands are integers, bypassing Values.* runtime dispatch.
Specialized: mod→rem, band→band, bor→bor, bxor→bxor, shl→bsl, sar→bsr.
Variables, Objects, Functions, Iterators, Coercion — RuntimeHelpers is now a thin defdelegate facade. Public API unchanged.
…storage abstraction
…odules back, drop unused Heap delegates
…ties/1, add differential tests
… table instability
Generated BEAM code now has guard-based integer AND float fast paths for
add, mul, neg, lt/lte/gt/gte, div (with b != 0 guard), and mod (with
b != 0 guard). The BEAM JIT can optimize these to native arithmetic
without bouncing through Values.* module calls.
Fixed latent crash: op_div and op_mod guards now prevent an Erlang crash
on division/mod by zero (JS returns Infinity/NaN, not a crash).
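The guard shape for op_div, sketched below. The :infinity/:neg_infinity/:nan atoms are placeholders for however the engine represents JS Infinity and NaN, and the final clause stands in for the Values.* dispatch; both are assumptions.

```elixir
defmodule DivSketch do
  # Fast path: plain BEAM arithmetic when the divisor is provably nonzero.
  def op_div(a, b) when is_number(a) and is_number(b) and b != 0, do: a / b
  # JS semantics for division by zero (representation assumed here):
  def op_div(a, b) when is_number(a) and b == 0 and a > 0, do: :infinity
  def op_div(a, b) when is_number(a) and b == 0 and a < 0, do: :neg_infinity
  def op_div(a, b) when a == 0 and b == 0, do: :nan
  # Real code dispatches non-number operands to Values.*:
  def op_div(_a, _b), do: :nan
end

DivSketch.op_div(10, 4)  # => 2.5
DivSketch.op_div(1, 0)   # => :infinity
```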
Type analysis now tracks {:shaped_object, offsets, value_map} for object
literals with constant values. get_field_call inlines the constant value
directly when the key is known and the value is pure, bypassing heap access.
object_field benchmark: 615µs → 5.58µs (110× faster)
numeric_loop benchmark: 60µs → 5.92µs (10× faster, from post_inc inline)
Result: {"status":"keep","failing_tests":32,"passing_tests":0}
…RL, URLSearchParams, atob/btoa, setTimeout, Headers, AbortController, performance, Blob, crypto, fetch/Request/Response)
…coder, URL, atob/btoa, setTimeout, Headers, AbortController, performance, Blob, crypto, fetch/Request/Response as native builtins.
Result: {"status":"keep","failing_tests":0,"passing_tests":32}
- TextEncoding: TextEncoder/TextDecoder
- URL: URL/URLSearchParams
- Encoding: atob/btoa
- Timers: setTimeout/clearTimeout/setInterval/clearInterval
- Headers: Headers (shared build_from_map/1 used by Fetch)
- Abort: AbortController
- Performance: performance object using build_object macro
- Blob: Blob
- Crypto: crypto object using build_object macro
- Fetch: fetch/Request/Response
web_apis.ex is now a thin aggregator that merges all bindings/0.
32/32 beam_web_apis tests passing.
…acros, fix heap GC for mutable state
Result: {"status":"keep","failing_tests":54,"passing_tests":38}
- TextEncoder: add encodeInto, fix WTF-8 lone surrogate handling
- TextDecoder: fix UTF-8 decoding, fatal mode, BOM stripping, ArrayBuffer
  support
- atob/btoa: type coercion, whitespace stripping, proper base64 handling
- crypto.getRandomValues: fix zero-length bug, add >65536 TypeError
- performance.now: return positive milliseconds relative to origin
- queueMicrotask: TypeError for non-function, silently discard errors
- structuredClone: deep clone objects, arrays, Date, RegExp, Map, Set,
  ArrayBuffer
- timers: implement timer macro queue with actual callback execution
- Promise constructor: properly call executor with resolve/reject
- String spread: fix [...str] operator to iterate codepoints
- top-level await: wrap code in async IIFE for eval_beam
- Map constructor: fix initialization from array-of-arrays (qb_arr
  elements)
- instanceof: add auto_proto for Date, RegExp, Map, Set, ArrayBuffer
- get_prototype_raw: check type-specialized methods before following
  proto chain
Result: {"status":"keep","failing_tests":0,"passing_tests":92}
- Timers: raw Process.get/put → Heap.Caches wrappers
- encoding.ex: delete duplicate coerce_to_string, use Values.stringify
- url.ex: raw throw → JSThrow.type_error!
- structured_clone.ex: deep_clone(:undefined) returns :undefined, not nil
- Fix credo issues (number underscores, alias ordering, Map.new)
Result: {"status":"keep","failing_tests":171,"passing_tests":140}
Adds a second QuickJS execution backend on the BEAM.
What’s in here
- :beam backend via QuickBEAM.disasm/2
- mode: :beam support in the public API
- require(), module loading, dynamic import, globals, handlers, and interop for the VM path
- Error.captureStackTrace
Runtime coverage
- Object, Array, Function, String, Number, Boolean
- Math, JSON, Date, RegExp
- Map, Set, WeakMap, WeakSet, Symbol
- Promise, async/await, generators, async generators
- Proxy, Reflect
- TypedArray, ArrayBuffer, BigInt
- super, private fields, private methods, private accessors, static private members, brand checks
Validation
- QUICKBEAM_BUILD=1 MIX_ENV=test mix test
- MIX_ENV=test QUICKBEAM_BUILD=1 mix test test/vm/js_engine_test.exs --include js_engine --seed 0
- mix compile --warnings-as-errors
- mix format --check-formatted
- mix credo --strict
- mix dialyzer
- mix ex_dna
- zlint lib/quickbeam/*.zig lib/quickbeam/napi/*.zig
- bunx oxlint -c oxlint.json --type-aware --type-check priv/ts/
- bunx jscpd lib/quickbeam/*.zig priv/ts/*.ts --min-tokens 50 --threshold 0
Current local result:
2363 tests, 0 failures, 1 skipped, 54 excluded