Skip to content

perf: fast-path strict json imports#840

Merged
stephenamar-db merged 2 commits into
databricks:masterfrom
He-Pin:perf/strict-json-import-fast-path
May 11, 2026
Merged

perf: fast-path strict json imports#840
stephenamar-db merged 2 commits into
databricks:masterfrom
He-Pin:perf/strict-json-import-fast-path

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 11, 2026

Motivation:
Real-world workloads such as kube-prometheus import large strict JSON files. Parsing those files through the full Jsonnet parser creates avoidable parser, AST, and manifestation work.

Key Design Decision:
The fast path only accepts strict .json input and falls back to normal Jsonnet parsing for malformed JSON, duplicate keys, non-finite numbers, parser-depth overflow, and defensive numeric parse failures. Imported JSON objects use a race-free inline-array Val.Obj layout: field caching is disabled for parse-cache-shared literals, and single-field objects avoid lazy value0 mutation. JSON value positions reuse fileScope.noOffsetPos because strict JSON imports do not execute code and per-element positions are not consumed for stack traces.

Modification:
Add a shared strict JSON import visitor in CachedResolver, wire it into Preloader, trim duplicate-key/string visitor work, use inline object layout for imported JSON objects, and add shared-cache/concurrency regression tests.

Benchmark Results:

Benchmark master PR #840 Delta Notes
Scala Native hyperfine kube-prometheus (entry-kube-prometheus.jsonnet -J vendor) 223.351 +/- 12.742 ms 139.041 +/- 2.847 ms -37.75% Output matched master.
Scala Native hyperfine stacked exploration after JSON refinements ~235 ms starting stack 135.9 +/- 1.2 ms ~ -42% cumulative Source-built jrsonnet comparison: 88.087 +/- 1.737 ms, remaining gap 1.54x.
Focused JMH guards manifestJsonEx 0.053, realistic2 39.485, large_string_template 1.102, gen_big_object 0.801 stable From final position-reuse exploration run.

Analysis:
The PR keeps Jsonnet compatibility by treating the JSON parser as an optional strict fast path, not a replacement parser. The added JVM-only concurrency tests exercise shared parse-cache reuse; cross-platform JSON import tests cover the platform-neutral behavior.

References:
Source exploration commits: He-Pin/sjsonnet 995b0822, 114bd69f, 52e22696, ca6f24d5.

Result:
Local ./mill --no-server -j 1 __.reformat and ./mill --no-server -j 1 __.test passed on this split branch (2066/2066).

Motivation:
Kube-prometheus imports large strict JSON files; parsing them through the full Jsonnet parser creates avoidable AST and materialization work.

Modification:
Add a strict .json import fast path shared by CachedResolver and Preloader, trim visitor work, build race-free inline-array objects for imported JSON, and reuse the no-offset position for imported literals.

Result:
Strict JSON imports keep Jsonnet fallback semantics for non-strict inputs while reducing parse and manifestation overhead for large imported JSON data.
@He-Pin He-Pin marked this pull request as ready for review May 11, 2026 09:36
Motivation:
Preloaded importstr and importbin entries for the same path were separated in Preloader, but CachedImporter still keyed reads only by Path. Interpreter evaluation could then reuse a text resolved file for binary reads or the reverse.

Modification:
Key CachedImporter entries by both Path and binaryData, update the top-level source cache insertion to use the text key, and add interpreter-level regression coverage for both import orders.

Result:
Preloaded text and binary imports for the same path remain independent through normal Interpreter evaluation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephenamar-db stephenamar-db merged commit 32dba7c into databricks:master May 11, 2026
5 checks passed
stephenamar-db pushed a commit that referenced this pull request May 18, 2026
Motivation:

PR #840 added a strict JSON import fast path, but cached file imports
still decoded every small file to a UTF-8 `String` before strict `.json`
parsing. Real-world Jsonnet projects such as kube-prometheus import many
JSON files, so decoding bytes to text only to parse them as UTF-8 JSON
adds avoidable allocation and CPU cost.

Key Design Decision:

Use `ujson.ByteArrayParser` for strict `.json` imports and keep small
cached resolved files byte-backed until text is explicitly needed by
`importstr` or Fastparse. This keeps the JSON path byte-native while
preserving existing text behavior for Jsonnet source parsing.

Modification:

* `CachedResolvedFile` now caches small resolved files as `Array[Byte]`,
lazily materializes text for `getParserInput` / `readString`, and
returns cached bytes directly for `readRawBytes`.
* `CachedResolver.parseJsonImport` now parses strict JSON imports with
`ujson.ByteArrayParser.transform(content.readRawBytes(), visitor)`.
* `PreloaderTests` now assert strict JSON preload does not fall back to
Fastparse or decode via `readString`.

Result:

Output equality was preserved against upstream sjsonnet and source-built
jrsonnet:

* kube-prometheus realworld output: `7,506,029` bytes, byte-identical.
* `large_string_template` guard output: `549,674` bytes, byte-identical.

Benchmark Results:

Native whole-process kube-prometheus, command run from
`jrsonnet/tests/realworld` with `-J vendor`, `hyperfine --shell=none
--warmup 5 --runs 50`:

| order | upstream/master | this PR | jrsonnet |
|---|---:|---:|---:|
| forward | `143.351 +/- 2.556 ms` | `135.806 +/- 3.388 ms` | `91.752
+/- 7.863 ms` |
| reverse | `145.082 +/- 2.988 ms` | `136.392 +/- 2.034 ms` | `91.365
+/- 1.419 ms` |

Native guard, `bench/resources/cpp_suite/large_string_template.jsonnet`:

| upstream/master | this PR | jrsonnet |
|---:|---:|---:|
| `11.036 +/- 0.807 ms` | `10.762 +/- 0.650 ms` | `5.204 +/- 0.511 ms` |

JMH guard, `bench.runRegressions`:

| benchmark | upstream/master | this PR |
|---|---:|---:|
| `cpp_suite/large_string_template.jsonnet` | `0.757 ms/op` | `0.691
ms/op` |
| `go_suite/manifestJsonEx.jsonnet` | `0.055 ms/op` | `0.054 ms/op` |

Analysis:

The target workload improves by about 5-6% in both command orders. The
remaining gap to jrsonnet is not eliminated, but this change removes one
clear extra decode from strict JSON import handling without regressing
the non-target renderer guard or JMH guards.

Validation:

* `./mill --no-server --ticker false --color false -j 1 __.checkFormat`
* `./mill --no-server --ticker false --color false -j 1 __.test`
(`Tests: 440, Passed: 440, Failed: 0`)

References:

Follow-up to #840
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants