Switch to CompilerCaching.jl by maleadt · Pull Request #794 · JuliaGPU/GPUCompiler.jl

maleadt · 2026-05-12T09:39:32Z

Will be a breaking release. Sadly also drops 1.10.

vchuravy · 2026-05-12T12:17:48Z

> Sadly also drops 1.10.

That's unfortunate and will likely mean that we will have to maintain the prior version of GPUCompiler until there is a new LTS.

maleadt · 2026-05-12T12:51:43Z

> will likely mean that we will have to maintain the prior version of GPUCompiler until there is a new LTS.

I'm not convinced that's needed. As long as there are no breaking releases in the back-ends, users can simply use different versions of, say, CUDA.jl 6.x depending on which Julia version they're using. And for critical features I'd rather they request backports over there than have us maintain multiple compiler stacks.

vchuravy · 2026-05-12T14:10:44Z

The issue is that for Enzyme we won't be able to drop 1.10 support (due to DiffEq and so forth) and there the situation is less stable than for the GPU backends.

maleadt · 2026-05-13T15:39:57Z

Grmbl. I'll try to come up with something that keeps 1.10 working then.

Drop the hand-rolled CodeCache, on-disk kernel cache, and various pre-1.11 compatibility shims; route inference and CI lookup through CompilerCaching.CacheView with consumer-defined results structs attached to each CodeInstance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously partitioned only by `typeof(target)`, which collided across target instances that produce different IR (e.g. different macOS, SM arch, or CPU features). Folding the full target and params into the token matches the spirit of `runtime_slug`, while staying within the inference-determinant scope of the docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
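The collision described in this commit message can be sketched in a few lines. This is a minimal illustration, not GPUCompiler's code: `ToyTarget`, `ToyParams`, `token_by_type`, and `token_by_value` are hypothetical names, and a plain `Dict` stands in for the real cache.

```julia
# Hypothetical stand-ins for illustration; not GPUCompiler's actual types.
struct ToyTarget
    arch::Symbol        # e.g. an SM architecture
    features::String    # e.g. a feature string
end

struct ToyParams end

# Old scheme (as described above): partition only by the target's type.
token_by_type(target, params) = typeof(target)

# New scheme: fold the full target and params values into the token.
token_by_value(target, params) = (target, params)

a = ToyTarget(:sm_70, "+feature_x")
b = ToyTarget(:sm_90, "+feature_y")

# With type-only tokens, a lookup for `b` wrongly hits the entry stored for `a`.
cache_old = Dict{Any,String}()
cache_old[token_by_type(a, ToyParams())] = "IR for sm_70"
collided = get(cache_old, token_by_type(b, ToyParams()), nothing)

# With value-based tokens, the two target instances get separate partitions.
cache_new = Dict{Any,String}()
cache_new[token_by_value(a, ToyParams())] = "IR for sm_70"
missed = get(cache_new, token_by_value(b, ToyParams()), nothing)
```

Here `collided` holds the IR produced for the other target, while `missed` is `nothing`, which is the behavior the value-based token is meant to give.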

Adds optional `bitcode`/`bitcode!` hooks on the consumer's `results_type`. When opted in, `emit_function!` reads renamed per-function bitcode from the cached CI on a hit, and writes it on a miss. Cross-session persistence rides on package precompilation; a small session-local assembled-module cache (keyed by `(cache_owner, opaque_pointers)`) keeps the within-session fast path. Drops the `runtime_slug` interface — `cache_owner` now subsumes its role of identifying compatible IR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`emit_function!` now memoizes each `gpu_*` runtime function's renamed, post-irgen LLVM bitcode on its own `CodeInstance`'s `analysis_results` when the back-end opts in via the new `bitcode`/`bitcode!` trait pair. Cross-session persistence rides on package precompilation; the session-local `_runtime_libs` assembled-module cache keeps repeated within-session linking cheap. Gated on `HAS_INTEGRATED_CACHE` so 1.10 falls through to plain compile + link (still serviced by `_runtime_libs`). With Metal opted in, a second kernel compile in the same session — even after `reset_runtime()` invalidates `_runtime_libs` — completes ~25× faster than a cold rebuild on 1.12, because each runtime function is now a parse-and-link instead of a full Julia → LLVM run. Restores the optimization originally landed in aa4e64d (reverted in 566811d for 1.10 compat); the new version sits behind the same infrastructure as `cached_compilation`, so the 1.10 path is no longer load-bearing on the trait being callable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

On Julia 1.13+, `jl_emit_native_impl` itself sets every `jl_sysimg_gvar` to a null initializer before returning (aotcompile.cpp:865), leaving relocation to the caller. On 1.12, `jl_emit_native_impl` instead bakes session-local pointer values into the initializer via `literal_static_pointer_val`, so without intervention the bitcode we hand to `bitcode!` for caching carries live pointers from the current session and isn't safe to reload in a future one.

After collecting `gv_to_value` from `jl_get_llvm_gvs` / `jl_get_llvm_gvs_globals`, immediately reset each tracked GV's initializer to null. `relocate_gvs!` at the toplevel link step then re-applies the session-current values regardless of which Julia we're on, so optimization still sees the resolved constants.

On 1.13+ the null-out is a no-op (Julia already nulled them); on 1.12 this is what makes per-CI runtime bitcode caching genuinely cross-session-safe for back-ends that pull in `julia.constgv`-touching runtime functions (CUDA's `gc_pool_alloc`, `box_*`/`unbox_*`, …). Metal's stubs don't trip this either way.

Verified: 27 `julia.constgv` GVs in Metal's cached runtime-fn bitcode on 1.12, all with null initializers post-change (was 27/27 non-null).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CompilerCaching becomes a strong dep (it loads as an empty shell on 1.10, so there's no overhead). The `GPUCompilerCompilerCachingExt` extension, which existed just to wire the parametric `CC.finish!` override that attaches a `CachedResult{V}` to every inferred CodeInstance, moves into `jlgen.jl` directly, gated on `HAS_INTEGRATED_CACHE`. Same code, one less file.

With CompilerCaching available unconditionally we can also drop the inline copies of its inference machinery from `jlgen.jl`:

- `drive_inference!` on 1.11+ is now a two-line delegation to `CompilerCaching.typeinf!` + `get(cache, mi, nothing)`. The 1.10 implementation (which talks to the per-interpreter `CodeCache`) moves to `deprecated.jl`.
- `collect_codeinfos` / `_ci_codeinfo` go away; the single call site in `compile_method_instance` calls `CompilerCaching.get_codeinfos` directly.
- `StackedMethodTable` is re-exported from CompilerCaching on 1.11+; the 1.10 variant (with the older `MethodMatchResult`-shape `findall`) moves to `deprecated.jl`.

Net result: ~200 lines deleted from `jlgen.jl`, no behavior change. All 1.10-only code is now in `deprecated.jl`, ready to disappear in one diff when 1.10 support drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the hand-rolled `CC.finish!` override with `@setup_results GPUInterpreter` plus a one-line `CompilerCaching.results_type` trait that reads V from the interpreter's type parameter. Same generated code, less boilerplate. `drive_inference!` collapses to a one-liner calling `CompilerCaching.typeinf!(interp, mi)` — which now constructs the CacheView internally and returns the root CI directly, saving the lookup the old cache-taking form required. The `cache_view(interp)` helper inside `jlgen.jl` goes away with it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rotocol.

The old `bitcode(results)` / `bitcode!(results, bytes)` pair was a single-purpose hook bolted onto the consumer's results struct. Extending it to memoize more phases (LLVM IR, intermediate AIR, whatever) meant adding a parallel pair per phase. Worse, it required the consumer's results struct to be in the loop on every cache touch, which forced `rtlib.jl` to know about `CompilerCaching` to do the per-CI lookup that fetched the right results instance.

Replace with a back-end-managed key→bytes protocol on `CompilerJob`:

    cache_get(job::CompilerJob, key::Symbol) -> Union{Nothing, Vector{UInt8}}
    cache_put!(job::CompilerJob, key::Symbol, ::Vector{UInt8})

GPUCompiler hands the back-end a job + a key (currently `:llvm_ir`, the post-irgen LLVM bitcode for runtime library functions). The back-end stores it wherever it likes: typically on a CI's `analysis_results` via CompilerCaching, but it could equally be an in-memory `Dict`, on-disk storage, or nothing. The default no-op pair means no caching.

`rtlib.jl` no longer imports `CompilerCaching`; it just calls the hooks. New phase keys can be added without growing the API surface; back-ends opt in selectively by matching on keys they care about.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
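To make the key→bytes protocol from this commit message concrete, here is a minimal, self-contained sketch. Only the `cache_get` / `cache_put!` names and argument shapes come from the message; `AbstractToyJob`, `DictBackedJob`, `STORE`, and `get_or_compile` are hypothetical stand-ins, and the "bitcode" is a placeholder byte string rather than real codegen output.

```julia
# Hypothetical stand-in; the real `CompilerJob` lives in GPUCompiler.
abstract type AbstractToyJob end

# Default no-op pair: back-ends that define nothing get no caching.
cache_get(job::AbstractToyJob, key::Symbol) = nothing
cache_put!(job::AbstractToyJob, key::Symbol, bytes::Vector{UInt8}) = nothing

# A back-end that opts in, storing bytes in an in-memory Dict (an assumption;
# the message notes the storage location is up to the back-end).
struct DictBackedJob <: AbstractToyJob
    id::Int
end

const STORE = Dict{Tuple{Int,Symbol},Vector{UInt8}}()

function cache_get(job::DictBackedJob, key::Symbol)
    key === :llvm_ir || return nothing   # only handle keys we care about
    return get(STORE, (job.id, key), nothing)
end

function cache_put!(job::DictBackedJob, key::Symbol, bytes::Vector{UInt8})
    key === :llvm_ir || return nothing
    STORE[(job.id, key)] = bytes
    return nothing
end

# Caller side: try the hook, fall back to "compiling", then offer the result.
function get_or_compile(job::AbstractToyJob, key::Symbol)
    cached = cache_get(job, key)
    cached === nothing || return cached
    bytes = Vector{UInt8}("bitcode for job")   # placeholder for real output
    cache_put!(job, key, bytes)
    return bytes
end
```

The caller never needs to know where the bytes live, which is the property that lets `rtlib.jl` drop its `CompilerCaching` import in this design.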

Replaces the previous V-threaded design (GPUInterpreter{V}, results_type(job), @setup_results, cache_get/cache_put!, cache_view) with a single back-end-facing entry point:

    cached_results(::Type{V}, job::CompilerJob)::V

which returns the (lazily created) results struct for a job. Back-ends define one mutable struct holding their per-stage artifacts, check completeness, and compile into it. This is a single code path on all supported Julia versions:

- On 1.11+, the struct lives on the CodeInstance in Julia's integrated cache (running inference to create one when needed), wrapped in a config-keyed JobResults container. CompilerCaching attaches results lazily, so the GPUInterpreter no longer carries a results type, and independent consumers (e.g. our own runtime-library cache) can attach to the same CI.
- On 1.10, the struct lives in a session-local Dict keyed by the same job identity, kept alongside the other legacy code in deprecated.jl.

Keying results by the full CompilerConfig (not just cache_owner) fixes a latent bug in the previous design where two jobs differing only in codegen settings (e.g. the kernel name) would share artifacts. The owner token still covers only what affects inference, so inference results remain shared across such jobs.

The runtime library now uses the same mechanism: emit_function! memoizes each runtime function's renamed bitcode in a RuntimeFunctionResults attached to its CI, replacing the cache_get/cache_put! protocol. Back-ends no longer opt in: runtime bitcode persists through precompilation automatically on 1.11+.

Also moves the 1.10 drive_inference! definition from deprecated.jl to jlgen.jl: its signature references GPUInterpreter, which isn't defined yet when deprecated.jl is included (1.10 loading was broken on the previous branch), and fixes the ptx precompile test to construct its cache token from the standalone package's helper module (the sandbox copy defines a distinct CompilerParams type, so its token can never match the precompiled CIs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
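The session-local variant of `cached_results` described for 1.10 might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `ToyConfig`, `ToyJob`, `ToyResults`, `RESULTS`, and `compile!` are hypothetical, and only the `cached_results(::Type{V}, job)::V` shape is taken from the message.

```julia
# Hypothetical stand-ins; the real CompilerJob/CompilerConfig are GPUCompiler types.
struct ToyConfig
    target::Symbol
    name::Union{Nothing,String}   # codegen-only setting, e.g. the kernel name
end

struct ToyJob
    source::Symbol     # stands in for the method instance
    config::ToyConfig
end

# One mutable struct per back-end holding its per-stage artifacts.
mutable struct ToyResults
    ir::Union{Nothing,String}
    obj::Union{Nothing,Vector{UInt8}}
    ToyResults() = new(nothing, nothing)
end

# Session-local store keyed by results type and full job identity (config included).
const RESULTS = Dict{Tuple{DataType,ToyJob},Any}()

# Return the lazily created results struct for a job.
function cached_results(::Type{V}, job::ToyJob)::V where {V}
    return get!(() -> V(), RESULTS, (V, job))
end

# Back-end side: check completeness, then compile into the struct if needed.
function compile!(job::ToyJob)
    res = cached_results(ToyResults, job)
    if res.obj === nothing
        res.ir = "ir for $(job.source)"       # placeholder for real IR
        res.obj = Vector{UInt8}(res.ir)       # placeholder for real object code
    end
    return res
end
```

Because the key includes the whole config, two jobs that differ only in `name` get distinct `ToyResults` instances, while an identical job reuses the existing one.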

Dynamic construction of the cache token made a cached lookup ~3x slower than the legacy cached_compilation path (684 vs 234 ns); specialized, it is now faster (166 vs 182 ns). Instantiations are bounded: one per back-end (and results type). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A Vector field made the target mutable under jl_egal, so owner tokens and configs deserialized from package images never matched and cached kernels were silently recompiled. Use a single --spirv-ext specifier string instead, mirroring LLVM feature strings (and GCN's features). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
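The identity problem behind this fix can be reproduced with plain Julia, since `===` compares arrays by identity but strings by content. The two target types and the extension names below are hypothetical and only illustrate the effect.

```julia
# Hypothetical target types for illustration.
struct VectorTarget
    extensions::Vector{String}   # array field: compared by identity under ===
end

struct StringTarget
    extensions::String           # string field: compared by content under ===
end

# Two targets built from equal but distinct arrays, as happens when one copy
# is deserialized from a package image.
v1 = VectorTarget(["+ext_a", "+ext_b"])
v2 = VectorTarget(["+ext_a", "+ext_b"])

# The same information carried as a single specifier string.
s1 = StringTarget("+ext_a,+ext_b")
s2 = StringTarget(join(["+ext_a", "+ext_b"], ","))

vector_targets_match = v1 === v2   # false: the arrays are different objects
string_targets_match = s1 === s2   # true: strings are egal when contents match
```

Any token or config that embeds the vector-carrying target inherits the mismatch, which is why lookups against deserialized entries never hit.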

Artifacts derived from IR with relocated GVs embed absolute pointers from the precompilation process. Mark such jobs during output generation and drop their JobResults entries from an atexit hook, which runs before jl_write_compiler_output: within-session lookups still hit, but later sessions recompile instead of loading dangling pointers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

CIs deposited by our own precompile workload carry world ages from the precompilation process and are dead weight in later sessions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The derived runtime config inherited cosmetic fields like name=, so runtime function artifacts were cached (and persisted) once per kernel config variation instead of once per cache owner. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
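As a rough sketch of the idea, assume a config with one field that affects generated code and one cosmetic field. `KernelConfig`, `runtime_config`, and `runtime_artifact!` are hypothetical names, and which fields count as cosmetic in GPUCompiler is not something this example establishes.

```julia
# Hypothetical config type; field names are assumptions for illustration.
struct KernelConfig
    target::Symbol                 # affects generated code
    name::Union{Nothing,String}    # cosmetic: the kernel's name
end

# Derive the runtime-library config, dropping cosmetic fields so that every
# kernel config for the same target maps to one runtime cache entry.
runtime_config(cfg::KernelConfig) = KernelConfig(cfg.target, nothing)

const runtime_cache = Dict{KernelConfig,String}()

function runtime_artifact!(cfg::KernelConfig)
    return get!(runtime_cache, runtime_config(cfg)) do
        "runtime bitcode for $(cfg.target)"   # placeholder artifact
    end
end

# Two kernels that differ only in name share a single runtime entry.
runtime_artifact!(KernelConfig(:sm_70, "kernel_a"))
runtime_artifact!(KernelConfig(:sm_70, "kernel_b"))
entries = length(runtime_cache)
```

Without the normalization step, the two calls would have created two entries for identical runtime code.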

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov · 2026-06-16T19:05:23Z

Codecov Report

❌ Patch coverage is 37.82772% with 166 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.94%. Comparing base (ea01d8b) to head (a1e6c06).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/deprecated.jl | 0.00% | 125 Missing ⚠️ |
| src/jlgen.jl | 55.38% | 29 Missing ⚠️ |
| src/interface.jl | 83.33% | 7 Missing ⚠️ |
| src/rtlib.jl | 87.50% | 3 Missing ⚠️ |
| src/spirv.jl | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #794      +/-   ##
==========================================
- Coverage   79.02%   73.94%   -5.09%     
==========================================
  Files          25       25              
  Lines        4630     4276     -354     
==========================================
- Hits         3659     3162     -497     
- Misses        971     1114     +143


This was referenced May 12, 2026

Switch to CompilerCaching.jl JuliaGPU/Metal.jl#777

Draft

Switch to CompilerCaching.jl JuliaGPU/OpenCL.jl#431

Draft

maleadt force-pushed the tb/compilercaching branch from 93feac5 to a61b57b Compare June 16, 2026 18:25

maleadt and others added 22 commits June 16, 2026 20:27

Remove 1.10 from CI.

64149c3

Fix tests.

93188aa

Restore 1.10 compat.

66ffba5

Don't cache relocated GVs.

68b094e

Reword GPUInterpreter docstring after extension removal.

56ae71e

Fix version.

34e645c

Clear the 1.10 CodeCache registry on load.

6dc2052

CIs deposited by our own precompile workload carry world ages from the precompilation process and are dead weight in later sessions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Fix stale comment about the removed 1.10 methodinstance generator.

1043875

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Remove irrelevant TODOs.

a1e6c06

maleadt force-pushed the tb/compilercaching branch from 806ec79 to a1e6c06 Compare June 16, 2026 18:28

Fixes.

1e5e813

maleadt wants to merge 23 commits into main from tb/compilercaching
