Summary
Follow-up to the PositionalKVCache trace-fidelity fix in #193 (root cause: SKaiNET#763). The shared-cache variants in transformer-core/.../transformer/KVCache.kt still read K/V back through the heap buffer (sliceView / ctx.fromData, plus padHeadDim's copyToFloatArray + fromData), which bypasses ctx.ops. So under tracing (ctx.isRecording, embedConstants=true) they bake an all-zero KV buffer as stablehlo.constant and disconnect the computed k_proj/v_proj — exactly the #763 bug — for any model that uses kv-shared layers.
Affected (none carry the isRecording functional path #193 added to PositionalKVCache.update):
SharedPositionalKVCache.update → delegate.currentView(...) (buffer)
PaddedSharedPositionalKVCache.update → padHeadDim(...) (raw copyToFloatArray + ctx.fromData) + delegate.sliceView(...) (buffer)
OwnerReadOnlyKVCache.update → delegate.sliceView(...) (buffer)
Impact
Real Gemma models with kvSharedLayers > 0 (e.g. Gemma 4 E2B's last-N shared group) will export decoders that attend over K=V=0. As in #763, the error is masked in non-sandwich configs by the unnormalized FFN and exposed by sandwich norms or any downstream normalization. The gemma-decoder conformance model uses the non-shared PositionalKVCache (kvSharedLayers=0), so it is unaffected and already validates; this gap only affects kv-shared exports.
Fix direction
Mirror #193: give the shared variants a ctx.isRecording branch that wires K/V functionally (ops.concat history) instead of the buffer round-trip. The padding in PaddedSharedPositionalKVCache should also use ctx.ops (pad/concat) when recording rather than copyToFloatArray + fromData. The "owner read-only" semantics (followers read the owner's accumulated history) need the owner's traced K/V history exposed for wiring, likely as a shared tracedKeys/tracedValues on the owner PositionalKVCache.
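A toy sketch of the wiring idea, not the SKaiNET API: every type below (Node, ToyOwnerCache, ToyFollowerCache, dependsOnComputed) is hypothetical, and the sketch assumes the heap buffer still holds its initial zeros at trace time. It only illustrates why a shared traced history keeps followers connected to the computed K, while a buffer read-back does not. Padding is left out.

```kotlin
// Toy graph nodes standing in for traced tensors; these are not SKaiNET types.
sealed interface Node
data class Computed(val name: String) : Node          // e.g. the output of k_proj
data class Constant(val values: List<Float>) : Node   // a literal baked into the export
data class Concat(val parts: List<Node>) : Node       // functional history of K

// Hypothetical owner cache. tracedKeys plays the role of the shared
// traced history that followers would read when recording.
class ToyOwnerCache(capacity: Int) {
    private val buffer = FloatArray(capacity)          // assumed all zeros at trace time
    val tracedKeys = mutableListOf<Node>()

    fun update(k: Node, isRecording: Boolean): Node {
        if (isRecording) {
            tracedKeys += k                            // keep the graph edge to k
            return Concat(tracedKeys.toList())
        }
        return Constant(buffer.toList())               // buffer round-trip: edge to k is lost
    }
}

// Hypothetical follower: never writes, only reads what the owner accumulated.
class ToyFollowerCache(private val owner: ToyOwnerCache) {
    fun read(): Node = Concat(owner.tracedKeys.toList())
}

// True when the node is still connected to some computed value.
fun dependsOnComputed(n: Node): Boolean = when (n) {
    is Computed -> true
    is Constant -> false
    is Concat -> n.parts.any { dependsOnComputed(it) }
}
```

With one owner.update(Computed("k_proj"), isRecording = true) call, both the owner's result and ToyFollowerCache(owner).read() satisfy dependsOnComputed; the isRecording = false path returns a Constant and does not.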
Verification
Add a kv-shared config to a trace-fidelity test (cf. KvCacheTraceFidelityTest): trace a tiny Gemma with kvSharedLayers > 0 + sandwichNorms = true, assert no zero KV-cache constants in the export; ideally also run it end-to-end through skainet-iree-conformance against the SKaiNET-CPU oracle.
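One possible shape for the "no zero KV-cache constants" assertion, as a heuristic sketch only. It assumes the export is available as StableHLO text and that a baked zero buffer shows up as a splat constant (dense<0.000000e+00>) of rank 3 or higher; zeroConstantShapes and the minRank cutoff are hypothetical, and the existing KvCacheTraceFidelityTest may well inspect the graph differently.

```kotlin
// Matches splat-zero f32 constants in StableHLO text, for example:
//   %0 = stablehlo.constant dense<0.000000e+00> : tensor<1x4x8xf32>
// Group 1 captures the dimension list including the trailing 'x' ("1x4x8x").
val zeroSplat = Regex(
    """stablehlo\.constant\s+dense<0(?:\.0*)?(?:[eE][+-]?\d+)?>\s*:\s*tensor<((?:\d+x)+)f32>"""
)

// Returns the shapes of all splat-zero constants with at least minRank dimensions.
// The rank filter is an assumption meant to skip small legitimate zeros (e.g. biases).
fun zeroConstantShapes(mlir: String, minRank: Int = 3): List<List<Int>> =
    zeroSplat.findAll(mlir)
        .map { m -> m.groupValues[1].trimEnd('x').split('x').map(String::toInt) }
        .filter { it.size >= minRank }
        .toList()
```

A test could then assert that zeroConstantShapes(exportedText) is empty for the kv-shared config. A legitimate all-zero constant of rank 3 or higher would trip this check, so matching against the known cache shape is probably safer than a blanket rule.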