KV-cache trace fidelity: shared-cache variants still bake K=V=0 under embedConstants #194

Description

@michalharakal

Summary

Follow-up to the PositionalKVCache trace-fidelity fix in #193 (root cause: SKaiNET#763). The shared-cache variants in transformer-core/.../transformer/KVCache.kt still read K/V back through the heap buffer (sliceView / ctx.fromData, plus padHeadDim's copyToFloatArray + fromData), which bypasses ctx.ops. Under tracing (ctx.isRecording, embedConstants=true) they therefore bake an all-zero KV buffer as a stablehlo.constant and disconnect the computed k_proj/v_proj. This is exactly the #763 bug, and it affects any model that uses kv-shared layers.
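The failure mode described above can be illustrated with a toy graph. This is only a sketch: `Node`, `Computed`, `Constant`, `ToyCtx`, `updateViaBuffer`, `updateViaOps` and `reaches` are hypothetical stand-ins, not SKaiNET types, and it assumes that a traced tensor has no concrete values to copy into the heap buffer, so the buffer is still zero when it is read back.

```kotlin
// Toy stand-ins for illustration only; none of these are SKaiNET types.
sealed interface Node
class Computed(val name: String, val inputs: List<Node> = emptyList()) : Node
class Constant(val values: FloatArray) : Node

class ToyCtx {
    // Analogue of ctx.fromData: host data enters the graph as a constant.
    fun fromData(data: FloatArray): Node = Constant(data.copyOf())

    // Analogue of an op issued through ctx.ops: the result keeps edges to its inputs.
    fun concat(a: Node, b: Node): Node = Computed("concat", listOf(a, b))
}

// Buffer round-trip: the result is rebuilt from the heap buffer and k is dropped.
// Assumption: a traced k has no concrete values to copy, so the buffer stays zero.
fun updateViaBuffer(ctx: ToyCtx, buffer: FloatArray, k: Node): Node =
    ctx.fromData(buffer)

// Functional wiring: history and the new projection stay connected in the graph.
fun updateViaOps(ctx: ToyCtx, history: Node, k: Node): Node =
    ctx.concat(history, k)

// True if target is reachable from root by following input edges.
fun reaches(root: Node, target: Node): Boolean =
    root === target || (root is Computed && root.inputs.any { reaches(it, target) })
```

In this toy model, the node returned by `updateViaBuffer` is an all-zero `Constant` from which `k_proj` is unreachable, while the node returned by `updateViaOps` still reaches it.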

Affected (none carry the isRecording functional path #193 added to PositionalKVCache.update):

  • SharedPositionalKVCache.update → delegate.currentView(...) (buffer)
  • PaddedSharedPositionalKVCache.update → padHeadDim(...) (raw copyToFloatArray + ctx.fromData) + delegate.sliceView(...) (buffer)
  • OwnerReadOnlyKVCache.update → delegate.sliceView(...) (buffer)

Impact

Real Gemma models with kvSharedLayers > 0 (e.g. Gemma 4 E2B's last-N shared group) will export decoders that attend over K=V=0. As with #763, the error is masked in non-sandwich configs by the unnormalized FFN and exposed by sandwich norms or any downstream normalization. The gemma-decoder conformance model uses the non-shared PositionalKVCache (kvSharedLayers=0), so it is unaffected and already validates; this gap only bites kv-shared exports.

Fix direction

Mirror #193: give the shared variants a ctx.isRecording branch that wires K/V functionally (ops.concat history) instead of the buffer round-trip. The padding in PaddedSharedPositionalKVCache should also use ctx.ops (pad/concat) when recording rather than copyToFloatArray + fromData. The "owner read-only" semantics (followers read the owner's accumulated history) need the owner's traced K/V history exposed for wiring, likely as a shared tracedKeys/tracedValues on the owner PositionalKVCache.
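A minimal sketch of the owner/follower wiring suggested above, using toy types. `Node`, `ToyOps`, `OwnerCache` and `FollowerCache` are hypothetical, and only the recording path is modelled. The real PositionalKVCache and OwnerReadOnlyKVCache signatures may differ, and `tracedKeys`/`tracedValues` are the names proposed in this issue, not existing fields.

```kotlin
// Hypothetical toy types; the real cache classes and ops API may differ.
class Node(val name: String, val inputs: List<Node> = emptyList())

class ToyOps {
    // Stand-in for a concat issued through ctx.ops: keeps edges to both inputs.
    fun concat(a: Node, b: Node): Node = Node("concat", listOf(a, b))
}

class OwnerCache(private val ops: ToyOps) {
    // Traced history, exposed read-only so followers can wire against it.
    var tracedKeys: Node? = null
        private set
    var tracedValues: Node? = null
        private set

    // Recording-path update: append the new K/V to the traced history via ops.
    fun update(k: Node, v: Node): Pair<Node, Node> {
        val keys = tracedKeys?.let { ops.concat(it, k) } ?: k
        val values = tracedValues?.let { ops.concat(it, v) } ?: v
        tracedKeys = keys
        tracedValues = values
        return keys to values
    }
}

class FollowerCache(private val owner: OwnerCache) {
    // Read-only semantics: return the owner's traced history, never a buffer view.
    fun update(): Pair<Node, Node> {
        val keys = checkNotNull(owner.tracedKeys) { "owner has not been updated yet" }
        val values = checkNotNull(owner.tracedValues) { "owner has not been updated yet" }
        return keys to values
    }
}
```

With this shape, a follower layer's attention inputs are the same graph nodes the owner produced, so the owner's k_proj/v_proj stay connected in the export. The sketch does not cover the head-dim padding or the non-recording buffer path.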

Verification

Add a kv-shared config to a trace-fidelity test (cf. KvCacheTraceFidelityTest): trace a tiny Gemma with kvSharedLayers > 0 + sandwichNorms = true, assert no zero KV-cache constants in the export; ideally also run it end-to-end through skainet-iree-conformance against the SKaiNET-CPU oracle.
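The "no zero KV-cache constants" assertion could be approximated by scanning the exported text, as sketched below. `findZeroConstants` and its `minElements` parameter are hypothetical helpers, not part of KvCacheTraceFidelityTest. The sketch assumes the export prints splat constants in the usual MLIR form (`stablehlo.constant dense<0.000000e+00> : tensor<...xf32>`) and only looks at f32 tensors. Scalar zeros are skipped because they are often legitimate.

```kotlin
// Matches an all-zero f32 splat constant with at least one explicit dimension.
// Assumption: constants are printed as `dense<0.000000e+00> : tensor<AxBx...xf32>`.
private val zeroSplat = Regex(
    """stablehlo\.constant\s+dense<0(?:\.0+)?(?:[eE][+-]?\d+)?>\s*:\s*tensor<([0-9x]+)xf32>"""
)

// Returns the matched constant declarations whose element count is >= minElements.
fun findZeroConstants(mlir: String, minElements: Long = 2): List<String> =
    zeroSplat.findAll(mlir)
        .filter { match ->
            // Multiply the dimensions, e.g. "1x4x8" -> 32 elements.
            val elements = match.groupValues[1]
                .split('x')
                .fold(1L) { acc, dim -> acc * dim.toLong() }
            elements >= minElements
        }
        .map { it.value }
        .toList()
```

A test could then assert that `findZeroConstants(exportedText)` is empty for the kv-shared config. A text scan like this cannot tell a KV-cache constant from any other zero tensor, so a stricter check would match on shape or on the consumer op.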