release: SKaiNET-transformers 0.32.0 #197

Merged
michalharakal merged 2 commits into develop from release/0.32.0
Jun 25, 2026
@michalharakal

PR 3 of 3 — IREE/TinyLlama fix-stack (merge in order)

Stacked on #196. Base will auto-retarget to develop once #196 merges. Merge this last.

Commits

  • a91f57d chore(release): prepare 0.32.0 — bumps VERSION_NAME 0.31.1→0.32.0 (gradle.properties, libs.versions.toml), CHANGELOG, README, docs
  • 068647f chore(release): sync API dumps for 0.32.0 + validate transformer-core

Notes

  • develop advanced since this branch was cut (dependabot bumps plus the kvcache fix #193, "fix(kvcache): trace-faithful PositionalKVCache.update (#763)"). A test merge into current develop is clean: libs.versions.toml auto-merges with no conflicts.
  • Re-run ./gradlew apiCheck after the rebase/retarget in case develop's kvcache change shifted the public API surface vs. the dumps generated here.

Stack (merge order)

  1. #195 (packed weights): feat(llama): NATIVE_OPTIMIZED packed weight path (mirror Gemma)
  2. #196 (fused attention + RoPE): perf(mha)+fix(rope): fused decode-attention & traceable interleaved RoPE
  3. this PR (release 0.32.0)
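
For readers unfamiliar with the "interleaved RoPE" named in #196, the following is a minimal pure-Python sketch of the common interleaved convention, in which adjacent pairs (x[2i], x[2i+1]) are rotated together. It is an illustration only, not the SKaiNET implementation: the function name `rope_interleaved`, the `base` default of 10000, and the pairing convention are assumptions, and this PR does not state which variant the engine uses.

```python
import math


def rope_interleaved(x, pos, base=10000.0):
    """Apply rotary position embedding to one head vector (hypothetical helper).

    Interleaved layout (assumed): element 2i is paired with element 2i+1,
    and the pair is rotated by the angle pos * base**(-2i/d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        angle = pos * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b) by `angle`.
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out
```

At position 0 every angle is zero, so the vector is returned unchanged. At any other position each pair keeps its length, because the operation is a pure rotation.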

Merge with a merge-commit / rebase (do NOT squash).

michalharakal and others added 2 commits June 25, 2026 10:57
a91f57d chore(release): prepare 0.32.0

Ships the real-GGUF Llama eager path (packed NATIVE_OPTIMIZED) and unblocks
StableHLO/IREE export for Llama-family models (traceable interleaved RoPE).
Against engine 0.32.0.

- VERSION_NAME 0.31.1 -> 0.32.0; engine pin skainet 0.31.0 -> 0.32.0.
- CHANGELOG: [0.32.0] section (NATIVE_OPTIMIZED Llama / fused decode-attention /
  traceable RoPE / packed-embedding gather fix).
- README: Current release + "What's new in 0.32.0".
- antora samples: BOM coordinate 0.31.1 -> 0.32.0 (getting-started-java,
  llama3-tool-calling).

Features (since 0.31.1): ccbd87e NATIVE_OPTIMIZED Llama, 3791f88 fused
decode-attention, 019b049 traceable interleaved RoPE.

Verified: transformer-core + llm-inference:llama compile against published
engine 0.32.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

068647f chore(release): sync API dumps for 0.32.0 + validate transformer-core

apiCheck was red: llm-core.api still listed the NN primitives that the 0.31.1
transformer-core extraction moved out, and transformer-core had no API
validation at all — so those public types were tracked nowhere.

- Add binary-compatibility-validator to transformer-core; apiDump creates
  transformer-core/api/jvm/transformer-core.api (the 30 moved primitives:
  attention, KV-cache family, embedding, norms, RoPE, FFNs, linear projection).
- Regenerate llm-core.api (drop the moved primitives — they're now tracked in
  transformer-core, not lost).
- llama.api: + convertLlamaWeightsPacked (the 0.32.0 NATIVE_OPTIMIZED feature).

apiCheck now passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

michalharakal changed the base branch from perf/fused-decode-attention to develop on June 25, 2026 13:32
michalharakal merged commit 217c1cb into develop on Jun 25, 2026
4 checks passed