release: SKaiNET-transformers 0.32.0 by michalharakal · Pull Request #197 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-06-25T13:26:49Z

PR 3 of 3 — IREE/TinyLlama fix-stack (merge in order)

Stacked on #196. Base will auto-retarget to develop once #196 merges. Merge this last.

Commits

a91f57d chore(release): prepare 0.32.0 — bumps VERSION_NAME 0.31.1→0.32.0 (gradle.properties, libs.versions.toml), CHANGELOG, README, docs
068647f chore(release): sync API dumps for 0.32.0 + validate transformer-core

Notes

develop advanced since this branch was cut (dependabot bumps + the kvcache fix #193, fix(kvcache): trace-faithful PositionalKVCache.update (#763)). A test-merge into current develop is clean: libs.versions.toml auto-merges, no conflicts.
Re-run ./gradlew apiCheck after the rebase/retarget in case develop's kvcache change shifted the public API surface vs. the dumps generated here.

Stack (merge order)

feat(llama): NATIVE_OPTIMIZED packed weight path (mirror Gemma) #195 (packed weights)
perf(mha)+fix(rope): fused decode-attention & traceable interleaved RoPE #196 (fused attention + RoPE)
this PR (release 0.32.0)

Merge with a merge-commit / rebase (do NOT squash).

Ships the real-GGUF Llama eager path (packed NATIVE_OPTIMIZED) and unblocks StableHLO/IREE export for Llama-family models (traceable interleaved RoPE). Against engine 0.32.0.

- VERSION_NAME 0.31.1 -> 0.32.0; engine pin skainet 0.31.0 -> 0.32.0.
- CHANGELOG: [0.32.0] section (NATIVE_OPTIMIZED Llama / fused decode-attention / traceable RoPE / packed-embedding gather fix).
- README: Current release + "What's new in 0.32.0".
- antora samples: BOM coordinate 0.31.1 -> 0.32.0 (getting-started-java, llama3-tool-calling).

Features (since 0.31.1): ccbd87e NATIVE_OPTIMIZED Llama, 3791f88 fused decode-attention, 019b049 traceable interleaved RoPE.

Verified: transformer-core + llm-inference:llama compile against published engine 0.32.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

apiCheck was red: llm-core.api still listed the NN primitives that the 0.31.1 transformer-core extraction moved out, and transformer-core had no API validation at all, so those public types were tracked nowhere.

- Add binary-compatibility-validator to transformer-core; apiDump creates transformer-core/api/jvm/transformer-core.api (the 30 moved primitives: attention, KV-cache family, embedding, norms, RoPE, FFNs, linear projection).
- Regenerate llm-core.api (drop the moved primitives; they're now tracked in transformer-core, not lost).
- llama.api: + convertLlamaWeightsPacked (the 0.32.0 NATIVE_OPTIMIZED feature).

apiCheck now passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
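
For readers unfamiliar with the validator, here is a minimal sketch of how it is typically wired into a single Gradle module. The plugin id is the one published for Kotlin's binary-compatibility-validator. The version number, and applying the plugin directly in the module build script rather than through libs.versions.toml or a convention plugin, are assumptions for illustration and are not taken from this PR.

```kotlin
// transformer-core/build.gradle.kts
// Sketch only: the module's real build script is not shown in this PR.
plugins {
    // Assumption: the version is illustrative; the repository may pin a
    // different one or apply the plugin via its version catalog.
    id("org.jetbrains.kotlinx.binary-compatibility-validator") version "0.16.3"
}
```

With the plugin applied, `./gradlew :transformer-core:apiDump` writes the API dump for the module and `./gradlew apiCheck` fails the build when the public API no longer matches the committed dump, which is the check this commit turns green.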

michalharakal and others added 2 commits June 25, 2026 10:57

michalharakal changed the base branch from perf/fused-decode-attention to develop June 25, 2026 13:32

michalharakal merged commit 217c1cb into develop Jun 25, 2026
4 checks passed

michalharakal merged 2 commits into develop from release/0.32.0
