Replicate kv transform code with changes and comment addressal from #1037 #1069

Open
quic-dhirajku wants to merge 2 commits into quic:release/v1.22.0_tmp from quic-dhirajku:replicate_kv_for_release
Conversation

@quic-dhirajku

Contributor

Raising this PR with all the changes because the previous branch ran into rebase conflicts.

@quic-dhirajku quic-dhirajku force-pushed the replicate_kv_for_release branch from 4f96cbe to 78aaf77 June 11, 2026 10:32
…d helper utilities

This squashed commit combines the last 9 commits on this branch into one coherent change set for RepeatKV support and follow-up review updates.

What changed
- Added RepeatKV transform support for both LLM and VLM export paths.
- Extended model/config handling to support generic attention head naming patterns
  (e.g., num_attention_heads, n_heads, n_head) to improve cross-model compatibility.
- Added/updated RepeatKV operations and integration paths for DeepSeekV3 flows.
- Added support for deriving an effective repeat_kv count based on model topology and
  number of devices.
- Added RepeatKV handling for AWQ quantized models.
- Improved wrapper-aware behavior for VLM encoder/decoder paths and prevented repeated
  application of ReplicateKVTransform across wrappers.
- Updated CI-related model mapping/test infra for repeat_kv checks across CausalLM/VLM
  scenarios and adjusted script flow around APIRunner/input-shape sequencing.
- Refactored KV duplication logic into shared helpers:
  - moved helper logic to transformers/models/repeat_kv_utils.py
  - centralized projection lookup, MLA checks, idempotency checks, and KV duplication
  - replaced in-class duplication code with utility-driven calls
- Applied naming cleanup by renaming num_kv_heads_repeat to num_replicate_kv_heads.
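
For readers unfamiliar with the idea, the head-name lookup, repeat-count derivation, and KV duplication listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in repeat_kv_utils.py: the function names, the lookup order, and in particular the rule used to pick the repeat count are assumptions made for the example, and plain Python lists stand in for projection weight tensors.

```python
from types import SimpleNamespace

# Attribute names taken from the commit message; the lookup order is an assumption.
HEAD_COUNT_NAMES = ("num_attention_heads", "n_heads", "n_head")


def resolve_num_heads(config):
    """Return the attention head count under whichever known name the config uses."""
    for name in HEAD_COUNT_NAMES:
        value = getattr(config, name, None)
        if value is not None:
            return value
    raise AttributeError(f"config defines none of {HEAD_COUNT_NAMES}")


def effective_repeat(num_kv_heads, num_heads, num_devices):
    """Hypothetical rule (not confirmed by the PR): pick the smallest repeat
    factor whose replicated KV head count is divisible by the device count
    and still divides the number of attention heads."""
    for repeat in range(1, num_heads // num_kv_heads + 1):
        total = num_kv_heads * repeat
        if total % num_devices == 0 and num_heads % total == 0:
            return repeat
    raise ValueError("no valid repeat factor for this topology")


def replicate_kv_rows(weight_rows, num_kv_heads, repeat):
    """Duplicate each KV head's block of rows `repeat` times, keeping copies
    of the same head adjacent so grouped query heads still line up."""
    head_dim, remainder = divmod(len(weight_rows), num_kv_heads)
    if remainder:
        raise ValueError("row count is not a multiple of num_kv_heads")
    replicated = []
    for head in range(num_kv_heads):
        block = weight_rows[head * head_dim:(head + 1) * head_dim]
        for _ in range(repeat):
            replicated.extend(list(row) for row in block)
    return replicated


# Usage: a config that only exposes the GPT-2 style name `n_head`.
config = SimpleNamespace(n_head=32)
num_heads = resolve_num_heads(config)
repeat = effective_repeat(num_kv_heads=8, num_heads=num_heads, num_devices=16)
k_rows = replicate_kv_rows([[1, 2], [3, 4]], num_kv_heads=2, repeat=repeat)
```

Keeping the copies of one head next to each other mirrors how grouped-query attention maps consecutive query heads onto a single KV head; an idempotency check, as mentioned in the list above, could compare the current KV row count against the expected replicated count before duplicating again.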

Review-driven updates
- Included internal and PR review feedback updates across transform behavior, scripts,
  naming, and helper factoring.
- Incorporated a revert+follow-up sequence from review iteration, keeping only the final
  intended behavior in this squashed result.

Notes
- Historical TODOs from intermediate commits were retained as context during development;
  this squashed state reflects the final net code on branch tip.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
@quic-dhirajku quic-dhirajku force-pushed the replicate_kv_for_release branch from 78aaf77 to ab8813b June 11, 2026 10:33
@quic-dhirajku quic-dhirajku marked this pull request as ready for review June 11, 2026 10:33