Replicate kv transform code with changes and comment addressal from #1037 #1069
Open
quic-dhirajku wants to merge 2 commits into
4f96cbe to 78aaf77
…d helper utilities

This squashed commit combines the last 9 commits on this branch into one coherent change set for RepeatKV support and follow-up review updates.

What changed
- Added RepeatKV transform support for both LLM and VLM export paths.
- Extended model/config handling to support generic attention head naming patterns (e.g., num_attention_heads, n_heads, n_head) to improve cross-model compatibility.
- Added/updated RepeatKV operations and integration paths for DeepSeekV3 flows.
- Added support for deriving an effective repeat_kv count based on model topology and number of devices.
- Added RepeatKV handling for AWQ quantized models.
- Improved wrapper-aware behavior for VLM encoder/decoder paths and prevented repeated application of ReplicateKVTransform across wrappers.
- Updated CI-related model mapping/test infra for repeat_kv checks across CausalLM/VLM scenarios and adjusted script flow around APIRunner/input-shape sequencing.
- Refactored KV duplication logic into shared helpers:
  - moved helper logic to transformers/models/repeat_kv_utils.py
  - centralized projection lookup, MLA checks, idempotency checks, and KV duplication
  - replaced in-class duplication code with utility-driven calls
- Applied naming cleanup by renaming num_kv_heads_repeat to num_replicate_kv_heads.

Review-driven updates
- Included internal and PR review feedback updates across transform behavior, scripts, naming, and helper factoring.
- Incorporated a revert+follow-up sequence from review iteration, keeping only the final intended behavior in this squashed result.

Notes
- Historical TODOs from intermediate commits were retained as context during development; this squashed state reflects the final net code on branch tip.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
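For readers unfamiliar with KV head replication, the two ideas named in the commit message (looking up the head count under several attribute names, and duplicating KV projection rows per head) can be illustrated with a minimal sketch. This is not the code in transformers/models/repeat_kv_utils.py: the function names `resolve_num_attention_heads` and `replicate_kv_rows`, the lookup order, and the plain nested-list weight layout are all assumptions made for illustration.

```python
from types import SimpleNamespace

# Attribute names come from the commit message; the lookup order is an assumption.
HEAD_COUNT_NAMES = ("num_attention_heads", "n_heads", "n_head")


def resolve_num_attention_heads(config):
    """Return the head count from the first known attribute present on config."""
    for name in HEAD_COUNT_NAMES:
        value = getattr(config, name, None)
        if value is not None:
            return value
    raise AttributeError(f"none of {HEAD_COUNT_NAMES} found on config")


def replicate_kv_rows(weight, num_kv_heads, repeat):
    """Repeat each KV head's block of rows `repeat` times, copies kept adjacent.

    `weight` is a hypothetical projection matrix stored as a list of rows with
    shape (num_kv_heads * head_dim, hidden_size).
    """
    if len(weight) % num_kv_heads != 0:
        raise ValueError("row count must be a multiple of num_kv_heads")
    head_dim = len(weight) // num_kv_heads
    out = []
    for head in range(num_kv_heads):
        block = weight[head * head_dim:(head + 1) * head_dim]
        for _ in range(repeat):
            # Copy rows so the replicated heads do not alias the originals.
            out.extend(list(row) for row in block)
    return out


# A config that only exposes the GPT-2 style `n_head` attribute.
config = SimpleNamespace(n_head=8)
num_heads = resolve_num_attention_heads(config)

# Two KV heads with head_dim = 1 and hidden_size = 2.
k_proj = [[0.0, 0.1], [1.0, 1.1]]
replicated = replicate_kv_rows(k_proj, num_kv_heads=2, repeat=2)
```

Keeping the copies of each head adjacent matters for grouped-query attention: as long as the repeat factor divides the original number of query heads per KV head, every query head still maps to a copy of the KV head it used before, so the attention output is unchanged while the KV head count becomes large enough to split across more devices.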
78aaf77 to ab8813b
Raising this PR with all the changes, due to rebase conflicts in the previous branch.