Add Qwen3VL MCore Export support from PR 895 #1482
📝 Walkthrough
Adds bidirectional Megatron Core ↔ Hugging Face weight mappings for Qwen3-VL, registers them as a plugin, includes tests validating import/export symmetry and prefix rules, and updates changelog and deployment docs for Qwen3-VL support.
Changes
Qwen3-VL Megatron Core Integration
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.rst`:
- Line 136: Move the changelog entry "Add Megatron Core export/import mapping
for Qwen3-VL (``Qwen3VLForConditionalGeneration``) vision-language models..."
out of the 0.42 (2026-03-10) section and place it under the current
unreleased/0.45 section header in CHANGELOG.rst, preserving the existing
formatting and inline code markup; ensure you remove the duplicate from 0.42 and
verify the entry appears exactly once under the 0.45 (unreleased) section.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7f30ba6d-8de4-4386-b197-7f8189e56d24
📒 Files selected for processing (5)
- CHANGELOG.rst
- docs/source/deployment/3_unified_hf.rst
- modelopt/torch/export/plugins/mcore_common.py
- modelopt/torch/export/plugins/mcore_qwen3vl.py
- tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py
/claude review
Claude Review Summary
Small, additive PR that clones the Qwen3 mcore mapping with
Findings
Most impactful
Risk: Low-to-moderate. Code is purely additive, no existing arch behavior changes. Worst case is a broken Qwen3-VL round-trip that only manifests at runtime, which is exactly why the test placement matters.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1482 +/- ##
==========================================
- Coverage 77.44% 76.76% -0.68%
==========================================
Files 473 474 +1
Lines 51418 51445 +27
==========================================
- Hits 39819 39493 -326
- Misses 11599 11952 +353
Add Megatron Core export/import mapping for Qwen3-VL (Qwen3VLForConditionalGeneration). Handles the model.language_model. weight prefix and supports both dense and MoE variants.
Signed-off-by: Hung-Yueh <hungyuehc@nvidia.com>
mv test_mcore_qwen3vl.py to tests/gpu_megatron/torch/export/
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
2423ae7 to a7d1170
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
/claude review
Claude Review Summary
Findings: CRITICAL: 0 · IMPORTANT: 0 · SUGGESTION: 2
The change is small and additive: it registers a new HF→MCore mapping for
Notes
Risk: Low. The change is isolated to an additive plugin entry and a new file; the round-trip has been validated by the author against
Replace the hand-written dict literals in mcore_qwen3vl.py with a helper that derives the VL mapping from qwen3_causal_lm_import/export by inserting 'language_model.' after 'model.' in every prefix. The root-level lm_head. prefix is left unchanged.
Remove TestQwen3VLvsQwen3Difference since it now tests the implementation against itself.
Note: the visual encoder (model.visual.*) is intentionally excluded from the mapping.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
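The prefix derivation described in this commit message can be sketched roughly as follows. This is only an illustration, not the code in mcore_qwen3vl.py: the real mappings hold rule objects rather than bare strings, and the names QWEN3_PREFIXES, add_language_model_prefix, and derive_vl_prefixes are hypothetical.

```python
from typing import Dict

# Hypothetical stand-in for the Qwen3 text-only mapping (MCore name -> HF prefix).
# The actual plugin stores mapping rule objects; plain strings keep the sketch small.
QWEN3_PREFIXES: Dict[str, str] = {
    "word_embeddings": "model.embed_tokens.",
    "final_layernorm": "model.norm.",
    "output_layer": "lm_head.",
    "linear_qkv": "model.layers.{}.self_attn.",
    "linear_fc1": "model.layers.{}.mlp.",
}


def add_language_model_prefix(prefix: str) -> str:
    """Insert 'language_model.' after a leading 'model.'; leave other prefixes as-is."""
    if prefix.startswith("model."):
        return "model.language_model." + prefix[len("model."):]
    # Root-level prefixes such as 'lm_head.' are returned unchanged.
    return prefix


def derive_vl_prefixes(text_prefixes: Dict[str, str]) -> Dict[str, str]:
    """Build the VL prefix table from the text-only one, entry by entry."""
    return {name: add_language_model_prefix(p) for name, p in text_prefixes.items()}


QWEN3VL_PREFIXES = derive_vl_prefixes(QWEN3_PREFIXES)
```

Deriving the table this way keeps the VL mapping in step with the text-only mapping, which is also why a test comparing the two would only be checking the helper against itself.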
e520fe3 to 80495e6
These are tests for very low-level stuff. Can we instead just define a toy qwen3_vl HF model, import it into MCore, quantize it with FP8, and export it so we can test this e2e?
You can define the toy model in https://github.com/NVIDIA/Model-Optimizer/blob/main/tests/_test_utils/torch/transformers_models.py
and refer to the other Megatron LLM export tests to see how they are written.
This PR is duplicated from PR #895. Since the original source branch is no longer available, we created a new branch so that this PR can be updated.
What does this PR do?
new feature:
Overview: Add Qwen3-VL (Vision-Language) model support to the Megatron Core export/import plugin, enabling HuggingFace-to-mcore weight conversion for PTQ/QAT/QAD workflows.
Details
Qwen3-VL has a different weight structure from Qwen3 text-only models:
This PR adds:
Qwen3VLForConditionalGeneration and Megatron Core, handling the language_model prefix for all decoder layers, QKV merging/slicing, gated MLP merging/slicing, Q/K layer norms.
all_mcore_hf_export_mapping and all_mcore_hf_import_mapping.
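To make "QKV merging/slicing" and "gated MLP merging/slicing" concrete, here is a minimal pure-Python sketch of a round trip. It assumes a grouped fused layout in which each KV group's query heads are followed by that group's key head and value head, and a gated MLP whose gate rows precede its up rows. The function names and the layout are assumptions for illustration; the plugin's actual rules and tensor layout may differ.

```python
from typing import List, Tuple

Rows = List[List[float]]  # a weight matrix represented as a list of rows


def merge_qkv(q: Rows, k: Rows, v: Rows, num_heads: int, num_kv_heads: int, head_dim: int) -> Rows:
    """Fuse q/k/v rows group by group: [q heads of group g, k head g, v head g]."""
    q_rows = (num_heads // num_kv_heads) * head_dim  # query rows per KV group
    fused: Rows = []
    for g in range(num_kv_heads):
        fused += q[g * q_rows:(g + 1) * q_rows]
        fused += k[g * head_dim:(g + 1) * head_dim]
        fused += v[g * head_dim:(g + 1) * head_dim]
    return fused


def slice_qkv(fused: Rows, num_heads: int, num_kv_heads: int, head_dim: int) -> Tuple[Rows, Rows, Rows]:
    """Inverse of merge_qkv: split the fused rows back into q, k, and v."""
    q_rows = (num_heads // num_kv_heads) * head_dim
    group_rows = q_rows + 2 * head_dim
    q: Rows = []
    k: Rows = []
    v: Rows = []
    for g in range(num_kv_heads):
        base = g * group_rows
        q += fused[base:base + q_rows]
        k += fused[base + q_rows:base + q_rows + head_dim]
        v += fused[base + q_rows + head_dim:base + group_rows]
    return q, k, v


def merge_gated_mlp(gate: Rows, up: Rows) -> Rows:
    """Stack gate rows on top of up rows (assumed order)."""
    return gate + up


def slice_gated_mlp(fused: Rows) -> Tuple[Rows, Rows]:
    """Inverse of merge_gated_mlp: split the fused rows into equal halves."""
    half = len(fused) // 2
    return fused[:half], fused[half:]


# Toy shapes: 4 query heads, 2 KV heads, head_dim 2, one column per row.
q_w = [[float(i)] for i in range(8)]
k_w = [[100.0 + i] for i in range(4)]
v_w = [[200.0 + i] for i in range(4)]
fused_qkv = merge_qkv(q_w, k_w, v_w, num_heads=4, num_kv_heads=2, head_dim=2)
```

Slicing the fused matrix with the same head counts returns the original q, k, and v, which is the kind of import/export symmetry a round-trip test would look for.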
Usage
From the comment:
Create Megatron-LM/examples/post_training/modelopt/conf/Qwen/Qwen3-VL-8B-Instruct.sh:
Then, import Qwen3-VL from HuggingFace to MCore:
Testing
covering:
prefix, lm_head. at root, QKVMerging, GatedMLPMerging, REPLICATE
for layernorms, TP sharding configs
parallel_config
prefixes
language_model. prefix, lm_head unchanged
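The prefix and symmetry properties listed above could be checked along these lines. This is a sketch over plain prefix dictionaries, not the test file's actual code; find_prefix_violations and the sample tables are hypothetical.

```python
from typing import Dict, List


def find_prefix_violations(import_prefixes: Dict[str, str], export_prefixes: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means all rules hold."""
    problems: List[str] = []
    # Symmetry: both directions should cover the same MCore module names.
    if import_prefixes.keys() != export_prefixes.keys():
        diff = sorted(set(import_prefixes) ^ set(export_prefixes))
        problems.append("import/export key sets differ: " + ", ".join(diff))
    for direction, table in (("import", import_prefixes), ("export", export_prefixes)):
        for name, prefix in sorted(table.items()):
            if prefix.startswith("lm_head."):
                continue  # root-level prefix, expected to stay unchanged
            if not prefix.startswith("model.language_model."):
                problems.append(f"{direction}:{name} has unexpected prefix {prefix!r}")
    return problems


# Illustrative tables only; the real mappings hold rule objects.
good = {"word_embeddings": "model.language_model.embed_tokens.", "output_layer": "lm_head."}
bad = {"word_embeddings": "model.embed_tokens.", "output_layer": "lm_head."}
```

Calling find_prefix_violations(good, good) yields no problems, while passing bad for either direction reports the entry that lacks the language_model. segment.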
Before your PR is "Ready for review"
- tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py
- docs/source/deployment/3_unified_hf.rst
- CHANGELOG.rst
Additional Information
Companion Megatron-LM PR adds Qwen3VLModel, Qwen3VLDataset, and pretrain_qwenvl.py. Please see this PR NVIDIA/Megatron-LM#3444