feat(utils): pack=True calibration mode for get_dataset_dataloader#1501
feat(utils): pack=True calibration mode for get_dataset_dataloader#1501kevalmorabia97 wants to merge 10 commits into
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a packing option to dataloader producing EOS-separated fixed-length token chunks, introduces preferred fused LayerNorm mapping keys and importer fallbacks, conditionally registers HybridModel for NAS/pruning, updates MCore pruning hybrid detection/defaults, and tweaks generation and MMLU utilities. Changes0.45 Release Updates
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
/ok to test df7ab63 |
|
Two WARs for modelopt <= 0.44 (fixed upstream in NVIDIA/Model-Optimizer#1501): - `prune.py`: after `import_mcore_gpt_from_hf` returns, walk the model and copy `model.layers.{i}.input_layernorm.weight` and `model.layers.{i}.post_attention_layernorm.weight` from HF into the fused `TELayerNormColumnParallelLinear.layer_norm_weight` parameters on `attention.linear_qkv` and `mlp.linear_fc1`. Without this the fused LayerNorm weights stay at random init for GPT-family models (Qwen3, Llama, ...) since modelopt 0.44's importer only loads `fused_norm` for Nemotron-H, leaving post-prune MMLU at chance. The WAR fails soft on missing HF keys, so it is a no-op on Nemotron-H (which uses `backbone.layers.{i}.norm.weight`). - `mmlu.py`: load `modelopt.torch.utils.plugins.megatron_generate` via `importlib.import_module` to grab the submodule rather than the function the package re-exports under the same name. The previous `from ... import megatron_generate as _mg_plugin` form raised `AttributeError: 'function' object has no attribute 'broadcast_from_last_pipeline_stage'` at import time. Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1501 +/- ##
==========================================
- Coverage 76.92% 67.32% -9.60%
==========================================
Files 473 474 +1
Lines 51439 52625 +1186
==========================================
- Hits 39567 35428 -4139
- Misses 11872 17197 +5325
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
`get_dataset_dataloader` tokenizes each calibration sample individually with `padding=True, truncation=True, max_length=512`. For long-document datasets like cnn_dailymail (typical article: ~700-1000 tokens), that truncates most of each article and pads short ones — feeding the importance estimator a heavily padded, contextually-impoverished batch. Pack samples into uniform `calib_max_sequence_length` chunks from the concatenated token stream (with `eos_token_id` as document separator) the way Megatron-Bridge's calibration loop does. This exposes the model to many more distinct contexts per `calib_size` samples and eliminates padding-token contamination of activation statistics. Measured impact on Qwen3-8B pruned to 30L/3584/11776 (5.99B params): before (trunc+pad): MMLU 0.486 after (packed): MMLU 0.544 (+5.8 pts, M-Bridge ref 0.563) The proper upstream fix is to add a `pack` mode to `get_dataset_dataloader` in modelopt (NVIDIA/Model-Optimizer#1501); this inline change makes prune.py work today against released modelopt 0.44.0. Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/ok to test 20d3c5b |
20d3c5b to
8b537ce
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/prune/plugins/mcore_minitron.py`:
- Around line 59-60: The current enablement enables HybridModel defaults but
pattern updates and metric accounting are still gated by isinstance(...,
MambaModel), causing plain HybridModel instances to skip hybrid-pattern logic;
update the checks in the functions that perform hybrid pattern updates and
candidate metric accounting (the places currently using isinstance(obj,
MambaModel)) to detect HybridModel instead (e.g., isinstance(obj, HybridModel))
so plain HybridModel instances get the same pattern handling and metric updates,
and if any MambaModel-specific behavior is required keep a secondary
isinstance(obj, MambaModel) branch for those special cases.
In `@modelopt/torch/utils/dataset_utils.py`:
- Around line 492-526: The code can silently produce zero-length output when
token_stream is too short: after computing n_chunks from token_stream, check for
the edge cases and fail fast or warn; specifically, in the packing branch after
computing n_chunks (and before building input_ids/batch_encoded) add logic that
raises a clear exception if n_chunks == 0 (mentioning tokenizer.encode,
token_stream, total_chunks, and max_sample_length) and log/warn when n_chunks <
total_chunks to inform the caller they received fewer chunks than requested;
ensure the exception/warning uses the existing logging mechanism or raises a
ValueError so callers cannot proceed with empty tensors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d3ed2625-6222-473c-80cd-32fb4d6fbd4c
📒 Files selected for processing (9)
CHANGELOG.rstmodelopt/torch/export/plugins/mcore_qwen.pymodelopt/torch/export/plugins/megatron_importer.pymodelopt/torch/nas/plugins/megatron.pymodelopt/torch/prune/plugins/mcore_minitron.pymodelopt/torch/utils/dataset_utils.pymodelopt/torch/utils/plugins/megatron_generate.pymodelopt/torch/utils/plugins/megatron_mmlu.pytests/unit/torch/utils/test_dataset_utils.py
| device: torch.device | None = None, | ||
| include_labels: bool = False, | ||
| apply_chat_template: bool = False, | ||
| pack: bool = False, |
There was a problem hiding this comment.
@cjluo-nv @ChenhanYu should we make this default? Will quantization results see any improvement with packing? I am seeing big improvement in M-LM pruning accuracy if I enable packing
|
/claude review |
Claude review summaryFindings: CRITICAL: 1, IMPORTANT: 2, SUGGESTION: 2 Most impactful
Strengths
Risk assessmentMedium. The Qwen3 / Nemotron-H prune paths the PR explicitly validated end-to-end look solid, and the calibration-packing primitive is well-isolated behind an opt-in flag. The pre-existing |
Claude review summaryFindings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 3 Traced the algorithmic and structural changes end-to-end:
Risk: low. All three suggestions are clarity / forward-compat nudges, not blockers. The importer's switch from silent-pass to KeyError on missing fused-norm rules is a deliberate breaking change for any out-of-tree plugin that uses fused TE without registering a norm rule, but the PR description owns this and the legacy LGTM aside from the inline notes. |
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Bundle of independent Minitron-pruning fixes for Megatron-LM compat: HybridModel registration, fused-TE-spec import/export rules per-context (Qwen3/Llama/DeepSeek/GptOss), in_features dynamic-attr tracking, .contiguous() before pp broadcast, pack=True calibration packing, and a few quality-of-life tweaks (mmlu_dataset kwarg, warn_rank_0 stacklevel auto-bump). Each change is well-explained and small. Spot checks:
- The
HybridModelregistration relies on_DMRegistryCls._get_registered_nn_classreturning the first match inself._registryinsertion order, withMambaModelregistered beforeHybridModelviaSUPPORTED_MODELSdict insertion order. Verified — Mamba dispatch is preserved if/when MambaModel still inherits HybridModel's forward. _get_fused_norm_weightreturns a 2-tuple now; all call sites updated correctly. Mamba path explicitly sticks with the legacyfused_normkey (correct: Nemotron-H still ships a single shared norm slot).warn_rank_0stacklevel default change (1→2) correctly points warnings at user call sites; existing callers passing explicit stacklevel get bumped consistently._build_packed_input_ids2x oversample with chunk-bounded tokenization is reasonable;add_special_tokens=Falserationale is documented including the no-BOS caveat..contiguous()beforebroadcast_from_last_pipeline_stageis correct for SP-padded slices.
Why nudge instead of approve:
- Design-review gate fired (9 dirs). Apart from the
packfeature there's no architectural change — the rest extends existing systems. But: - The pruning-side fixes have no unit tests in this repo; they're integration-validated only by the linked Megatron-LM PR #4807 and the Qwen3-8B launcher's MMLU bound bump (0.68→0.75). For HybridModel and fused-TE-spec, that's fine but worth a human eyeballing the consumer wiring.
- PR body claims "
MambaModel.forward is HybridModel.forwardis asserted at import" — I don't see such an assert in the diff. Either the assert was dropped, or the description is stale. Worth confirming this is intentional, since the comment inmegatron.pycalls the inheritance "load-bearing". pack=Falsenow emits aFutureWarningon every call. Intentional ("will become default in a future release"), but may surface as test-suite churn for downstream users.
PR size 395/-120 is on the higher side for a "fixes" PR but the changes are individually small and cohesive across the Minitron pipeline.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review |
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review of #1501. All previously-flagged critical/important comments have been addressed in the latest diff:
Addressed (critical):
- HybridModel pruning pattern stale-string bug (
isinstance(MambaModel)gates at lines 372/689) — resolved via new_get_hybrid_pattern_key()helper usinghasattrdetection on bothhybrid_override_patternandhybrid_layer_pattern. Decoupled from class hierarchy as suggested. pack=Truezero-chunk silent-empty-tensor footgun — now raisesValueErroronn_chunks == 0and emitswarn_rank_0onn_chunks < total_chunks.- Fused-TE-spec importer demotion to warning — now raises
KeyErrorfor unambiguous misconfig (matches what previous review explicitly requested). - Cross-arch coverage gap —
fused_input_layernorm/fused_pre_mlp_layernormrules added tomcore_llama.py,mcore_deepseek.py,mcore_gptoss.py,mcore_llama4,qwen25_causal_lm_import(not just qwen3 as in the original PR). - Importer/exporter asymmetry on fused-norm fallback —
unified_export_megatron._get_fused_norm_weightnow takes aprimary_keyarg with the same fallback chain, called per-context (fused_input_layernormfor attn,fused_pre_mlp_layernormfor MLP,fused_normfor Mamba). _extra_statezeroing in qkv-merging — now preservesmodule_state_dict.get("_extra_state")instead of unconditionallyNone.
Author-declined with sound reasoning (visible in inline replies): import-time MambaModel.forward is HybridModel.forward assert (would break consumers on plausible MCore changes; dispatch is correct either way per _get_registered_nn_class's forward is forward filter), hard-fail on eos_token_id is None, setdefault semantics for warn_rank_0 stacklevel.
Why nudge instead of approve:
- PR body still claims
MambaModel.forward is HybridModel.forwardis "asserted at import" — that assertion is not present (intentionally, per author's reply). Description should be updated for the next reader. - CHANGELOG silently changes 0.44 release date
2026-05-18 → 2026-05-14— outside the advertised PR scope; needs owner confirmation that this is intentional and not a merge artifact. - Pruning-side fixes (HybridModel registration, in_features dyn-attr,
.contiguous()) are integration-validated only via consumer Megatron-LM PR #4807 and the Qwen3-8B launcher's MMLU bound bump (0.68→0.75); no in-repo unit tests added for the pruning paths. mmlu_lower_boundtightening to 0.75 turns CI red on routine numerical drift if post-PTQ Qwen3-8B doesn't clear with margin; only the prune-time MMLU (0.544) is documented in the PR body.- Bundle-of-fixes PR spanning 9 directories (design-review gate fired); the only architectural addition is
pack=True, which is well-isolated and uses no new abstractions, but a human reviewer with cross-cutting MCore context should still sign off.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review: all previously-flagged critical issues are addressed in the latest diff.
Addressed (critical):
- HybridModel pruning:
_get_hybrid_pattern_key()helper useshasattrdetection on bothhybrid_override_patternandhybrid_layer_pattern; replaces bothisinstance(MambaModel)gates (lines ~372 and ~689). Decoupled from class hierarchy. - HybridModel dynamic-conversion registration: HAS_HYBRID flag in
nas/plugins/megatron.py; HybridModel registered in SUPPORTED_MODELS after MambaModel (insertion order + DMRegistry'sforward is forwardfilter make dispatch unambiguous either way). pack=Truezero-chunk footgun: raisesValueError;n_chunks < total_chunksemitswarn_rank_0.- Fused-TE-spec importer:
KeyError(not warn) when neitherfused_input_layernorm/fused_pre_mlp_layernormnor legacyfused_normis registered — matches what the previous review explicitly requested. - Cross-arch coverage: fused-norm rules added to
mcore_qwen.py,mcore_llama.py(incl. llama4),mcore_deepseek.py,mcore_gptoss.py, qwen25_causal_lm_import. - Importer/exporter fused-norm symmetry:
_get_fused_norm_weighttakesprimary_keyarg with the same fallback chain; called per-context (attn / MLP / Mamba). _extra_statepreserved in qkv-merging instead of unconditionallyNone.in_featuresregistered as a dynamic attribute on_DynamicTEQKVLayerNormColumnParallelLinear..contiguous()on logits slice beforebroadcast_from_last_pipeline_stage.
Per operator instructions, treated as resolved:
- PR body's stale
MambaModel.forward is HybridModel.forwardassertion claim has been removed; the current PR body correctly states dispatch is robust either way. - CHANGELOG 0.44 release-date shift
2026-05-18 → 2026-05-14is intentional. - mmlu_lower_bound 0.68 → 0.75 tightening is appropriate (previous bound was too loose to catch regressions).
- Pruning-side fixes are integration-validated via Megatron-LM PR #4807, which has been thoroughly tested out-of-tree.
Design review (9-dir gate): Only architectural addition is pack=True, which is a feature flag on an existing function — no new subsystem, abstraction, DSL, or registry. Implementation lives in one helper (_build_packed_input_ids). No existing in-repo system or already-imported dependency would simplify this; it's a tokenization-flow change. Design protocol satisfied.
Complex PR: spans 9 directories (≥ 5). Looping in a human for approval.
Claude Review SummaryFindings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 4 Overall assessmentLow-risk, well-considered PR. The algorithmic fixes are sound and the back-compat story is explicit:
Suggestions posted inline
LGTM modulo the suggestions above. |
- _get_hybrid_pattern_key: check `getattr(model, attr, None) is not None` instead of just `hasattr`, since HybridModel subclasses could initialize the attribute to None and either call site (string-format / iterate) would then break. - _FakeTokenizer: match HF padding=True semantic (pad to longest in batch, optionally bounded by max_length) instead of always padding to max_length. Keeps the test fake faithful to the production tokenizer surface. Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review Previous review suggestions have been addressed in commits
Please re-evaluate the latest commit ( |
Claude review summaryFindings: CRITICAL: 0, IMPORTANT: 1, SUGGESTION: 2. The PR is a tightly scoped set of pruning fixes plus one well-defined feature, each independently motivated and end-to-end validated (Qwen3-8B MMLU 0.486 → 0.544 with Most impactful findingImport/export key asymmetry for fused TE spec ( Other
RiskLow. The Minitron-side fixes are loud-by-design (fused-norm KeyError, contiguous broadcast) and the calibration packing path is fully gated behind |
- Export mappings: add fused_input_layernorm / fused_pre_mlp_layernorm to mcore_qwen (qwen3, qwen25), mcore_llama (llama, llama4), mcore_deepseek, mcore_gptoss export dicts so fused-TE-spec models round-trip cleanly. Llama4 and GptOss only get fused_input_layernorm (MoE-only on MLP side, fused_pre_mlp_layernorm path is unreachable). - dataset_utils: docstring notes (a) chat-template + pack=True can double the EOS at doc boundaries (harmless for calibration stats but worth flagging), (b) tokenize_batch_size formula's intent and floor. - mcore_minitron: tighten _get_hybrid_pattern_key to also reject empty patterns via Python truthiness (rejects None, "", [] in one check). Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review Round 11 addressed in commit
Request for this review round: please do a single consolidated pass against |
There was a problem hiding this comment.
Claude review summary — overall low risk, LGTM with three minor SUGGESTION-level notes posted inline.
Findings
- CRITICAL: 0
- IMPORTANT: 0
- SUGGESTION: 3 (pack/include_labels interaction, explicit
padding=Falseon tokenizer call, attribute-presence vs isinstance for hybrid detection)
Trace coverage
I traced each of the four threads end-to-end:
-
Calibration packing —
_build_packed_input_idscorrectly preserves per-source ratio, early-exits the 2x oversample, and falls back gracefully wheneos_token_idis missing. The non-pack code path is unchanged so existing call sites are unaffected. Test parametrization keeps both branches covered. Docstrings address the BOS-absence andapply_chat_template<eos><eos>edge cases honestly. -
HybridModel registration — Insertion-order argument for
DMRegistry._get_registered_nn_classis sound: GPTModel → MambaModel (if HAS_MAMBA) → HybridModel (if HAS_HYBRID). For a MambaModel instance,issubclass(MambaModel, HybridModel)returning True doesn't matter because the iteration hitsMambaModelfirst. For a plain HybridModel instance, MambaModel is correctly skipped (it's the child, soissubclass(HybridModel, MambaModel)is False)._HYBRID_DIVISORSextraction is a clean refactor. -
Importer fused-norm dispatch — Per-context-key + legacy
fused_normfallback is implemented symmetrically on import (megatron_importer.py) and export (unified_export_megatron.py). The KeyError-instead-of-warn change is appropriate — the branch only fires when the model uses fused TELayerNormColumnParallelLinear, so a missing rule would otherwise ship a chance-accuracy checkpoint silently. Mamba export still uses the defaultprimary_key="fused_norm", preserving Nemotron-H behavior. The new_extra_statepreservation in_gated_mlp_mergingand the change fromNonetomodule_state_dict.get("_extra_state")in_qkv_mergingcorrectly avoid blowing away TE FP8 bookkeeping. -
Misc —
.contiguous()fix on themegatron_prefilllogits slice is correct (broadcast asserts contiguity, SP padding breaks the stride). Thein_featuresdynamic attribute on_DynamicTEQKVLayerNormColumnParallelLinearis the right fix for TE's forward-timeinp_shape[-1] == in_featuresassertion under hidden_size pruning.warn_rank_0stacklevel auto-bump is back-compatible (no in-tree caller passesstacklevel).mmlu_lower_bound0.68→0.75 is justified by the fused-norm import fix.
Backward compatibility: pack defaults to False, all importer/exporter rule additions are net-new keys, and the mmlu_dataset arg has a back-compat default. No modelopt_state schema changes.
Risk level: low. The fused-norm + HybridModel changes are corrective fixes to paths that were producing broken checkpoints, validated by the PR author's MMLU numbers. Suggest landing.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review focused on the operator's question about attention-mask correctness under sequential packing, plus the design-review gate.
Attention mask in packed mode (get_dataset_dataloader(pack=True)):
_build_packed_input_idsreturns onlyinput_ids; the dataloader setsattention_mask = torch.ones_like(input_ids)(dataset_utils.py around theif pack:branch).- This is consistent with the "no padding in packed chunks" contract (every position is a real token), but it is a naive pack: no block-diagonal / per-document reset mask. Attention crosses document boundaries (separated only by
eos_token_idas a token, not as a mask boundary). For amax/sensitivity calibration this matches industry practice and the docstring is honest about the BOS/eos caveats — but a reader expecting reset-on-eos masking will be surprised, and the docstring doesn't say the mask is unconditionally all-ones.
Test coverage of the mask:
- Exactly one assertion exercises the packed attention mask: in
TestGetDatasetDataloaderBlending.test_single_jsonl, thepack=Truebranch assertsbatches[0]["input_ids"].shape == (2, 16)and(batches[0]["attention_mask"] == 1).all(). That covers the "all-ones, no padding" contract. - The blending and HF-tiny tests assert chunk counts but never inspect the mask. A regression that, e.g., zeroed out positions after
eos_token_idwould not be caught outsidetest_single_jsonl.
Re-review status of prior comments: All previously-flagged critical items (HybridModel pruning pattern via _get_hybrid_pattern_key, n_chunks==0 ValueError + warn_rank_0 on shortfall, KeyError on missing fused-norm rule, cross-arch fused-norm coverage in Llama/DeepSeek/GptOss/llama4/qwen25, importer/exporter symmetry via _get_fused_norm_weight(primary_key=...), _extra_state preservation, in_features dynamic attr, .contiguous() before pp broadcast) are addressed in the current diff. Author-declined items (import-time MambaModel.forward is HybridModel.forward assert, hard-fail on missing eos_token_id, setdefault for stacklevel) have explicit, defensible rationale on the PR.
Why nudge instead of approve:
- Design-review gate fired (9 dirs). The architectural addition is just
pack=Trueon an existing function — implementation lives in one helper, no new subsystem/registry/DSL — so the protocol is satisfied. But cross-cutting MCore changes still warrant a human signoff. - Attention-mask semantics under packing should ideally be one of: (a) explicitly documented as "all-ones, no per-doc reset" in the
packdocstring, or (b) an opt-in for boundary-aware masks if/when a downstream consumer needs it. Today's behavior is fine for calibration but undocumented. - Pruning-side fixes (HybridModel, in_features, .contiguous) are integration-validated only via consumer Megatron-LM PR #4807; no in-repo unit tests for the pruning paths.
Don't trust tokenizer subclass __call__ defaults — a custom wrapper that flips the HF default could silently inject pad tokens into the packed stream. Be explicit. Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
- Docstring: note that attention_mask is unconditionally all-ones — attention crosses document boundaries (the eos separator is a token, not a mask boundary). Prevents future readers from assuming pack implies per-doc mask reset. - test_list_of_jsonl_blends: assert (attention_mask == 1).all() and shape parity so a future boundary-aware masking change is forced to update the contract rather than silently passing. Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Split #1501 so this PR keeps only the pack=True calibration dataloader change. The HybridModel pruning, fused-TE-spec import/export, and related fixes are now in #1518 (also targeting main). This commit reverts to main the files that belong to #1518: - modelopt/torch/export/plugins/{mcore_deepseek,mcore_gptoss,mcore_llama,mcore_qwen,megatron_importer,unified_export_megatron}.py - modelopt/torch/nas/plugins/megatron.py - modelopt/torch/prune/plugins/mcore_minitron.py - modelopt/torch/utils/logging.py - modelopt/torch/utils/plugins/{megatron_generate,megatron_mmlu}.py - tools/launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml CHANGELOG: drop the Bug Fixes entry and the 0.44 date correction (those go with #1518). Keep the pack=True New Features entry. History is preserved — earlier commits with the dropped changes remain on this branch's log, this commit just rolls the working state back to "pack only". Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Summary
Add
pack: booloption tomodelopt.torch.utils.dataset_utils.get_dataset_dataloader. WhenTrue, raw samples from each source are concatenated into a per-source token stream (separated bytokenizer.eos_token_id) and sliced into uniformmax_sample_lengthchunks, contiguous by source (preserving the requested per-source ratio innum_samples). DefaultFalsefor back-compat; opt in viapack=Truefor cleaner calibration statistics.The non-pack fixes originally bundled here (HybridModel pruning, GPT-family fused-TE-spec import/export,
warn_rank_0stacklevel,mmlu_lower_boundbump, etc.) have been split to #1518.Why
The default
pack=Falsepath tokenizes each sample with truncate-and-pad, which has two problems for amax / sensitivity calibration:pack=Truekeeps 100 % of the budget as real tokens — every chunk is naturally-length context, document boundaries are expliciteosseparators (a token, not a mask boundary), and the per-source ratio innum_samplesis preserved instead of letting one long-document source dominate.Measured on Qwen3-8B Minitron prune to 30L / 3584 / 11776 (cnn_dailymail, 256 samples,
seq_length=512):pack=Falsepack=TrueBehavior details
num_samplessemantics shift between modes (documented in the docstring): withpack=Falseit's the number of raw samples to fetch and tokenize; withpack=Trueit's the number ofmax_sample_length-token chunks to produce per source. Migrating a call site topack=Truemay need a different value to hit the same total-token calibration budget.add_special_tokens=False). Fine for amax / sensitivity calibration where boundary tokens are statistically dominated; callers that need BOS-prefixed sequences should keeppack=False.attention_maskis unconditionally all-ones — attention crosses document boundaries (eos is a token, not a mask boundary). Documented in the docstring; asserted in the blending test so a future boundary-aware-mask change is forced to update the contract.num_samples * 2oversample.ValueErrorwhen no source yields enough tokens for one chunk.tokenizer.eos_token_id is None(documents would concatenate without a separator).create_forward_loopforwardspackso internal PTQ callers can opt in.Consumer
Megatron-LM PR NVIDIA/Megatron-LM#4807 calls
get_dataset_dataloader(..., pack=True)fromprune.py. Once 0.45 ships and the modelopt pin is bumped, the inline WAR shipped against released 0.44 collapses to one kwarg.Related: #1518 (Minitron HybridModel + GPT-family fused-TE-spec import/export, also targeting main).