Move model-specific PTQ overrides from llm_ptq to YAML recipes #1506
Draft

shengliangxu wants to merge 2 commits into shengliangx/all-yaml-configs
Conversation
Replace the hardcoded model-type branches in examples/llm_ptq (gemma/mpt AWQ alpha tuning, gemma SmoothQuant alpha, phi4mm exclusions, Nemotron VL exclusions) with opt-in declarative recipes under modelopt_recipes/huggingface/<model_type>/ptq/. Users select them with --recipe huggingface/<model_type>/ptq/<recipe>.

- Per-model recipes ship with FP8 KV-cache cast (kv_fp8_cast) and the algorithm/numerics each model needs.
- phi4mm and nemotron_vl each include a merged disabled_quantizers.yaml unit so recipes import a single disabled-quantizer slot instead of layering default + model-specific exclusions.
- Each ptq/ folder has a README describing what is model-specific.
- Drop now-unused qformat/model_type parameters from build_quant_cfg and the Nemotron VL append block in mono_quantize.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
What does this PR do?
Type of change: new feature
Replaces the hardcoded model-type branches in `examples/llm_ptq/` with opt-in declarative recipes under `modelopt_recipes/huggingface/<model_type>/ptq/`. Users who want the model-specific tweaks now pass `--recipe huggingface/<model_type>/ptq/<recipe>`; users on the plain `--qformat` path get the generic numerics.

What moved out of Python (`examples/llm_ptq/example_utils.py::build_quant_cfg` and `examples/llm_ptq/hf_ptq.py::mono_quantize`), with the two algorithm overrides sketched in code below the list:

- gemma/mpt: `w4a8_awq` → `awq_lite` with `alpha_step=1` (coarser search to avoid TRT-LLM overflow).
- gemma: `int8_sq` → SmoothQuant `alpha=0.5` (the default `1.0` regresses Gemma 7B).
- phi4mm: disable quantizers matching `*speech*`, `*audio*`, `*image*`, `*vision*` (quantize only the language model).
- nemotron_vl: disable quantizers matching `*vision*`, `*image*`, `*radio*`, `*visual*`, `*encoder*`, `*model_encoder*` (quantize only the decoder).
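For concreteness, the two algorithm overrides amount to the following tweaks on ModelOpt config dicts. This is a minimal sketch assuming the stock `modelopt.torch.quantization` config names; the deleted branch code itself is not reproduced here, and the recipes now encode the same settings declaratively:

```python
# Sketch of the algorithm overrides the deleted llm_ptq branches applied
# (now carried by the gemma/mpt recipes). Config-dict names assume the
# stock modelopt.torch.quantization API.
import copy

import modelopt.torch.quantization as mtq

# gemma/mpt + w4a8_awq: coarser awq_lite alpha search to avoid TRT-LLM overflow.
w4a8_cfg = copy.deepcopy(mtq.W4A8_AWQ_BETA_CFG)
w4a8_cfg["algorithm"] = {"method": "awq_lite", "alpha_step": 1}

# gemma + int8_sq: SmoothQuant alpha pinned to 0.5 (the default 1.0 regresses Gemma 7B).
int8_sq_cfg = copy.deepcopy(mtq.INT8_SMOOTHQUANT_CFG)
int8_sq_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}
```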
What stayed in Python:

- `hf_ptq.py` logic that depends on runtime-detected layer indices.
- The `is_nemotron_vl(full_model)` detection itself, which still drives the VLM calibration loop and the post-quantize `full_model` update; only the quant_cfg tweak it triggered was migrated.

Recipe layout (`modelopt_recipes/huggingface/`):
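The layout listing itself did not survive extraction; below is a plausible sketch of the tree, with the per-model folders taken from the models named above (exact recipe file names are not given in this description and are omitted):

```
modelopt_recipes/huggingface/
├── gemma/ptq/         # awq_lite alpha_step + SmoothQuant alpha recipes, README.md
├── mpt/ptq/           # awq_lite alpha_step recipe, README.md
├── phi4mm/ptq/        # recipes + merged disabled_quantizers.yaml, README.md
└── nemotron_vl/ptq/   # recipes + merged disabled_quantizers.yaml, README.md
```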
All recipes ship with FP8 KV-cache cast (`kv_fp8_cast`). For phi4mm and nemotron_vl, `disabled_quantizers.yaml` is a merged unit that includes the standard `default_disabled_quantizers` exclusions plus the model-specific ones, so each recipe imports a single disabled-quantizer slot instead of layering two. Each `ptq/` folder has a `README.md` describing exactly what is model-specific.
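In ModelOpt quant_cfg terms, the model-specific part of each merged unit boils down to wildcard pattern entries with `{"enable": False}`. A sketch assuming the usual pattern-key convention; the recipe's unit/slot names and the contents of `default_disabled_quantizers` are not reproduced here:

```python
# Sketch: the model-specific exclusions the merged disabled_quantizers.yaml
# units carry, expressed as ModelOpt quant_cfg wildcard entries. Patterns are
# matched against module names; {"enable": False} disables those quantizers.
PHI4MM_EXCLUDED = ("*speech*", "*audio*", "*image*", "*vision*")
NEMOTRON_VL_EXCLUDED = (
    "*vision*", "*image*", "*radio*", "*visual*", "*encoder*", "*model_encoder*",
)

def disable_quantizers(cfg: dict, patterns: tuple[str, ...]) -> dict:
    """Disable every quantizer whose module name matches one of the patterns."""
    for pattern in patterns:
        cfg["quant_cfg"][pattern] = {"enable": False}
    return cfg
```

The YAML units remain the source of truth; this only makes their semantics concrete.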
Usage

Pass the recipe path on top of the usual `hf_ptq.py` invocation, e.g. `--qformat int8_sq --recipe huggingface/gemma/ptq/<recipe>`; each `ptq/` folder's `README.md` describes what the recipe changes.

Testing
- The pre-commit recipe check (`tools/precommit/check_modelopt_recipes.py`) loads every new recipe via `load_recipe()` (sketched below); passes for all 7 new YAMLs.
- The `yamlfmt` + `markdownlint` + `bandit` + license-insertion hooks all pass.
- The now-unused `qformat`/`model_type` parameters were dropped from the `build_quant_cfg(qformat, ..., model_type, ...)` signature; the only call sites (`hf_ptq.py`, `multinode_ptq.py`) were updated to the new 2/3-arg form.
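The check presumably amounts to a loop like the following. This is a sketch: only the script path and the `load_recipe()` name appear in this PR, so the import location, signature, and glob root are assumptions:

```python
# Sketch of tools/precommit/check_modelopt_recipes.py's core loop: every
# recipe YAML under modelopt_recipes/ must load cleanly via load_recipe().
# The import path and signature below are assumptions, not the PR's code.
from pathlib import Path

from modelopt_recipes import load_recipe  # assumed import location

def check_all_recipes(root: str = "modelopt_recipes") -> None:
    for path in sorted(Path(root).rglob("*.yaml")):
        load_recipe(str(path))  # raises on a malformed or unloadable recipe
```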
Before your PR is "Ready for review"

- Backward-compatibility note: users who relied on the implicit model-specific behavior of `--qformat` (gemma/mpt AWQ, gemma SmoothQuant, phi4mm exclusions, Nemotron VL exclusions) now need to pass `--recipe huggingface/<model_type>/ptq/<recipe>` to get them. The flag itself is unchanged; only the implicit behavior was removed.
- `CONTRIBUTING.md`: N/A

Additional Information
Merging into `shengliangx/all-yaml-configs`. Built on top of `fc2fd4ad3` ("set paths stright"); that commit only moves Step3.5-Flash into `huggingface/step3p5/` and adds the `huggingface/README.md`, so the migration commit is the substantive change.