Schematize config loading and quantizer config entries #1405
shengliangxu wants to merge 24 commits into
Conversation
📝 Walkthrough

This PR modernizes the quantization configuration system by converting TypedDict schemas to Pydantic models, implementing type-safe config loading with schema-backed overloads, and broadening type checks from dict to Mapping/MutableMapping.

Changes: Quantization Configuration Schema Modernization
🚥 Pre-merge checks: ✅ 6 passed
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff            @@
## main #1405 +/- ##
==========================================
+ Coverage 76.78% 76.88% +0.10%
==========================================
Files 473 473
Lines 51413 51592 +179
==========================================
+ Hits 39476 39669 +193
+ Misses 11937 11923 -14
Flags with carried forward coverage won't be shown.
Force-pushed 066c2c8 to 8e8153a
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/quantization/config.py (1)
1093-1114: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve list-valued `cfg` for legacy `nn.*`-scoped entries.

This branch unconditionally does `dict(sub_cfg)` for non-typed sub-configs, which blows up on sequential configs like `{"nn.Linear": {"*weight_quantizer": [{...}, {...}]}}`. The non-scoped legacy path still accepts list-valued cfgs, so this regresses class-scoped legacy configs only.

Suggested fix

     for q_path, sub_cfg in value.items():
         if isinstance(sub_cfg, QuantizerAttributeConfig):
             enable = None
             cfg = sub_cfg
    -    else:
    +    elif isinstance(sub_cfg, Mapping):
             sub_cfg = dict(sub_cfg)
             enable = sub_cfg.pop("enable", None)
             cfg = sub_cfg or None
    +    else:
    +        enable = None
    +        cfg = sub_cfg
         entry: dict[str, Any] = {
             "parent_class": key,
             "quantizer_name": q_path,
             "cfg": cfg,
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/quantization/config.py` around lines 1093-1114: the nn.* branch currently does unguarded dict(sub_cfg), which breaks when a sub_cfg is a list (legacy sequential configs); in the loop inside the key.startswith("nn.") branch (variables: q_path, sub_cfg, entries, entry, parent_class, quantizer_name, cfg, enable, QuantizerAttributeConfig) change the logic so you only coerce sub_cfg to dict when it's a Mapping (e.g., isinstance(sub_cfg, Mapping)); for non-Mapping values (not a QuantizerAttributeConfig and not a Mapping) preserve sub_cfg as-is (so list-valued cfgs remain lists), then extract enable only from Mapping-based dicts and set cfg accordingly before building and appending each entry returned by this branch.

modelopt/torch/quantization/nn/modules/tensor_quantizer.py (1)
1431-1437: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate attribute-list length before applying sequential configs.

At line 1435, `zip(attributes, self)` silently drops extras and leaves trailing quantizers unchanged when lengths differ. This should fail fast to prevent partial/implicit config application.

Proposed fix

     if not isinstance(attributes, (list, tuple)):
         assert isinstance(attributes, Mapping), "attributes must be a list or a mapping."
         attributes = [attributes] * len(self)
    +elif len(attributes) != len(self):
    +    raise ValueError(
    +        f"Expected {len(self)} attribute configs, but got {len(attributes)}."
    +    )
     for attribute, quantizer in zip(attributes, self):
         quantizer.set_from_attribute_config(attribute)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/quantization/nn/modules/tensor_quantizer.py` around lines 1431-1437: when applying a sequence of attribute configs, validate that a provided attributes list/tuple has the same length as the quantizer sequence (self) before zipping; if lengths differ, raise a clear exception (e.g., ValueError) stating the expected and actual lengths so the call to set_from_attribute_config cannot silently skip trailing quantizers.
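The failure mode this comment describes can be reproduced with plain lists. The sketch below is a hypothetical stand-in for the method, not the actual ModelOpt code; `apply_attribute_configs` and the string "quantizers" are illustration only.

```python
def apply_attribute_configs(attributes, quantizers):
    """Sketch of the review suggestion: validate lengths before zipping so no
    trailing quantizer is silently left unconfigured."""
    if len(attributes) != len(quantizers):
        raise ValueError(
            f"Expected {len(quantizers)} attribute configs, but got {len(attributes)}."
        )
    return list(zip(attributes, quantizers))

# Bare zip() silently drops the extra quantizer -- the third element is never paired:
assert len(list(zip([1, 2], ["q0", "q1", "q2"]))) == 2

# The length check fails fast instead:
try:
    apply_attribute_configs([{"num_bits": 8}], ["q0", "q1"])
except ValueError as e:
    print(e)  # Expected 2 attribute configs, but got 1.
```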
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 252-255: The phi4mm path is reverting typed QuantizeConfig entries
to plain dicts by appending raw dicts to quant_cfg_obj["quant_cfg"]; instead
append properly constructed QuantizerCfgEntry objects so the config remains
fully typed. Locate the code that mutates quant_cfg_obj (the block adding
entries for "*speech*", "*audio*", "*image*", "*vision*") and replace those dict
appends with creation of QuantizerCfgEntry instances (matching the type used
elsewhere) and append those instances; ensure build_quant_cfg() returns a
QuantizeConfig with quant_cfg containing only QuantizerCfgEntry objects on the
phi4mm path.
In `@examples/vllm_serve/vllm_ptq_utils.py`:
- Around line 122-123: The generator expression scanning kv_quant_cfg calls
e.get("quantizer_name") without ensuring e is a mapping; modify the predicate
used where kv_quant_cfg is scanned (the generator/filter that looks for
"quantizer_name" == "*[kv]_bmm_quantizer") to first check the entry type (e.g.,
isinstance(e, dict) or isinstance(e, collections.abc.Mapping) or hasattr(e,
"get")) before calling .get, so malformed/non-mapping entries are skipped and do
not raise an AttributeError.
In `@modelopt/torch/opt/config.py`:
- Around line 120-133: Modify __delitem__ to adhere to MutableMapping semantics
by converting AttributeError into KeyError when resolving the mapping key and
when refusing to unset a field with no default: wrap the call to
get_field_name_from_key(key) so any AttributeError raised is caught and
re-raised as KeyError(key), and replace the final raise AttributeError(...) that
checks for PydanticUndefined with raise KeyError(key) (including a descriptive
message if desired); keep the existing logic for handling model_extra,
model_fields_set, and using field_info.get_default(call_default_factory=True)
and PydanticUndefined.
In `@modelopt/torch/quantization/config.py`:
- Around line 552-574: The validator method validate_quantizer_cfg_entry
currently treats explicit None for the 'cfg' or 'enable' fields as if they were
omitted, allowing entries like {"quantizer_name":"*","enable":None} or
{"quantizer_name":"*","cfg":None} to behave as enabled; change the logic to
reject explicit nulls: if 'cfg' in values and values['cfg'] is None raise
ValueError("cfg must be omitted or a valid mapping/list, not null"), and if
'enable' in values and values['enable'] is None raise ValueError("enable must be
a boolean when provided, not null"); preserve existing behavior for omitted keys
and keep calling cls._validate_enabled_cfg(cfg) only when enable is truthy and
cfg is not None; apply the same null-rejection checks to the analogous validator
block referenced at lines 1157-1197.
In `@modelopt/torch/quantization/conversion.py`:
- Line 252: The code forces legacy dict-shaped quant_cfg through list() before
calling normalize_quant_cfg_list, which converts {"*": {"enable": False}} into
["*"] and breaks backward-compatible handling in normalize_quant_cfg_list and
downstream set_quantizer_by_cfg / set_quantizer_by_cfg_context; fix by removing
the list(...) wrapper and pass quant_cfg directly into normalize_quant_cfg_list
(and apply the same change for the other occurrence around line 499) so the
function can accept both legacy flat dicts and list forms.
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 935-939: The code is appending kv_cache_quant_cfg entries directly
into updated_quant_cfg["quant_cfg"], risking shared mutable state; modify the
concatenation in core_utils.py so you deep-copy each entry from
kv_cache_quant_cfg (or perform a deepcopy of the sequence) before extending
updated_quant_cfg["quant_cfg"] (i.e., when creating inner and assigning
updated_quant_cfg["quant_cfg"] = inner + ...), ensuring entries like
QuantizerCfgEntry instances from kv_cache_quant_cfg are duplicated rather than
referenced.
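The aliasing hazard described in the last prompt can be shown with a minimal sketch, using plain dicts in place of `QuantizerCfgEntry` instances; the entry contents here are illustrative only.

```python
import copy

kv_cache_quant_cfg = [{"quantizer_name": "*[kv]_bmm_quantizer", "cfg": {"num_bits": 8}}]

# Extending with the entries directly shares mutable state with the source list...
shared = [] + kv_cache_quant_cfg
shared[0]["cfg"]["num_bits"] = 4
assert kv_cache_quant_cfg[0]["cfg"]["num_bits"] == 4  # source was mutated!

# ...while deep-copying each entry keeps the source pristine.
kv_cache_quant_cfg = [{"quantizer_name": "*[kv]_bmm_quantizer", "cfg": {"num_bits": 8}}]
isolated = [] + copy.deepcopy(kv_cache_quant_cfg)
isolated[0]["cfg"]["num_bits"] = 4
assert kv_cache_quant_cfg[0]["cfg"]["num_bits"] == 8  # unchanged
```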
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e15d92dc-470c-4dd5-a1bf-b8509a4179d2
📒 Files selected for processing (25)
- docs/source/guides/_quant_cfg.rst
- examples/diffusers/quantization/config.py
- examples/diffusers/quantization/quantize.py
- examples/llm_autodeploy/run_auto_quantize.py
- examples/llm_ptq/cast_mxfp4_to_nvfp4.py
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- examples/llm_ptq/multinode_ptq.py
- examples/vllm_serve/vllm_ptq_utils.py
- modelopt/onnx/llm_export_utils/quantization_utils.py
- modelopt/recipe/config.py
- modelopt/recipe/loader.py
- modelopt/torch/opt/config.py
- modelopt/torch/opt/config_loader.py
- modelopt/torch/quantization/algorithms.py
- modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
- modelopt/torch/quantization/backends/nvfp4_gemm.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/conversion.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/quantization/nn/modules/tensor_quantizer.py
- modelopt/torch/quantization/utils/core_utils.py
- tests/unit/recipe/test_loader.py
- tests/unit/torch/quantization/test_config_validation.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/opt/config.py (1)
60-125: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix `MutableMapping` protocol violation: `__getitem__` must raise `KeyError`, not `AttributeError`.

After `ModeloptBaseConfig` inherits `MutableMapping[str, Any]` at line 60, missing-key lookups propagate `AttributeError` from `get_field_name_from_key()`. This violates the mapping contract, breaking inherited methods like `pop()` which expect `KeyError`. Proof: when `__getitem__` raises `AttributeError`, `MutableMapping.pop(key, default)` fails to use the default value.

Currently, `__delitem__` and `get()` already catch `AttributeError` and convert to `KeyError` — this inconsistency exposes the protocol violation.

Proposed minimal fix

    diff --git a/modelopt/torch/opt/config.py b/modelopt/torch/opt/config.py
    @@ -99,7 +99,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
                 if field_info.alias == key:
                     return name
    -        raise AttributeError(f"Key {key} not found in the config.")
    +        raise KeyError(key)

         def __contains__(self, key: str) -> bool:
    @@ -107,7 +107,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
                 self.get_field_name_from_key(key)
                 return True
    -        except AttributeError:
    +        except KeyError:
                 return False

         def __delitem__(self, key: str) -> None:
    @@ -121,7 +121,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
             try:
                 field_name = self.get_field_name_from_key(key)
    -        except AttributeError as e:
    +        except KeyError as e:
                 raise KeyError(key) from e

         def get(self, key: str, default: Any = None) -> Any:
    @@ -129,7 +129,7 @@ class ModeloptBaseConfig(BaseModel, MutableMapping[str, Any]):
             try:
                 return self[key]
    -        except AttributeError:
    +        except KeyError:
                 return default

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/opt/config.py` around lines 60 - 125, The mapping methods violate MutableMapping because get_field_name_from_key() can raise AttributeError; wrap calls to get_field_name_from_key() in __getitem__ and __setitem__ (and anywhere else that should follow mapping semantics) to catch AttributeError and re-raise KeyError(key) so missing-key lookups conform to the MutableMapping contract; locate the fixes in the ModeloptBaseConfig methods __getitem__, __setitem__ (and mirror the pattern used in __delitem__) to ensure consistency.
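Why the protocol matters can be demonstrated with a minimal stand-in; `BadConfig` below is hypothetical, not the real `ModeloptBaseConfig`, and exists only to show that `MutableMapping.pop(key, default)` catches `KeyError`, not `AttributeError`.

```python
from collections.abc import MutableMapping

class BadConfig(MutableMapping):
    """Hypothetical config whose __getitem__ violates the mapping contract
    by raising AttributeError instead of KeyError for missing keys."""

    def __init__(self):
        self._d = {"algorithm": "max"}

    def __getitem__(self, k):
        if k not in self._d:
            raise AttributeError(f"Key {k} not found in the config.")  # protocol violation
        return self._d[k]

    def __setitem__(self, k, v):
        self._d[k] = v

    def __delitem__(self, k):
        del self._d[k]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

cfg = BadConfig()
try:
    cfg.pop("missing", "default")   # should return "default"...
except AttributeError as e:
    print("pop() leaked:", e)       # ...but the AttributeError escapes instead
```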
🧹 Nitpick comments (1)
examples/llm_ptq/example_utils.py (1)
849-855: ⚡ Quick win

Align `quant_cfg` type hints with the new Mapping-compatible behavior.

`needs_checkpoint_path_update` now supports any `Mapping`, but the signature still advertises `dict`. Consider updating this signature (and `resolve_checkpoint_dir`) to `Mapping[str, Any]` / `MutableMapping[str, Any]` so static typing matches runtime behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/example_utils.py` around lines 849 - 855, The type hints should reflect that quant_cfg can be any Mapping; update the signatures of needs_checkpoint_path_update and resolve_checkpoint_dir to accept Mapping[str, Any] (or MutableMapping[str, Any] if the function mutates the dict) instead of dict, and import the required typing symbols (Mapping, MutableMapping, Any) at the top of the module; keep the runtime logic unchanged but ensure the annotations match the Mapping-compatible behavior.
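A minimal sketch of the suggested annotation change: with a `Mapping[str, Any]` hint, both plain dicts and Mapping-backed typed configs type-check and work at runtime. `TypedCfg` is a hypothetical stand-in for such a typed config, and the function body is illustrative only, not the real logic from example_utils.py.

```python
from collections.abc import Mapping
from typing import Any

def needs_checkpoint_path_update(quant_cfg: Mapping[str, Any]) -> bool:
    # Illustrative body only: any Mapping supports membership tests.
    return "quant_cfg" in quant_cfg

class TypedCfg(Mapping):
    """Minimal Mapping-backed stand-in for a typed config object."""

    def __init__(self, **kw):
        self._d = kw

    def __getitem__(self, k):
        return self._d[k]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

# Both a plain dict and the Mapping-backed object satisfy the annotation.
assert needs_checkpoint_path_update({"quant_cfg": []})
assert needs_checkpoint_path_update(TypedCfg(quant_cfg=[]))
```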
📒 Files selected for processing (9)
- examples/llm_ptq/example_utils.py
- examples/vllm_serve/vllm_ptq_utils.py
- modelopt/torch/opt/config.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/conversion.py
- modelopt/torch/quantization/nn/modules/tensor_quantizer.py
- modelopt/torch/quantization/utils/core_utils.py
- tests/unit/torch/quantization/test_config_validation.py
- tests/unit/torch/quantization/test_quantize_cpu.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/quantization/conversion.py
- modelopt/torch/quantization/config.py
Have load_config return Pydantic-normalized values when schema_type or modelopt-schema is present, including typed recipe metadata and quantization config entries. Update recipe loading, docs, and unit tests for typed config objects and normalized quant_cfg handling. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Convert QuantizerCfgEntry into a ModeloptBaseConfig-backed Pydantic model with validation while preserving dict-style access for callers. Normalize schema-loaded quant_cfg snippets through model_dump, simplify quantizer cfg handling, and cover both dict and QuantizeConfig need_calibration inputs. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Update normalize_quant_cfg_list to accept dict entries, typed entries, and legacy dict formats while returning QuantizerCfgEntry objects. Preserve already parsed entries, handle implicit enable values in consumers, and cover mixed typed/dict inputs in tests. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Make ModeloptBaseConfig a MutableMapping and use Mapping/MutableMapping protocol checks for typed quantizer config entries and attributes. Convert predefined quantization recipes to QuantizeConfig objects while preserving dict-style callers and compatibility paths. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Cover normalization after mutating raw dict quantizer entries and schema-backed ModeloptBaseConfig entries. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Force-pushed f9bc176 to b0fadd1
Force-pushed b0fadd1 to 0917ab8
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 101-103: The current code uses entry.get("cfg") or {} which treats
any falsy cfg (like a typed empty mapping) as missing and replaces it; instead,
fetch cfg with entry.get("cfg") and only replace it when it is actually None or
not present (e.g., if entry.get("cfg") is None: set a new dict), then set
cfg["use_constant_amax"] = True and write back entry["cfg"] = cfg so you
preserve existing typed/empty config objects; reference the variables entry and
cfg and the key "use_constant_amax" when making the change.
In `@modelopt/torch/quantization/algorithms.py`:
- Around line 130-149: The _str_repr uses the variable name which can be None
when a QuantizeConfig instance is passed, causing distinct recipes to collapse;
update the constructor logic around QuantizeConfig handling so that if name is
None and quant_cfg is an instance of QuantizeConfig you derive a stable
identifier from the config (e.g., use a preserved recipe name/property on
QuantizeConfig or fallback to a deterministic descriptor like class name plus a
short hash of the config contents) before building self._str_repr and before
using the config for equality/hash; ensure you reference QuantizeConfig,
self.config, and self._str_repr when making the change.
In `@modelopt/torch/quantization/nn/modules/tensor_quantizer.py`:
- Around line 1428-1435: The method's type annotation and runtime check are
incorrect: replace the assert on line 1432 with an explicit exception (e.g.,
raise TypeError or ValueError) to validate inputs at runtime, and broaden the
type annotation on the attributes parameter to accept either a Mapping or a
list/tuple whose elements may be QuantizerAttributeConfig or Mapping (e.g.,
attributes: Mapping[str, Any] | list[QuantizerAttributeConfig | Mapping[str,
Any]] | tuple[QuantizerAttributeConfig | Mapping[str, Any], ...]); keep the
existing behavior of wrapping a single Mapping into a list with [attributes] *
len(self) and raise a clear error if a list/tuple length doesn't match self
(using ValueError).
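The suggested signature and runtime checks above can be sketched as a self-contained function. `set_from_attribute_config` is written here as a free function with a hypothetical `QuantizerAttributeConfig` stand-in so the sketch runs on its own; the real method lives on the sequential quantizer class.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

class QuantizerAttributeConfig(dict):
    """Hypothetical stand-in so the sketch is self-contained."""

def set_from_attribute_config(
    self_len: int,
    attributes: Mapping[str, Any]
    | list[QuantizerAttributeConfig | Mapping[str, Any]]
    | tuple[QuantizerAttributeConfig | Mapping[str, Any], ...],
) -> list:
    """Broadened annotation plus explicit exceptions instead of assert:
    a single Mapping is broadcast across all quantizers, a list/tuple must
    match the quantizer count exactly."""
    if not isinstance(attributes, (list, tuple)):
        if not isinstance(attributes, Mapping):
            raise TypeError("attributes must be a list, a tuple, or a mapping.")
        attributes = [attributes] * self_len
    elif len(attributes) != self_len:
        raise ValueError(f"Expected {self_len} attribute configs, got {len(attributes)}.")
    return list(attributes)

# A single mapping is broadcast; a short list raises.
assert set_from_attribute_config(3, {"num_bits": 8}) == [{"num_bits": 8}] * 3
```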
📒 Files selected for processing (28)
- docs/source/guides/_quant_cfg.rst
- examples/diffusers/quantization/config.py
- examples/diffusers/quantization/quantize.py
- examples/llm_autodeploy/run_auto_quantize.py
- examples/llm_ptq/cast_mxfp4_to_nvfp4.py
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- examples/llm_ptq/multinode_ptq.py
- examples/vllm_serve/vllm_ptq_utils.py
- modelopt/onnx/llm_export_utils/quantization_utils.py
- modelopt/recipe/config.py
- modelopt/recipe/loader.py
- modelopt/torch/opt/config.py
- modelopt/torch/opt/config_loader.py
- modelopt/torch/quantization/algorithms.py
- modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
- modelopt/torch/quantization/backends/nvfp4_gemm.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/conversion.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/quantization/nn/modules/tensor_quantizer.py
- modelopt/torch/quantization/utils/core_utils.py
- tests/examples/diffusers/test_diffusers.py
- tests/unit/recipe/test_loader.py
- tests/unit/torch/opt/test_config.py
- tests/unit/torch/quantization/test_config_validation.py
- tests/unit/torch/quantization/test_quantize_cpu.py
✅ Files skipped from review due to trivial changes (2)
- examples/vllm_serve/vllm_ptq_utils.py
- examples/llm_ptq/example_utils.py
🚧 Files skipped from review as they are similar to previous changes (17)
- examples/llm_autodeploy/run_auto_quantize.py
- tests/unit/torch/quantization/test_quantize_cpu.py
- modelopt/torch/quantization/backends/nvfp4_gemm.py
- docs/source/guides/_quant_cfg.rst
- examples/diffusers/quantization/quantize.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/utils/core_utils.py
- examples/llm_ptq/multinode_ptq.py
- examples/llm_ptq/cast_mxfp4_to_nvfp4.py
- modelopt/recipe/loader.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/opt/config_loader.py
- modelopt/onnx/llm_export_utils/quantization_utils.py
- tests/unit/torch/quantization/test_config_validation.py
- tests/unit/recipe/test_loader.py
- modelopt/torch/quantization/conversion.py
- modelopt/torch/quantization/config.py
- Widen _validate_named_auto_quantize_formats to accept any non-string Sequence so it matches the Sequence annotation on its caller. - Extract explicit_keys_of/explicit_items_of helpers in modelopt/torch/opt/config.py and use them in fp8/nvfp4 GEMM availability checks and the diffusers example to remove duplicated legacy-dict vs typed-config branching. - Forward by_alias/exclude_none/mode kwargs to inner QuantizerCfgEntry dumps while keeping exclude_unset=True hardcoded so model_dump round trips through model_validate. - Clarify that QuantizerCfgEntry's null-validator applies to raw Mapping inputs only. - Switch the diffusers test module loader to a uuid-suffixed module name so parallel pytest workers can't collide on sys.modules. - Add regression tests covering: TypeError on pop of an existing typed-config key, legacy-dict vs typed-config explicit_items_of parity for availability checks, and round-trippable sparse model_dump for QuantizeConfig. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
- Drop the dead Mapping exclusion in _validate_named_auto_quantize_formats (collections.abc.Mapping does not subclass Sequence, so the prior isinstance(quantization_formats, Sequence) check already rejects them). - Document in _set_kv_cache_constant_amax that the matched entry is downgraded to a plain dict, so a list[QuantizerCfgEntry] input becomes heterogeneous and callers must round-trip through normalize_quant_cfg_list or QuantizeConfig.model_validate before reusing it as typed input. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
- update_quant_cfg_with_kv_cache_quant: check whether the caller supplied an inner quant_cfg BEFORE running QuantizeConfig.model_validate. Without this, an empty dict (e.g. vLLM with no base recipe) materializes the schema default and silently quantizes the whole model at INT8 alongside the KV-cache config. - KV-only constants (FP8_KV_CFG, FP8_AFFINE_KV_CFG, NVFP4_KV_CFG, NVFP4_AFFINE_KV_CFG, NVFP4_KV_ROTATE_CFG) are now sparse dicts produced by model_dump(exclude_unset=True), so dict-style merges never leak the default algorithm="max" into a primary recipe. NVFP4_KV_ROTATE_CFG keeps its explicit algorithm="max" because it was set in the spec. - ModeloptBaseConfig.__setitem__ now honors the model's ``extra`` policy: ``extra='allow'`` configs (ModeloptBaseRuleConfig and rule-based subclasses) can update existing extra keys and insert new ones through the mapping API; the default ``extra='forbid'`` policy still rejects unknown keys. - Add regression tests for each: KV-only disable-all fallback, sparse KV constants, and __setitem__ extras parity across extra policies. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Replaces the previous fix that demoted KV-cache constants to plain dicts so that dict(KV_CFG) / **KV_CFG would not leak the schema default algorithm. The new approach keeps the KV-cache constants as full QuantizeConfig instances (preserving typing and validation) and introduces an explicit merge_quantize_config function that fixes the actual underlying problem: naive dict-style merges of partial overlays. merge_quantize_config semantics: - OmegaConf-style recursive merge, left-to-right. - Typed ModeloptBaseConfig overlays only contribute fields in model_fields_set, so schema defaults on the overlay do NOT overwrite explicitly-set values on the base. Plain Mapping overlays contribute every key. - The quant_cfg field always EXTENDS (overlay rules appended); other list-valued fields REPLACE per OmegaConf default. This matches the rule-list / last-match-wins semantics of set_quantizer_by_cfg. - None overlays are skipped. The final result is validated through QuantizeConfig.model_validate. Refactor update_quant_cfg_with_kv_cache_quant to use the new merge. KV-cache constants revert to QuantizeConfig instances; the "sparse-dicts" regression test is replaced by a TestMergeQuantizeConfig suite that locks the merge semantics directly (base algorithm preserved when overlay's algorithm is default, overlay wins when explicit, plain dict overlays contribute every key, None overlays drop out, quant_cfg extends). Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Follow-ups from pass-3 review notes: - _to_explicit_dict: document that the plain-dict branch returns a shallow copy and aliasing is safe because the final model_validate re-parses; callers must take their own deep copy if they mutate overlays after. - merge_quantize_config tests import from the public mtq namespace so any future migration of the implementation path is caught. Add a small test that asserts the function is part of the public surface. - update_quant_cfg_with_kv_cache_quant: replace the algorithm-fallback ``if not ...`` with ``is None`` so a falsy-but-set value (0, "", []) is preserved. - ModeloptBaseConfig.get_field_name_from_key: document the "raises AttributeError on unknown keys" invariant that __setitem__ relies on to dispatch into the extras-allowed insertion path. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
- update_quant_cfg_with_kv_cache_quant: tighten the algorithm fallback so an explicit ``algorithm=None`` (a deliberate "no calibration" signal honored by ``need_calibration``) is preserved. The fallback now checks ``model_fields_set`` AND value-is-None, so it only fires when neither side explicitly set algorithm and the schema default itself happens to be None. Test gaps from the final review: - Multi-overlay merge_quantize_config(base, ov1, ov2): assert left-to-right accumulation explicitly (later scalar wins, quant_cfg extends through every overlay in turn). - set_quantizer_by_cfg with a hand-built typed QuantizeConfig (typed entries, no normalize call) — locks the typed-path contract. - update_quant_cfg_with_kv_cache_quant preserves user-set algorithm=None; separate test confirms the fallback fires to "max" when neither side sets algorithm. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
What does this PR do?
Type of change: new feature
This PR schematizes ModelOpt config loading and quantization config entries while keeping the existing YAML/dict config surface backward compatible.
- Make `ModeloptBaseConfig` implement `MutableMapping`, including `__delitem__` unset semantics for `model_dump(exclude_unset=True)`. This lets `ModeloptBaseConfig` and a plain dict be used interchangeably; we will formally use `ModeloptBaseConfig`, and dict support exists only for backward compatibility.
- Replace the `QuantizerCfgEntry` `TypedDict` with a `ModeloptBaseConfig`-backed Pydantic model.
- Have `load_config` return Pydantic-normalized values when a `schema_type` or `modelopt-schema` annotation is present, with overloads documenting the typed return behavior.
- Back recipe configs with a `ModeloptBaseConfig` Pydantic model so they share the same schema/field behavior as other ModelOpt configs.
- Convert predefined quantization recipes to `QuantizeConfig` objects via `QuantizeConfig.model_validate(...)`.
- Use `Mapping`/`MutableMapping` protocol checks for typed config entries and attributes while preserving dict-style access used by existing code.
- Broaden `need_calibration` argument types.

Behavior changes worth calling out
Beyond pure schematization, this PR changes a few behaviors that external integrators should be aware of:
- `QuantRecipe(custom_cfg)` now requires a `name`. Previously `QuantRecipe(custom_cfg)` would auto-discover a name via `get_auto_name_for_config`; that fallback was removed and the constructor now raises `ValueError("name must be provided when quant_cfg is not None")`. External tooling (custom searchers, downstream auto-quant pipelines) that constructed `QuantRecipe` directly must pass `name=...`.
- `_get_search_recipes` requires pre-formed `(quant_cfg, name)` tuples. This is an internal-only protocol change paired with the above; mention the two together in any integrator note.
- The availability checks (`_fp8_availability_check`, `_nvfp4_availability_check`) short-circuit on `module.input_quantizer.is_enabled`/`module.weight_quantizer.is_enabled` and skip the `enable` key when iterating the preset cfg. Combined with the typed-config iteration helper (`explicit_items_of`), legacy-dict and typed `QuantizerAttributeConfig` cfgs now go through the same iteration path, and only the explicitly set fields are compared.
- Entries with only `quantizer_name` set now default to `enable=True` via the schema (previously this raised "must specify 'cfg' or 'enable'"). This is a deliberate relaxation; misspellings like `{quantizer_name: ..., enabel: False}` will no longer be caught by a missing-field error.
- `del`/`pop` on existing schema keys of a typed config raises `TypeError`. `ModeloptBaseConfig` exposes a fixed key set; callers that previously did `entry.pop("enable")` to drop a key must now read-then-overwrite. `pop("missing", default)` still works because the lookup short-circuits before the deletion path.
- `mtq.update_quant_cfg_with_kv_cache_quant` now returns a `QuantizeConfig` instead of a plain `dict`. Callers that do mapping-style access (`result["quant_cfg"]`, `result["algorithm"]`, `for k, v in result.items()`) still work because `QuantizeConfig` is a `MutableMapping`. Callers that did `isinstance(result, dict)` will now see `False`; switch to `isinstance(result, (dict, ModeloptBaseConfig))` or `isinstance(result, Mapping)`.
- Predefined configs (`FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, `FP8_KV_CFG`, etc.) are now `QuantizeConfig` instances, not plain dicts.
  Dict-style access (`cfg["quant_cfg"]`, `cfg["algorithm"]`) and iteration still work via the `MutableMapping` mixin, but `dict(cfg)`, `**cfg`, and `cfg.get("algorithm")` will surface every schema field, including default-filled ones like `algorithm="max"` on partial KV overlays, which can silently override a primary recipe's algorithm when used in ad-hoc dict merges. Use `mtq.merge_quantize_config(base, *overlays)` for composition; it respects each side's `model_fields_set` so defaults from a partial overlay don't leak onto the base.
- New public helper `mtq.merge_quantize_config(base, *overlays)` for composing a primary recipe with partial overlays (e.g. KV-cache configs). Right-most overlay wins for explicitly set fields, `quant_cfg` rule lists EXTEND, other lists REPLACE, and schema defaults on the right do NOT overwrite explicitly set values on the left. Used internally by `update_quant_cfg_with_kv_cache_quant`.

## Migration notes
- When constructing a `QuantRecipe` from a custom config, pass a name: `QuantRecipe(custom_cfg, name="MY_CFG")` instead of `QuantRecipe(custom_cfg)`.
- Prefer `entry.explicit_items()` / the new `modelopt.torch.opt.config.explicit_items_of`/`explicit_keys_of` helpers over `dict(entry).items()`. Typed entries now expose every schema field (4 keys) through `__iter__`, while `explicit_*` keeps the original sparse user-input shape.
- To unset a key on a typed entry, overwrite it (e.g. `entry["cfg"] = None` rather than `entry.pop("cfg")`).
- When round-tripping a `QuantizeConfig`: `QuantizeConfig.model_dump()` keeps inner `quant_cfg` entries in sparse form (`exclude_unset=True` is hardcoded for inner entries) so `QuantizeConfig.model_validate(cfg.model_dump())` still works. Formatting kwargs (`by_alias`, `exclude_none`, `mode`) are forwarded to inner entries.
- Compose configs with `mtq.merge_quantize_config(primary, overlay)` rather than `{**primary, **overlay}` or `dict(primary).update(overlay)`. The new helper respects each side's `model_fields_set` so schema defaults from a partial overlay never overwrite the primary's explicitly set values.

## Usage
Existing YAML and dict-based quantization configs continue to work. Callers that want schema-backed objects can now request them directly:
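For instance, a stdlib-only sketch of the overloaded loading contract. The real loader is Pydantic-based and returns `ModeloptBaseConfig` instances; `MiniQuantizeConfig` and this `load_config` body are hypothetical stand-ins, not ModelOpt's actual code.

```python
# Illustrative stand-in only: the real `load_config` normalizes through
# Pydantic models; this dataclass sketch mimics the overload contract
# (raw dict without a schema, typed object when `schema_type` is given).
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any, TypeVar, overload

T = TypeVar("T")


@dataclass
class MiniQuantizeConfig:  # hypothetical stand-in for the QuantizeConfig schema
    quant_cfg: dict[str, Any] = field(default_factory=dict)
    algorithm: str | None = "max"


@overload
def load_config(raw: dict[str, Any]) -> dict[str, Any]: ...
@overload
def load_config(raw: dict[str, Any], schema_type: type[T]) -> T: ...


def load_config(raw, schema_type=None):
    """Without a schema, return the raw mapping; with one, validate and type it."""
    if schema_type is None:
        return raw
    allowed = {f.name for f in fields(schema_type)}
    if unknown := set(raw) - allowed:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return schema_type(**raw)


raw = {"quant_cfg": {"*weight_quantizer": {"num_bits": 8}}, "algorithm": "awq"}
untyped = load_config(raw)                    # plain dict, legacy behavior
typed = load_config(raw, MiniQuantizeConfig)  # schema-backed, attribute access
print(type(typed).__name__, typed.algorithm)
```

The overloads document the same caller-facing promise as the PR: the return type tightens only when a schema is supplied, so untyped call sites keep their old behavior.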
Predefined Python configs are now schema-backed but still support mapping-style access:
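To make the mapping-versus-merge behavior concrete, here is a stdlib-only sketch. `MiniQuantizeConfig` and `merge` are hypothetical hand-rolled illustrations; the real `QuantizeConfig` is a Pydantic `ModeloptBaseConfig` and the real helper is `mtq.merge_quantize_config`.

```python
# Hand-rolled illustration of the contract described above: dict-style access
# works, deleting schema keys raises TypeError, and a fields-set-aware merge
# keeps partial-overlay defaults from leaking onto the base config.
from collections.abc import MutableMapping


class MiniQuantizeConfig(MutableMapping):
    _schema = {"quant_cfg": dict, "algorithm": lambda: "max"}  # name -> default factory

    def __init__(self, **kwargs):
        self._values = {k: make() for k, make in self._schema.items()}
        self._fields_set = set()  # mirrors Pydantic's model_fields_set
        for k, v in kwargs.items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self._schema:
            raise KeyError(key)
        self._values[key] = value
        self._fields_set.add(key)

    def __delitem__(self, key):  # schema keys are fixed: no deletion
        raise TypeError(f"cannot delete schema key {key!r}")

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def merge(base, *overlays):
    """Right-most explicitly set scalar wins; `quant_cfg` entries extend."""
    out = MiniQuantizeConfig(**{k: base[k] for k in base._fields_set})
    for ov in overlays:
        for k in ov._fields_set:  # defaults on the overlay never leak
            if k == "quant_cfg":
                out["quant_cfg"] = {**out["quant_cfg"], **ov["quant_cfg"]}
            else:
                out[k] = ov[k]
    return out


base = MiniQuantizeConfig(quant_cfg={"*weight_quantizer": {"num_bits": 4}}, algorithm="awq")
kv = MiniQuantizeConfig(quant_cfg={"*output_quantizer": {"num_bits": 8}})

naive = {**base, **kv}  # ad-hoc dict merge: kv's default algorithm="max" leaks
safe = merge(base, kv)  # fields-set-aware merge keeps the base's "awq"
print(naive["algorithm"], safe["algorithm"])  # max awq
```

This is exactly the hazard the migration notes warn about: `**cfg` surfaces default-filled schema fields, while a `model_fields_set`-aware merge only propagates what each side explicitly set.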
## Testing
- `python -m pytest tests/unit/recipe/test_loader.py`
- `python -m pytest tests/unit/torch/quantization/test_config_validation.py`
- `python -m pytest tests/unit/torch/quantization/test_autoquant.py`
- `python -m pytest tests/unit/torch/utils/test_serialization.py`
- `ruff check` on the touched Python files
- `ruff format --check` on the touched Python files
- `pre-commit run mypy --files ...` on the touched Python files
- `git diff --check`

## Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

## Additional Information
No related issue. This PR intentionally does not change checked-in YAML configs; it updates the loader/schema layer and Python-side quantization config handling.