fix(te-plugin): make _Linear arg indexing robust to TE signature changes #1473
Conversation
ModelOpt's `te_quantized_linear_fn` and `te_grouped_quantized_linear_fn` read `weight` / `inp` from hard-coded positions in `args`. TE 2.15 broke this by inserting `weight_workspace` between `weight` and `inp` at the `_Linear.forward` call site — passing `None` into `self.input_quantizer()` and crashing with `AttributeError: 'NoneType' object has no attribute 'numel'` during calibration (observed in Megatron-Bridge after bumping to modelopt 0.44.0rc3).

Replace the version-gated indexing with parameter-name introspection of the live `_Linear.forward` / `_GroupedLinear.forward` signature. The param names `weight` / `inp` / `m_splits` / `non_tensor_args` have been stable across TE 1.x, 2.x, and 2.15+; only the gap between them changes. This also subsumes the old TE < 2.0 branch (`weight, weight_fp8, inputs`) — the same gap detection handles it uniformly, so the dual-branch code is replaced with one path.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
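For reference, a minimal sketch of the name-based lookup described above. The helper `_arg_positions` and the stand-in `forward` are hypothetical illustrations, not ModelOpt's actual code:

```python
import inspect

def _arg_positions(forward_fn, *names):
    # Hypothetical helper for illustration only; the real ModelOpt code may
    # differ in structure. The point is the name-based lookup replacing
    # version gates.
    params = list(inspect.signature(forward_fn).parameters)
    # A future TE parameter rename fails loudly here with ValueError,
    # instead of silently indexing the wrong slot.
    return tuple(params.index(n) for n in names)

# Stand-in mirroring TE 2.15's _Linear.forward parameter order:
def forward(ctx, weight, weight_workspace, inp, bias, non_tensor_args):
    pass

w_idx, i_idx = _arg_positions(forward, "weight", "inp")
assert (w_idx, i_idx) == (1, 3)  # weight_workspace sits between, untouched
```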
**Walkthrough:** Replaces TE-version branching with runtime signature inspection (via `inspect`) in the Transformer Engine quantization wrappers, locating the `weight` and `inp` argument positions by name and quantizing those tensors before invoking the TE functions.
/claude review
**Claude Review Summary** — Findings: CRITICAL: 2, IMPORTANT: 0, SUGGESTION: 0

Most impactful finding: the introspection helper reads the signature of the *live* `_Linear.forward` / `_GroupedLinear.forward`, but in the no-grad calibration path that attribute is the `functools.partial` wrapper installed by `replace_function`, whose signature collapses to `(*args, **kwargs)` — so `names.index("weight")` raises `ValueError` on the first calibration forward. The grad-enabled path coincidentally works because there the patch swaps `apply`, leaving `forward` untouched.

Suggested fix: use the saved original that `replace_function` caches at `_forward`:

```python
orig_forward = getattr(te_linear._Linear, "_forward", te_linear._Linear.forward)
names = list(inspect.signature(orig_forward).parameters)
```

(and similarly for `_GroupedLinear`).

Risk: Elevated. The fix as written is brittle in exactly the path the PR description advertises as covered (the no-grad calibration path). Otherwise the introspection approach is a sound replacement for brittle version gates — this is purely about pointing the introspection at the un-patched signature.
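To see why that fallback covers both patch modes, here is a minimal self-contained sketch; the class body and the helper name `resolve_unpatched_forward` are illustrative stand-ins, not TE's or ModelOpt's actual code:

```python
import inspect

class _Linear:  # stand-in for transformer_engine's autograd Function
    @staticmethod
    def forward(ctx, weight, inp, bias):
        pass

def resolve_unpatched_forward(cls):
    # forward-patched path: replace_function cached the original at
    # cls._forward, so getattr finds it there.
    # apply-patched path: cls._forward was never set, and cls.forward
    # itself is still the original, so the fallback is safe.
    return getattr(cls, "_forward", cls.forward)

names = list(inspect.signature(resolve_unpatched_forward(_Linear)).parameters)
assert names == ["ctx", "weight", "inp", "bias"]
```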
**Codecov Report** — ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1473      +/-   ##
==========================================
+ Coverage   76.64%   76.83%   +0.18%
==========================================
  Files         478      473       -5
  Lines       51643    51322     -321
==========================================
- Hits        39582    39432     -150
+ Misses      12061    11890     -171
```

Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
Inside `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` the signature introspection runs
*while* `replace_function`'s context is active, so `_Linear.forward` and
`_GroupedLinear.forward` point at the `functools.partial` wrapper that
replaced them. `inspect.signature` of a partial collapses to
`(*args, **kwargs)`, so `names.index("weight")` would raise
`ValueError: 'weight' is not in list` on the first calibration forward.
`replace_function` saves the original at `_Linear._forward` /
`_GroupedLinear._forward` while the patch is active. Look that up
(falling back to the still-untouched `.forward` when only `apply` was
patched). Caught by Claude review on PR #1473.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Thanks for the catch — the analysis is correct. Verified locally with a minimal repro:

```python
import inspect
from functools import partial

class _Linear:
    @staticmethod
    def forward(ctx, weight, weight_workspace, inp, bias, non_tensor_args):
        pass

# Mimic replace_function: cache the original at _forward, then install a
# functools.partial wrapper in its place.
_Linear._forward = _Linear.forward
_Linear.forward = partial(lambda *args, **kwargs: None, _Linear, "_forward", "self")

# Buggy: introspecting the live (patched) attribute sees only the wrapper.
assert list(inspect.signature(_Linear.forward).parameters) == ["args", "kwargs"]

# Fixed: introspecting the saved original recovers the real signature.
assert list(inspect.signature(getattr(_Linear, "_forward", _Linear.forward)).parameters) == [
    "ctx", "weight", "weight_workspace", "inp", "bias", "non_tensor_args"
]
```

Fix in eeb9b9d — uses the saved `_forward` original, falling back to `.forward` when only `apply` was patched.

Re: the test concern — agreed that a multi-version TE matrix is the right next step; it's out of scope for this PR.
/claude review
Claude review passed — no significant issues found.
Traced the introspection logic end-to-end:

- `replace_function` caches the original at `_<name>`, so `getattr(_Linear, "_forward", _Linear.forward)` correctly resolves to the un-patched forward for both the `forward`-patched and `apply`-patched paths (in the latter, `_forward` is absent and `forward` itself is still original).
- The `ctx_offset` arithmetic correctly maps signature-relative parameter positions to the `args` tuple (`ctx` is present in `args` only on the `_forward` path) — see the sketch below.
- `list(inspect.signature(...).parameters)` includes VAR_POSITIONAL as a regular entry, so index-by-name works for `_GroupedLinear`'s trailing `*weights_and_biases` too.
- In-place mutation of specific indices in `new_args` correctly preserves intermediate positional args (the TE 2.15 `weight_workspace` fix).

No mode/state/export/compat concerns — internal helper, no checkpoint schema or public API change. Future TE param renames would surface as a loud `ValueError` from `.index()`, which is the right failure mode.
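A minimal sketch of those two mechanics, using an illustrative stand-in signature and a hypothetical `arg_index` helper rather than ModelOpt's actual code:

```python
import inspect

# Stand-in for _GroupedLinear.forward; only the parameter kinds matter here.
def grouped_forward(ctx, inp, non_tensor_args, *weights_and_biases):
    pass

names = list(inspect.signature(grouped_forward).parameters)
# The VAR_POSITIONAL parameter shows up as a regular entry, so
# index-by-name keeps working despite the trailing *weights_and_biases.
assert names == ["ctx", "inp", "non_tensor_args", "weights_and_biases"]

def arg_index(names, param, args_include_ctx):
    # Map a signature-relative position onto the runtime args tuple: on the
    # patched-forward path args include ctx; on the apply path ctx is
    # absent, so every position shifts left by one.
    ctx_offset = 0 if args_include_ctx else 1
    return names.index(param) - ctx_offset

assert arg_index(names, "inp", args_include_ctx=True) == 1   # forward path
assert arg_index(names, "inp", args_include_ctx=False) == 0  # apply path
```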
LGTM.
…ges (#1473)

### What does this PR do?

**Type of change:** Bug fix

ModelOpt's `te_quantized_linear_fn` and `te_grouped_quantized_linear_fn` read `weight` / `inp` from hard-coded positions in `args`. Two TE signature changes broke this scheme:

- **TE 1.x → 2.0:** dropped the legacy `weight_fp8` slot between `weight` and `inp`. ModelOpt handled this with an `if Version("2.0") <= _TE_VERSION:` branch plus a duplicate else branch.
- **TE 2.14 → 2.15:** inserted `weight_workspace` between `weight` and `inp` at the `_Linear.forward` call site ([TE 2.15 linear.py L1663](https://github.com/NVIDIA/TransformerEngine/blob/release_v2.15/transformer_engine/pytorch/module/linear.py#L1663)). Unhandled by ModelOpt — `args[idx + 1]` resolved to `None` (the workspace is `None` outside FP8), which then crashed `TensorQuantizer.forward` on `inputs.numel()` with `AttributeError: 'NoneType' object has no attribute 'numel'`. Surfaced as a regression in Megatron-Bridge after the TE 2.15 bump alongside ModelOpt 0.44.0rc3.
- **TE 2.10:** `_GroupedLinear.forward`'s second positional slot was renamed `m_splits` → `non_tensor_args` (tuple wrapping). ModelOpt had a separate `Version("2.10")` gate for this.

Replace all three version gates with **parameter-name introspection** of the live `_Linear.forward` / `_GroupedLinear.forward` signature. The parameter names (`weight`, `inp`, `m_splits`, `non_tensor_args`) have been stable across TE 1.x, 2.x, and 2.15+; only their relative positions shift. The new code reads the live signature via `inspect.signature(...).parameters`, locates `weight`/`inp` by name, and mutates only those positions in a list copy of `args` — everything between (e.g. TE 2.15's `weight_workspace`) and after passes through verbatim. The dual-branch code in `te_quantized_linear_fn` collapses to a single path.

### Usage

No public API change. PTQ continues to work transparently across all supported TE versions:

```python
import modelopt.torch.quantization as mtq

# Works on TE 1.x, 2.0-2.14, 2.15.x, and 2.16+ — no version flag needed.
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

### Testing

Existing TE plugin tests (`tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py`) exercise both the `_forward` (no-grad calibration) and `_apply` (grad-enabled training) paths of `te_quantized_linear_fn` for `te.pytorch.Linear` — they would have caught the original TE 2.15 regression on a CI matrix entry pinned to TE 2.15. Verified trace correctness across:

| TE version | `_Linear.forward` signature | `_te_linear` weight→inp gap | `_GroupedLinear.forward` second slot |
|---|---|---|---|
| 1.x | `(ctx, weight, weight_fp8, inp, …)` | 1 | n/a |
| 2.0–2.14 | `(ctx, weight, inp, bias, …)` | 0 | `m_splits` |
| 2.15.x | `(ctx, weight, weight_workspace, inp, …)` | 1 | `non_tensor_args` |
| 2.16+ (main) | `(ctx, weight, inp, bias, fwd_args)` | 0 | `non_tensor_args` |
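The gap column can be checked mechanically. In the sketch below, the four stand-in functions mirror the table rows; the trailing `bias` placeholder is an assumption filling in the abbreviated "…" parameters:

```python
import inspect

# Stand-ins mirroring the _Linear.forward rows of the table above; `bias`
# is a placeholder assumption for the elided trailing parameters.
def te_1x(ctx, weight, weight_fp8, inp, bias): pass
def te_20_214(ctx, weight, inp, bias): pass
def te_215(ctx, weight, weight_workspace, inp, bias): pass
def te_216(ctx, weight, inp, bias, fwd_args): pass

def weight_inp_gap(fn):
    # Number of positional slots sitting between weight and inp.
    names = list(inspect.signature(fn).parameters)
    return names.index("inp") - names.index("weight") - 1

# Matches the table's gap column: 1, 0, 1, 0.
assert [weight_inp_gap(f) for f in (te_1x, te_20_214, te_215, te_216)] == [1, 0, 1, 0]
```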
### Before your PR is "Ready for review"

- Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ Public API unchanged; broadens the range of TE versions that work (TE 2.15.x now supported, TE 1.x still supported via the same introspection path).
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`?: N/A — only adds a stdlib `inspect` import.
- Did you write any new necessary tests?: Existing tests sufficient. The bug fix is covered by the existing `test_transformer_engine.py` for whatever single TE version CI exercises; a multi-version TE matrix is the right next step but is out of scope for this PR.

### Additional Information

Triggered by Megatron-Bridge NVIDIA-NeMo/Megatron-Bridge#3783 failing tests after bumping ModelOpt 0.44.0rc2 → 0.44.0rc3 together with a Megatron-LM bump that pulls TE 2.15. ModelOpt rc2 had the same latent bug — it just wasn't exercised until TE 2.15 became the runtime version.

## Summary by CodeRabbit

- **Refactor:** Improved Transformer Engine quantization plugin robustness by using runtime parameter inspection instead of version-based branching, ensuring compatibility across TE versions without requiring manual updates.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>