
fix(te-plugin): make _Linear arg indexing robust to TE signature changes #1473

Merged

kevalmorabia97 merged 2 commits into main from kmorabia/te-plugin-introspect-signatures on May 12, 2026

Conversation

@kevalmorabia97 kevalmorabia97 (Collaborator) commented May 12, 2026

What does this PR do?

Type of change: Bug fix

ModelOpt's te_quantized_linear_fn and te_grouped_quantized_linear_fn read weight / inp from hard-coded positions in args. Three TE signature changes broke this scheme:

  • TE 1.x → 2.0: dropped the legacy weight_fp8 slot between weight and inp. ModelOpt handled this with an if Version("2.0") <= _TE_VERSION: branch + a duplicate else branch.
  • TE 2.14 → 2.15: inserted weight_workspace between weight and inp at the _Linear.forward call site (TE 2.15 linear.py L1663). Unhandled by ModelOpt — args[idx + 1] resolved to None (workspace is None outside FP8), which then crashed TensorQuantizer.forward on inputs.numel() with AttributeError: 'NoneType' object has no attribute 'numel'. Surfaced as a regression in Megatron-Bridge after the TE 2.15 bump alongside ModelOpt 0.44.0rc3.
  • TE 2.10: _GroupedLinear.forward's second positional slot was renamed m_splits → non_tensor_args (tuple wrapping). ModelOpt had a separate Version("2.10") gate for this.

Replace all three version gates with parameter-name introspection of the live _Linear.forward / _GroupedLinear.forward signature. The parameter names (weight, inp, m_splits, non_tensor_args) have been stable across TE 1.x, 2.x, and 2.15+; only their relative positions shift. The new code reads the live signature via inspect.signature(...).parameters, locates weight/inp by name, and mutates only those positions in a list copy of args — everything between (e.g. TE 2.15's weight_workspace) and after passes through verbatim. The dual-branch code in te_quantized_linear_fn collapses to a single path.
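
A minimal sketch of the new scheme (hypothetical helper and parameter plumbing; the actual ModelOpt code differs in details):

```python
import inspect

def _quantize_te_args(te_forward, args, weight_quantizer, input_quantizer, has_ctx):
    """Locate weight/inp by name in te_forward's live signature and quantize
    only those positions; anything between or after them (e.g. TE 2.15's
    weight_workspace) passes through verbatim."""
    names = list(inspect.signature(te_forward).parameters)
    # On the autograd forward path args[0] is ctx, so signature position i
    # maps to args[i]; on the apply path ctx is absent, so shift left by one.
    offset = 0 if has_ctx else -1
    new_args = list(args)
    for name, quantizer in (("weight", weight_quantizer), ("inp", input_quantizer)):
        idx = names.index(name) + offset
        new_args[idx] = quantizer(new_args[idx])
    return tuple(new_args)
```

A future TE rename of weight or inp would fail loudly in names.index(name) with a ValueError rather than silently quantizing the wrong argument.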

Usage

No public API change. PTQ continues to work transparently across all supported TE versions:

```python
import modelopt.torch.quantization as mtq

# Works on TE 1.x, 2.0-2.14, 2.15.x, and 2.16+ — no version flag needed.
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

Testing

Existing TE plugin tests (tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py) exercise both the _forward (no-grad calibration) and _apply (grad-enabled training) paths of te_quantized_linear_fn for te.pytorch.Linear — they would have caught the original TE 2.15 regression on a CI matrix entry pinned to TE 2.15. Verified trace correctness across:

| TE version | _Linear.forward signature | _te_linear weight→inp gap | _GroupedLinear.forward second slot |
|---|---|---|---|
| 1.x | (ctx, weight, weight_fp8, inp, …) | 1 | n/a |
| 2.0–2.14 | (ctx, weight, inp, bias, …) | 0 | m_splits |
| 2.15.x | (ctx, weight, weight_workspace, inp, …) | 1 | non_tensor_args |
| 2.16+ (main) | (ctx, weight, inp, bias, fwd_args) | 0 | non_tensor_args |
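
As a sanity check, the same name lookup resolves every row above. A toy illustration with stand-in signatures (not real TE imports):

```python
import inspect

def weight_inp_gap(fn):
    """Number of parameters sitting between `weight` and `inp`."""
    names = list(inspect.signature(fn).parameters)
    return names.index("inp") - names.index("weight") - 1

# Stand-ins mirroring the table rows above, not the actual TE functions.
def te_1x(ctx, weight, weight_fp8, inp): ...
def te_20(ctx, weight, inp, bias): ...
def te_215(ctx, weight, weight_workspace, inp): ...
def te_216(ctx, weight, inp, bias, fwd_args): ...

assert [weight_inp_gap(f) for f in (te_1x, te_20, te_215, te_216)] == [1, 0, 1, 0]
```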

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: Existing tests sufficient

Additional Information

Triggered by Megatron-Bridge NVIDIA-NeMo/Megatron-Bridge#3783 failing tests after bumping ModelOpt 0.44.0rc2 → 0.44.0rc3 together with a Megatron-LM bump that pulls TE 2.15. ModelOpt rc2 had the same latent bug — it just wasn't exercised until TE 2.15 became the runtime version.

Summary by CodeRabbit

  • Refactor
    • Improved Transformer Engine quantization plugin robustness by using runtime parameter inspection instead of version-based branching, ensuring compatibility across TE versions without requiring manual updates.


ModelOpt's te_quantized_linear_fn and te_grouped_quantized_linear_fn read
weight/inp from hard-coded positions in args. TE 2.15 broke this by inserting
weight_workspace between weight and inp at the _Linear.forward call site —
passing None into self.input_quantizer() and crashing with
"AttributeError: 'NoneType' object has no attribute 'numel'" during calibration
(observed in Megatron-Bridge after bumping to modelopt 0.44.0rc3).

Replace the version-gated indexing with parameter-name introspection of the
live _Linear.forward / _GroupedLinear.forward signature. The param names
weight / inp / m_splits / non_tensor_args have been stable across TE 1.x, 2.x,
and 2.15+; the gap between them changes. This also subsumes the old TE < 2.0
branch (`weight, weight_fp8, inputs`) — the same gap-detection handles it
uniformly, so the dual-branch code is replaced with one path.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner May 12, 2026 18:29
@kevalmorabia97 kevalmorabia97 requested a review from cjluo-nv May 12, 2026 18:29

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ec4b8ff-84a0-4248-a796-e17c0508fa0e

📥 Commits

Reviewing files that changed from the base of the PR and between 9d16ad9 and eeb9b9d.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/plugins/transformer_engine.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/quantization/plugins/transformer_engine.py

📝 Walkthrough

Replaces TE-version branching with runtime signature inspection (via inspect) in Transformer Engine quantization wrappers to locate and quantize input and weight argument positions before invoking TE functions.

Changes

Transformer Engine Quantization Signature Introspection

| Layer / File(s) | Summary |
|---|---|
| Linear quantization signature inspection<br>modelopt/torch/quantization/plugins/transformer_engine.py | Add inspect import and refactor _QuantTELinear.te_quantized_linear_fn to use inspect.signature() on te_linear._Linear.forward to find weight and inp positions (accounting for context offset), apply weight_quantizer and input_quantizer to those args, and call the TE function with the modified arguments. |
| Grouped linear quantization signature inspection<br>modelopt/torch/quantization/plugins/transformer_engine.py | Refactor _QuantTEGroupedLinear.te_grouped_quantized_linear_fn to inspect te_grouped_linear._GroupedLinear.forward to locate inp, detect grouped layout via non_tensor_args or m_splits, compute num_gemms, quantize inp and each grouped weight argument by index, and invoke the TE grouped function. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: making _Linear argument indexing robust to Transformer Engine signature changes using runtime introspection instead of hard-coded indexing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected. PR adds only inspect (stdlib) to read function signatures safely. No unsafe deserialization, eval/exec, hardcoded secrets, or problematic dependencies introduced. |


@kevalmorabia97 kevalmorabia97 requested review from ChenhanYu, kinjalpatel27, realAsma and yueshen2016 and removed request for cjluo-nv May 12, 2026 18:29
@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

/claude review


github-actions Bot commented May 12, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-12 20:11 UTC


claude Bot commented May 12, 2026

Claude Review Summary

Findings: CRITICAL: 2, IMPORTANT: 0, SUGGESTION: 0

Most impactful finding

The introspection helper reads inspect.signature(te_linear._Linear.forward) (and the _GroupedLinear equivalent), but during the no-grad calibration path those attributes have already been replaced by the functools.partial wrapper created in _ParallelLinear.functionals_to_replace. Per inspect.signature semantics for partials, the resulting parameter list collapses to ['args', 'kwargs'], and the names.index("weight") / sig_params.index("inp") lookups raise ValueError on the first calibration forward.

The grad-enabled path coincidentally works because there the patch swaps apply, leaving forward untouched.

Suggested fix: use the saved original that replace_function writes as "_" + name:

```python
orig_forward = getattr(te_linear._Linear, "_forward", te_linear._Linear.forward)
names = list(inspect.signature(orig_forward).parameters)
```

(and similarly for _GroupedLinear).

Risk

Elevated. The fix as written is brittle in exactly the path the PR description advertises as covered (_forward / no-grad calibration). Please run tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py::test_quantize against a real TE install — any TE version, not just 2.15 — before merging. If that test already passes as-is, I'd like to understand how, because the static analysis above predicts a hard failure on calibration entry. It's possible the test matrix currently skips TE-based tests (no TE available), in which case the regression this PR is meant to fix and the regression it introduces would both go unnoticed until enabled.

Otherwise the introspection approach is a sound replacement for brittle version gates — this is purely about pointing the introspection at the un-patched signature.


codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 43.47826% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (2ce745a) to head (eeb9b9d).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...t/torch/quantization/plugins/transformer_engine.py | 43.47% | 13 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1473      +/-   ##
==========================================
+ Coverage   76.64%   76.83%   +0.18%
==========================================
  Files         478      473       -5
  Lines       51643    51322     -321
==========================================
- Hits        39582    39432     -150
+ Misses      12061    11890     -171
```
| Flag | Coverage Δ |
|---|---|
| examples | 41.61% <4.34%> (+0.91%) ⬆️ |
| gpu | 59.73% <43.47%> (-0.47%) ⬇️ |
| regression | 14.97% <4.34%> (-0.10%) ⬇️ |
| unit | 52.49% <4.34%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


Inside `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` we run
*while* `replace_function`'s context is active, so `_Linear.forward` and
`_GroupedLinear.forward` point at the `functools.partial` wrapper that
replaced them. `inspect.signature` of a partial collapses to
`(*args, **kwargs)`, so `names.index("weight")` would raise
`ValueError: 'weight' is not in list` on the first calibration forward.

`replace_function` saves the original at `_Linear._forward` /
`_GroupedLinear._forward` while the patch is active. Look that up
(falling back to the still-untouched `.forward` when only `apply` was
patched). Caught by Claude review on PR #1473.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

Thanks for the catch — the analysis is correct. inspect.signature(te_linear._Linear.forward) runs while replace_function's context is active, so it inspects the functools.partial wrapper whose signature collapses to (*args, **kwargs). names.index("weight") would raise ValueError on the first calibration forward.

Verified locally with a minimal repro:

```python
import inspect
from functools import partial

class _Linear:
    @staticmethod
    def forward(ctx, weight, weight_workspace, inp, bias, non_tensor_args): pass

# Mimic replace_function: cache the original, then patch forward with a partial.
_Linear._forward = _Linear.forward
_Linear.forward = partial(lambda *args, **kwargs: None, _Linear, "_forward", "self")

# Buggy: list(inspect.signature(_Linear.forward).parameters) == ['args', 'kwargs']
# Fixed: list(inspect.signature(getattr(_Linear, "_forward", _Linear.forward)).parameters)
#        == ['ctx', 'weight', 'weight_workspace', 'inp', 'bias', 'non_tensor_args']
```

Fix in eeb9b9d — uses getattr(te_linear._Linear, "_forward", te_linear._Linear.forward) (and the _GroupedLinear analogue): when the forward patch is active the cached original is at _forward; when apply is patched instead, _forward is absent and forward is itself the original, so the fallback works for both paths.

Re: the test concern — agreed that test_quantize would have caught this had it run with a real TE install; the most likely explanation is the CI environment doesn't have TE available, so the pytest.importorskip("transformer_engine") skips the whole file. A TE-pinned matrix entry is the right follow-up but is out of scope for this PR.
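
For reference, the skip mechanism in question is the usual module-level guard (illustrative; the test file's exact import may differ):

```python
import pytest

# If transformer_engine isn't installed, every test in this module is
# skipped, so a TE-path regression never surfaces in that CI environment.
te = pytest.importorskip("transformer_engine")
```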

@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

/claude review

@claude claude Bot left a comment

Claude review passed — no significant issues found.

Traced the introspection logic end-to-end:

  • replace_function caches the original at _<name>, so getattr(_Linear, "_forward", _Linear.forward) correctly resolves to the un-patched forward for both the forward-patched and apply-patched paths (in the latter, _forward is absent and forward itself is still original).
  • ctx_offset arithmetic correctly maps signature-relative parameter positions to the args tuple (ctx is present in args only on the _forward path).
  • list(inspect.signature(...).parameters) includes VAR_POSITIONAL as a regular entry, so index-by-name works for _GroupedLinear's trailing *weights_and_biases too (see the toy snippet after this list).
  • In-place mutation of specific indices in new_args correctly preserves intermediate positional args (the TE 2.15 weight_workspace fix).
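
A minimal demonstration of the VAR_POSITIONAL point, with a stand-in signature rather than the real TE function:

```python
import inspect

# Stand-in for _GroupedLinear.forward's rough shape; the real TE signature differs.
def grouped_forward(ctx, inp, non_tensor_args, *weights_and_biases):
    pass

names = list(inspect.signature(grouped_forward).parameters)
# The VAR_POSITIONAL parameter appears once, under its own name, so
# index-by-name stays stable no matter how many weights/biases are passed.
assert names == ["ctx", "inp", "non_tensor_args", "weights_and_biases"]
assert names.index("inp") == 1
```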

No mode/state/export/compat concerns — internal helper, no checkpoint schema or public API change. Future TE param renames would surface as a loud ValueError from .index(), which is the right failure mode.

LGTM.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 12, 2026
@shengliangxu shengliangxu (Collaborator) left a comment

LGTM

@kinjalpatel27 kinjalpatel27 (Contributor) left a comment

LGTM

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) May 12, 2026 19:20
auto-merge was automatically disabled May 12, 2026 20:10

Branch protection rule check failed

@kevalmorabia97 kevalmorabia97 merged commit 5da4d63 into main May 12, 2026
63 of 65 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/te-plugin-introspect-signatures branch May 12, 2026 20:10
kevalmorabia97 added a commit that referenced this pull request May 12, 2026
fix(te-plugin): make _Linear arg indexing robust to TE signature changes (#1473)

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label May 12, 2026

Labels

  • cherry-pick-0.44.0: After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
  • cherry-pick-done: Added by bot once PR is cherry-picked to the release branch
