
fix(te-plugin): make _Linear arg indexing robust to TE signature changes #1473

Merged

kevalmorabia97 merged 2 commits into main from kmorabia/te-plugin-introspect-signatures on May 12, 2026

Conversation

@kevalmorabia97 kevalmorabia97 (Collaborator) commented May 12, 2026

What does this PR do?

Type of change: Bug fix

ModelOpt's te_quantized_linear_fn and te_grouped_quantized_linear_fn read weight / inp from hard-coded positions in args. Three TE signature changes broke this scheme:

  • TE 1.x → 2.0: dropped the legacy weight_fp8 slot between weight and inp. ModelOpt handled this with an if Version("2.0") <= _TE_VERSION: branch + a duplicate else branch.
  • TE 2.14 → 2.15: inserted weight_workspace between weight and inp at the _Linear.forward call site (TE 2.15 linear.py L1663). Unhandled by ModelOpt — args[idx + 1] resolved to None (workspace is None outside FP8), which then crashed TensorQuantizer.forward on inputs.numel() with AttributeError: 'NoneType' object has no attribute 'numel'. Surfaced as a regression in Megatron-Bridge after the TE 2.15 bump alongside ModelOpt 0.44.0rc3.
  • TE 2.10: _GroupedLinear.forward's second positional slot was renamed m_splits → non_tensor_args (tuple wrapping). ModelOpt had a separate Version("2.10") gate for this.

Replace all three version gates with parameter-name introspection of the live _Linear.forward / _GroupedLinear.forward signature. The parameter names (weight, inp, m_splits, non_tensor_args) have been stable across TE 1.x, 2.x, and 2.15+; only their relative positions shift. The new code reads the live signature via inspect.signature(...).parameters, locates weight/inp by name, and mutates only those positions in a list copy of args — everything between (e.g. TE 2.15's weight_workspace) and after passes through verbatim. The dual-branch code in te_quantized_linear_fn collapses to a single path.
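
A minimal sketch of the new scheme (hypothetical helper and parameter plumbing; the actual ModelOpt code differs in details):

```python
import inspect

def _quantize_te_args(te_forward, args, weight_quantizer, input_quantizer, has_ctx):
    """Locate weight/inp by name in te_forward's live signature and quantize
    only those positions; anything between or after them (e.g. TE 2.15's
    weight_workspace) passes through verbatim."""
    names = list(inspect.signature(te_forward).parameters)
    # On the autograd forward path args[0] is ctx, so signature position i
    # maps to args[i]; on the apply path ctx is absent, so shift left by one.
    offset = 0 if has_ctx else -1
    new_args = list(args)
    for name, quantizer in (("weight", weight_quantizer), ("inp", input_quantizer)):
        idx = names.index(name) + offset
        new_args[idx] = quantizer(new_args[idx])
    return tuple(new_args)
```

A future TE rename of weight or inp would fail loudly in names.index(name) with a ValueError rather than silently quantizing the wrong argument.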

Usage

No public API change. PTQ continues to work transparently across all supported TE versions:

```python
import modelopt.torch.quantization as mtq

# Works on TE 1.x, 2.0-2.14, 2.15.x, and 2.16+ — no version flag needed.
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

Testing

Existing TE plugin tests (tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py) exercise both the _forward (no-grad calibration) and _apply (grad-enabled training) paths of te_quantized_linear_fn for te.pytorch.Linear — they would have caught the original TE 2.15 regression on a CI matrix entry pinned to TE 2.15. Verified trace correctness across:

| TE version | _Linear.forward signature | _te_linear weight→inp gap | _GroupedLinear.forward second slot |
|---|---|---|---|
| 1.x | (ctx, weight, weight_fp8, inp, …) | 1 | n/a |
| 2.0–2.14 | (ctx, weight, inp, bias, …) | 0 | m_splits |
| 2.15.x | (ctx, weight, weight_workspace, inp, …) | 1 | non_tensor_args |
| 2.16+ (main) | (ctx, weight, inp, bias, fwd_args) | 0 | non_tensor_args |
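
As a sanity check, the same name lookup resolves every row above. A toy illustration with stand-in signatures (not real TE imports):

```python
import inspect

def weight_inp_gap(fn):
    """Number of parameters sitting between `weight` and `inp`."""
    names = list(inspect.signature(fn).parameters)
    return names.index("inp") - names.index("weight") - 1

# Stand-ins mirroring the table rows above, not the actual TE functions.
def te_1x(ctx, weight, weight_fp8, inp): ...
def te_20(ctx, weight, inp, bias): ...
def te_215(ctx, weight, weight_workspace, inp): ...
def te_216(ctx, weight, inp, bias, fwd_args): ...

assert [weight_inp_gap(f) for f in (te_1x, te_20, te_215, te_216)] == [1, 0, 1, 0]
```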

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: Existing tests sufficient

Additional Information

Triggered by Megatron-Bridge NVIDIA-NeMo/Megatron-Bridge#3783 failing tests after bumping ModelOpt 0.44.0rc2 → 0.44.0rc3 together with a Megatron-LM bump that pulls TE 2.15. ModelOpt rc2 had the same latent bug — it just wasn't exercised until TE 2.15 became the runtime version.

Summary by CodeRabbit

  • Refactor
    • Improved Transformer Engine quantization plugin robustness by using runtime parameter inspection instead of version-based branching, ensuring compatibility across TE versions without requiring manual updates.


ModelOpt's te_quantized_linear_fn and te_grouped_quantized_linear_fn read
weight/inp from hard-coded positions in args. TE 2.15 broke this by inserting
weight_workspace between weight and inp at the _Linear.forward call site —
passing None into self.input_quantizer() and crashing with
"AttributeError: 'NoneType' object has no attribute 'numel'" during calibration
(observed in Megatron-Bridge after bumping to modelopt 0.44.0rc3).

Replace the version-gated indexing with parameter-name introspection of the
live _Linear.forward / _GroupedLinear.forward signature. The param names
weight / inp / m_splits / non_tensor_args have been stable across TE 1.x, 2.x,
and 2.15+; the gap between them changes. This also subsumes the old TE < 2.0
branch (`weight, weight_fp8, inputs`) — the same gap-detection handles it
uniformly, so the dual-branch code is replaced with one path.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner May 12, 2026 18:29
@kevalmorabia97 kevalmorabia97 requested a review from cjluo-nv May 12, 2026 18:29

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ec4b8ff-84a0-4248-a796-e17c0508fa0e

📥 Commits

Reviewing files that changed from the base of the PR and between 9d16ad9 and eeb9b9d.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/plugins/transformer_engine.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/quantization/plugins/transformer_engine.py

📝 Walkthrough

Replaces TE-version branching with runtime signature inspection (via inspect) in Transformer Engine quantization wrappers to locate and quantize input and weight argument positions before invoking TE functions.

Changes

Transformer Engine Quantization Signature Introspection

| Layer / File(s) | Summary |
|---|---|
| Linear quantization signature inspection<br>modelopt/torch/quantization/plugins/transformer_engine.py | Add inspect import and refactor _QuantTELinear.te_quantized_linear_fn to use inspect.signature() on te_linear._Linear.forward to find weight and inp positions (accounting for context offset), apply weight_quantizer and input_quantizer to those args, and call the TE function with the modified arguments. |
| Grouped linear quantization signature inspection<br>modelopt/torch/quantization/plugins/transformer_engine.py | Refactor _QuantTEGroupedLinear.te_grouped_quantized_linear_fn to inspect te_grouped_linear._GroupedLinear.forward to locate inp, detect grouped layout via non_tensor_args or m_splits, compute num_gemms, quantize inp and each grouped weight argument by index, and invoke the TE grouped function. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: making _Linear argument indexing robust to Transformer Engine signature changes using runtime introspection instead of hard-coded indexing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected. PR adds only inspect (stdlib) to read function signatures safely. No unsafe deserialization, eval/exec, hardcoded secrets, or problematic dependencies introduced. |


@kevalmorabia97 kevalmorabia97 requested review from ChenhanYu, kinjalpatel27, realAsma and yueshen2016 and removed request for cjluo-nv May 12, 2026 18:29
@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

/claude review


github-actions Bot commented May 12, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-12 20:11 UTC


claude Bot commented May 12, 2026

Claude Review Summary

Findings: CRITICAL: 2, IMPORTANT: 0, SUGGESTION: 0

Most impactful finding

The introspection helper reads inspect.signature(te_linear._Linear.forward) (and the _GroupedLinear equivalent), but during the no-grad calibration path those attributes have already been replaced by the functools.partial wrapper created in _ParallelLinear.functionals_to_replace. Per inspect.signature semantics for partials, the resulting parameter list collapses to ['args', 'kwargs'], and the names.index("weight") / sig_params.index("inp") lookups raise ValueError on the first calibration forward.

The grad-enabled path coincidentally works because there the patch swaps apply, leaving forward untouched.

Suggested fix: use the saved original that replace_function writes as "_" + name:

```python
orig_forward = getattr(te_linear._Linear, "_forward", te_linear._Linear.forward)
names = list(inspect.signature(orig_forward).parameters)
```

(and similarly for _GroupedLinear).

Risk

Elevated. The fix as written is brittle in exactly the path the PR description advertises as covered (_forward / no-grad calibration). Please run tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py::test_quantize against a real TE install — any TE version, not just 2.15 — before merging. If that test already passes as-is, I'd like to understand how, because the static analysis above predicts a hard failure on calibration entry. It's possible the test matrix currently skips TE-based tests (no TE available), in which case the regression this PR is meant to fix and the regression it introduces would both go unnoticed until enabled.

Otherwise the introspection approach is a sound replacement for brittle version gates — this is purely about pointing the introspection at the un-patched signature.


codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 43.47826% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (2ce745a) to head (eeb9b9d).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...t/torch/quantization/plugins/transformer_engine.py | 43.47% | 13 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1473      +/-   ##
==========================================
+ Coverage   76.64%   76.83%   +0.18%
==========================================
  Files         478      473       -5
  Lines       51643    51322     -321
==========================================
- Hits        39582    39432     -150
+ Misses      12061    11890     -171
```
| Flag | Coverage Δ |
|---|---|
| examples | 41.61% <4.34%> (+0.91%) ⬆️ |
| gpu | 59.73% <43.47%> (-0.47%) ⬇️ |
| regression | 14.97% <4.34%> (-0.10%) ⬇️ |
| unit | 52.49% <4.34%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


Inside `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` we run
*while* `replace_function`'s context is active, so `_Linear.forward` and
`_GroupedLinear.forward` point at the `functools.partial` wrapper that
replaced them. `inspect.signature` of a partial collapses to
`(*args, **kwargs)`, so `names.index("weight")` would raise
`ValueError: 'weight' is not in list` on the first calibration forward.

`replace_function` saves the original at `_Linear._forward` /
`_GroupedLinear._forward` while the patch is active. Look that up
(falling back to the still-untouched `.forward` when only `apply` was
patched). Caught by Claude review on PR #1473.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

Thanks for the catch — the analysis is correct. inspect.signature(te_linear._Linear.forward) runs while replace_function's context is active, so it inspects the functools.partial wrapper whose signature collapses to (*args, **kwargs). names.index("weight") would raise ValueError on the first calibration forward.

Verified locally with a minimal repro:

```python
import inspect
from functools import partial

class _Linear:
    @staticmethod
    def forward(ctx, weight, weight_workspace, inp, bias, non_tensor_args): pass

# Mimic replace_function: cache the original, then patch forward with a partial.
_Linear._forward = _Linear.forward
_Linear.forward = partial(lambda *args, **kwargs: None, _Linear, "_forward", "self")

# Buggy: list(inspect.signature(_Linear.forward).parameters) == ['args', 'kwargs']
# Fixed: list(inspect.signature(getattr(_Linear, "_forward", _Linear.forward)).parameters)
#        == ['ctx', 'weight', 'weight_workspace', 'inp', 'bias', 'non_tensor_args']
```

Fix in eeb9b9d — uses getattr(te_linear._Linear, "_forward", te_linear._Linear.forward) (and the _GroupedLinear analogue): when the forward patch is active the cached original is at _forward; when apply is patched instead, _forward is absent and forward is itself the original, so the fallback works for both paths.

Re: the test concern — agreed that test_quantize would have caught this had it run with a real TE install; the most likely explanation is the CI environment doesn't have TE available, so the pytest.importorskip("transformer_engine") skips the whole file. A TE-pinned matrix entry is the right follow-up but is out of scope for this PR.
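
For reference, the skip mechanism in question is the usual module-level guard (illustrative; the test file's exact import may differ):

```python
import pytest

# If transformer_engine isn't installed, every test in this module is
# skipped, so a TE-path regression never surfaces in that CI environment.
te = pytest.importorskip("transformer_engine")
```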

@kevalmorabia97 kevalmorabia97 (Collaborator, Author) commented

/claude review

@claude claude Bot left a comment

Claude review passed — no significant issues found.

Traced the introspection logic end-to-end:

  • replace_function caches the original at _<name>, so getattr(_Linear, "_forward", _Linear.forward) correctly resolves to the un-patched forward for both the forward-patched and apply-patched paths (in the latter, _forward is absent and forward itself is still original).
  • ctx_offset arithmetic correctly maps signature-relative parameter positions to the args tuple (ctx is present in args only on the _forward path).
  • list(inspect.signature(...).parameters) includes VAR_POSITIONAL as a regular entry, so index-by-name works for _GroupedLinear's trailing *weights_and_biases too (see the toy snippet after this list).
  • In-place mutation of specific indices in new_args correctly preserves intermediate positional args (the TE 2.15 weight_workspace fix).
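
A minimal demonstration of the VAR_POSITIONAL point, with a stand-in signature rather than the real TE function:

```python
import inspect

# Stand-in for _GroupedLinear.forward's rough shape; the real TE signature differs.
def grouped_forward(ctx, inp, non_tensor_args, *weights_and_biases):
    pass

names = list(inspect.signature(grouped_forward).parameters)
# The VAR_POSITIONAL parameter appears once, under its own name, so
# index-by-name stays stable no matter how many weights/biases are passed.
assert names == ["ctx", "inp", "non_tensor_args", "weights_and_biases"]
assert names.index("inp") == 1
```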

No mode/state/export/compat concerns — internal helper, no checkpoint schema or public API change. Future TE param renames would surface as a loud ValueError from .index(), which is the right failure mode.

LGTM.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 12, 2026
@shengliangxu shengliangxu (Collaborator) left a comment

LGTM

@kinjalpatel27 kinjalpatel27 (Contributor) left a comment

LGTM

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) May 12, 2026 19:20
auto-merge was automatically disabled May 12, 2026 20:10

Branch protection rule check failed

@kevalmorabia97 kevalmorabia97 merged commit 5da4d63 into main May 12, 2026
63 of 65 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/te-plugin-introspect-signatures branch May 12, 2026 20:10
kevalmorabia97 added a commit that referenced this pull request May 12, 2026
fix(te-plugin): make _Linear arg indexing robust to TE signature changes (#1473)

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label May 12, 2026

Labels

  • cherry-pick-0.44.0: After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
  • cherry-pick-done: Added by bot once PR is cherry-picked to the release branch
