Update fused quant broadcast logic (#20171) by DrJessop · Pull Request #20171 · pytorch/executorch

DrJessop · 2026-06-10T00:19:10Z

Summary:

Unifies QuantParamsStruct (sas_compiler's central quant-params abstraction) onto a single affine-quantization representation and drops the axis argument from every fused-quant op interface.

Core change (ops.py): scale/zero_point are now either a singleton (per-tensor, auto-expanded internally) or a full-rank tensor whose shape encodes the affine block layout: block_size[i] = tensor.shape[i] // scale.shape[i]. This one representation covers per-tensor, per-channel, per-group, and blockwise uniformly. quantize/dequantize delegate to torch.ops.torchao.(de)quantize_affine. The axis field is removed from QuantParamsStruct, all ~60 op-schema fields, _make_qp, and the _lib.define strings. is_per_tensor/is_per_channel/is_per_group and a new channel_axis() helper are now derived from scale shape (channel_axis() returns 0 if all dims are unary, the single non-unary dim if there's exactly one, else None).
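The shape rules in the core change can be illustrated with plain shape tuples. This is a minimal pure-Python sketch, not the ops.py implementation: normalize_scale_shape, block_size, is_per_tensor and quantize_2d are hypothetical helpers, and the divisibility check and the exact singleton-expansion behavior are assumptions. quantize_2d only stands in for the per-element affine math that the PR delegates to torch.ops.torchao.quantize_affine; it does not reproduce that op's signature. channel_axis follows the rule stated above.

```python
import math
from typing import List, Optional, Sequence, Tuple

Shape = Tuple[int, ...]


def normalize_scale_shape(tensor_shape: Sequence[int], scale_shape: Sequence[int]) -> Shape:
    """Expand a singleton (per-tensor) scale to full rank, e.g. () -> (1, 1).

    Assumption: mirrors the "auto-expanded internally" behavior described in the PR.
    """
    if math.prod(scale_shape) == 1:
        return (1,) * len(tensor_shape)
    if len(scale_shape) != len(tensor_shape):
        raise ValueError("non-singleton scales must be full-rank")
    return tuple(scale_shape)


def block_size(tensor_shape: Sequence[int], scale_shape: Sequence[int]) -> Shape:
    """block_size[i] = tensor.shape[i] // scale.shape[i]."""
    full = normalize_scale_shape(tensor_shape, scale_shape)
    for t, s in zip(tensor_shape, full):
        if t % s != 0:  # assumed: every tensor dim must split evenly into blocks
            raise ValueError(f"dim {t} is not divisible by scale dim {s}")
    return tuple(t // s for t, s in zip(tensor_shape, full))


def is_per_tensor(scale_shape: Sequence[int]) -> bool:
    # Same idea as the out_scale.numel() == 1 check in quant_absorption.py.
    return math.prod(scale_shape) == 1


def channel_axis(scale_shape: Sequence[int]) -> Optional[int]:
    """0 if every dim is unary, the single non-unary dim if exactly one, else None."""
    non_unary = [i for i, d in enumerate(scale_shape) if d != 1]
    if not non_unary:
        return 0
    if len(non_unary) == 1:
        return non_unary[0]
    return None


def quantize_2d(x: List[List[float]], scale: List[List[float]],
                zero_point: List[List[int]], qmin: int = -128, qmax: int = 127) -> List[List[int]]:
    """Affine quantization of a 2-D list: each element uses the scale/zero_point of its block."""
    rows, cols = len(x), len(x[0])
    br, bc = block_size((rows, cols), (len(scale), len(scale[0])))
    return [
        [
            max(qmin, min(qmax, round(x[i][j] / scale[i // br][j // bc]) + zero_point[i // br][j // bc]))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


if __name__ == "__main__":
    weight = (64, 128)
    for name, s in [("per-tensor", ()), ("per-channel", (64, 1)),
                    ("per-group", (64, 4)), ("blockwise", (8, 4))]:
        print(name, block_size(weight, s), channel_axis(normalize_scale_shape(weight, s)))
```

Running it prints `per-tensor (64, 128) 0`, `per-channel (1, 128) 0`, `per-group (1, 32) None` and `blockwise (8, 32) None`: one scale-shape convention yields every layout, and only the first two have a well-defined channel axis.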

Fusion (fusion_pass.py, fusion_passes/utils.py): the flat qparams block shrinks from a 6-tuple to a 5-tuple; the per-channel branch inserts an aten.view to make 1-D scales full-rank [1, …, C, …, 1] so their shape encodes the block layout.
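The target shape for the per-channel view can be computed as below. This sketch only derives the shape that would be passed as the size argument of the inserted aten.view node; full_rank_scale_shape is a hypothetical name, and it assumes the channel axis of the 1-D scale is known at that point in the fusion pass.

```python
from typing import Tuple


def full_rank_scale_shape(num_channels: int, rank: int, axis: int) -> Tuple[int, ...]:
    """Return [1, ..., C, ..., 1]: C at `axis`, 1 everywhere else."""
    if axis < 0:
        axis += rank  # accept negative axes, as PyTorch dims do
    if not 0 <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    shape = [1] * rank
    shape[axis] = num_channels
    return tuple(shape)


if __name__ == "__main__":
    # Linear weight [out=64, in=128] with a per-channel scale of length 64 on axis 0.
    print(full_rank_scale_shape(64, rank=2, axis=0))  # (64, 1)
    # Conv2d weight [out=64, in=32, kh=3, kw=3] with the same per-channel scale.
    conv_scale = full_rank_scale_shape(64, rank=4, axis=0)
    print(conv_scale)  # (64, 1, 1, 1)
    # The full-rank shape now encodes the block layout directly.
    print(tuple(t // s for t, s in zip((64, 32, 3, 3), conv_scale)))  # (1, 32, 3, 3)
```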

Lowering boundary (graph_utils.py, lower_to_turing_linear.py, lower_to_turing_conv_no_nlu_params.py): Helios ParameterExtraction wants the compact scale form ([K] / [K, num_groups]), but the fused op now carries full-rank scales. New compact_scale_node() squeezes size-1 dims at the lowering boundary; the inserted view_copy folds via ConstantPropPass before extraction. Lowering asserts channel_axis() is not None.
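At the shape level, squeezing size-1 dims maps the full-rank scales back to the compact forms Helios expects. compact_scale_shape is a hypothetical helper that mimics Tensor.squeeze() with no dim argument; it is not the compact_scale_node() implementation. One caveat of a blanket squeeze is that a genuinely size-1 dim (K == 1, or num_groups == 1) is dropped as well; the PR does not say whether compact_scale_node() special-cases this.

```python
from typing import Sequence, Tuple


def compact_scale_shape(full_rank_shape: Sequence[int]) -> Tuple[int, ...]:
    """Drop every size-1 dim, like Tensor.squeeze() with no dim argument."""
    return tuple(d for d in full_rank_shape if d != 1)


if __name__ == "__main__":
    print(compact_scale_shape((64, 1)))        # linear per-channel -> (64,)
    print(compact_scale_shape((64, 1, 1, 1)))  # conv per-channel -> (64,)
    print(compact_scale_shape((64, 4)))        # per-group [K, num_groups] -> (64, 4)
    print(compact_scale_shape((1, 1)))         # caveat: K == 1 collapses to ()
```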

Broadcast fix (fuse_mul_into_linear.py): channel_scale is an activation-space [K] vector (out-features is the trailing dim of the mul constant), whereas the weight scale is now full-rank [K, 1] (out-features at dim 0). Reshape the channel factor to [-1, 1, …] so it broadcasts along the weight scale's channel axis instead of producing a [K, K] outer product. The 1-D bias multiply is unchanged (bias is already [K]).
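The broadcast bug and its fix can be checked with shapes alone. The sketch below reimplements the standard NumPy/PyTorch broadcasting rule (right-align shapes; each dim pair must match or contain a 1) in pure Python; broadcast_shape and channel_factor_shape are hypothetical names. On real tensors, something like channel_scale.reshape(-1, *[1] * (weight_scale.dim() - 1)) would be one way to get the [-1, 1, …] layout.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Shape:
    """NumPy/PyTorch broadcasting: right-align shapes; each dim pair must match or contain a 1."""
    ndim = max(len(a), len(b))
    pa = (1,) * (ndim - len(a)) + tuple(a)
    pb = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for x, y in zip(pa, pb):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {pa} and {pb} are not broadcastable")
    return tuple(out)


def channel_factor_shape(k: int, weight_scale_rank: int) -> Shape:
    """Reshape a [K] channel factor to [K, 1, ...] so K lines up with the weight scale's dim 0."""
    return (k,) + (1,) * (weight_scale_rank - 1)


K = 64
weight_scale = (K, 1)  # full-rank per-channel weight scale, out-features at dim 0
channel_scale = (K,)   # activation-space vector, out-features trailing

naive = broadcast_shape(channel_scale, weight_scale)  # (64, 64): unintended outer product
fixed = broadcast_shape(channel_factor_shape(K, len(weight_scale)), weight_scale)  # (64, 1)
bias = broadcast_shape((K,), (K,))  # (64,): the 1-D bias multiply needs no reshape

if __name__ == "__main__":
    print(naive, fixed, bias)
```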

Misc consumers: quant_absorption.py per-tensor check is now out_scale.numel() == 1; BUCK adds the torchao dep.

Reviewed By: ethansfng

Differential Revision: D108065588

pytorch-bot · 2026-06-10T00:19:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20171

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-06-10T00:19:22Z

@DrJessop has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108065588.

github-actions · 2026-06-10T00:20:07Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


meta-cla Bot added the CLA Signed label Jun 10, 2026

meta-codesync Bot added the meta-exported label Jun 10, 2026

ethansfng approved these changes Jun 10, 2026


meta-codesync Bot changed the title from "Update fused quant broadcast logic" to "Update fused quant broadcast logic (#20171)" Jun 10, 2026

DrJessop force-pushed the export-D108065588 branch from d474ec6 to f6c2241 June 10, 2026 17:10

DrJessop force-pushed the export-D108065588 branch from f6c2241 to d9034b7 June 10, 2026 17:10

DrJessop force-pushed the export-D108065588 branch 2 times, most recently from 968c574 to 4674570 June 10, 2026 17:20

DrJessop wants to merge 1 commit into pytorch:main from DrJessop:export-D108065588
