[ExecuTorch][WebGPU] SymInt live-scalar mechanism + et_vk.select_as_symint by JulianCloudNTH · Pull Request #20085 · pytorch/executorch

JulianCloudNTH · 2026-06-06T07:14:52Z

Stack from ghstack (oldest at bottom):

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a ValueType::SymInt backed by a 16-byte Uniform|CopyDst buffer, set_symint/read_symint/symint_buffer accessors with dirty-tracking, a SymIntSource + add_symint_source/update_symints_from_inputs host-read path, and an add_resize_hook/propagate_resize/dispatch_at recompute plumbing. WebGPUBackend::execute calls propagate_resize after refreshing the SymInts from the runtime inputs. The et_vk.select_as_symint op handler records out SymInt = x[index] along a dim at build time.
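
As a rough illustration of the dirty-tracking and resize-hook flow described above, here is a minimal host-side sketch. It is not the PR's code: the `SymIntTable` class, the `add_symint` helper, the integer ids, and all signatures are assumptions made for the example, and the GPU uniform-buffer upload is reduced to a boolean return value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side model. The real graph also owns a GPU uniform
// buffer per SymInt; here the upload is only modeled by a return value.
class SymIntTable {
 public:
  // Registers a SymInt with its build-time value (hypothetical helper).
  void add_symint(int id, int32_t initial) { values_[id] = initial; }

  // Registers a callback to re-run when SymInt `id` changes.
  void add_resize_hook(int id, std::function<void(int32_t)> hook) {
    hooks_[id].push_back(std::move(hook));
  }

  // Stores the value and marks it dirty only when it actually changes.
  // Returns true when a buffer upload would be needed.
  bool set_symint(int id, int32_t value) {
    int32_t& slot = values_.at(id);
    if (slot == value) return false;  // unchanged: no upload, not dirty
    slot = value;
    dirty_.insert(id);
    return true;
  }

  int32_t read_symint(int id) const { return values_.at(id); }

  // Runs hooks for dirty SymInts only, then clears the dirty set.
  // Returns the number of hooks that ran.
  int propagate_resize() {
    int ran = 0;
    for (int id : dirty_) {
      auto it = hooks_.find(id);
      if (it == hooks_.end()) continue;
      for (auto& hook : it->second) {
        hook(values_.at(id));
        ++ran;
      }
    }
    dirty_.clear();
    return ran;
  }

 private:
  std::unordered_map<int, int32_t> values_;
  std::unordered_map<int, std::vector<std::function<void(int32_t)>>> hooks_;
  std::set<int> dirty_;
};
```

In this sketch, calling `set_symint` with the value already stored leaves the dirty set empty, so a following `propagate_resize()` runs no hooks; only a changed value triggers recomputation.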

This diff has no in-graph consumer yet — the SDPA op (stacked above) reads the SymInt value via read_symint() for dynamic input_pos. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
@exported-using-ghexport

Differential Revision: D107584280

[ghstack-poisoned]

pytorch-bot · 2026-06-06T07:14:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20085

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job, 15 Pending, 1 Unrelated Failure

As of commit a496796 with merge base f0dff03:

NEW FAILURES - The following jobs have failed:

pull / test-lora-linux / linux-job (gh)
RuntimeError: Command docker exec -t ffd04b7985c4ab1e1ecad4cf3e7787251a3271170dc3a008f85d6ae283453fda /exec failed with exit code 1
pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
RuntimeError: Command docker exec -t 4ae3586c0e6391034a4d0e44ecc7627083b798ece1ecbccc9290680b939959c9 /exec failed with exit code 139
pull / test-qnn-models-linux (mv3) / linux-job (gh)
RuntimeError: Command docker exec -t a397d2dbe45f7b4b207b98ed8cefc27b92ada852a6a95870e77702e257c1818f /exec failed with exit code 92

CANCELLED JOB - The following job was cancelled. Please retry:

pull / test-moshi-linux / linux-job (gh)

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / android / build-android (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-06T07:16:03Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]

JulianCloudNTH · 2026-06-09T19:19:12Z

@claude review

claude · 2026-06-09T19:20:07Z

Claude finished @JulianCloudNTH's task in 1m 50s

Code Review

Reviewed the SymInt live-scalar mechanism and et_vk.select_as_symint handler. Overall this is a clean, well-scoped enabler that faithfully mirrors the Vulkan SymInt design, and the lifetime/dirty-tracking plumbing is sound. A few correctness and robustness points worth addressing:

1. Uncaught exceptions in the `execute()` path (main concern)

update_symints_from_inputs throws std::runtime_error on several conditions ("source tensor is not a graph input", "dim out of range", "index out of range", "unsupported input element size", "empty input tensor"). It is called from WebGPUBackend::execute (WebGPUBackend.cpp:110), which, unlike init()'s build() call (WebGPUBackend.cpp:78-84), has no try/catch. An uncaught exception will propagate out of the backend execute boundary instead of being converted to an Error. This is especially risky on Wasm/embedded targets, where exception support may be limited or an exception may unwind across the runtime boundary.

Recommend wrapping the symint refresh + propagate_resize() in a try/catch that logs and returns Error::Internal (or similar), matching the init() pattern.
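
A minimal sketch of that pattern, under stated assumptions: the `Error` enum below is a stand-in for the runtime's error type, and `run_guarded` is a hypothetical helper, not a function from the PR. The idea is only that every exception is caught, logged, and turned into an error code before it can leave the backend.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for the runtime's error enum.
enum class Error { Ok, Internal };

// Runs `refresh` and converts any exception into an error code, so that
// nothing unwinds past the caller. `refresh` would wrap the SymInt refresh
// and the propagate_resize() call.
template <typename Fn>
Error run_guarded(Fn&& refresh) noexcept {
  try {
    refresh();
    return Error::Ok;
  } catch (const std::exception& e) {
    std::fprintf(stderr, "SymInt refresh failed: %s\n", e.what());
    return Error::Internal;
  } catch (...) {
    std::fprintf(stderr, "SymInt refresh failed: unknown exception\n");
    return Error::Internal;
  }
}
```

A caller would pass a lambda that performs the refresh and resize propagation, and return the resulting `Error` directly from `execute()`.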

2. `elem_size` derived from build-time dims may be wrong under dynamic shapes

In update_symints_from_inputs (WebGPUGraph.cpp), numel is computed from tensors_[src.input_tensor_id].dims (the static build-time dims), then elem_size = inputs[pos].second / numel. If the source input tensor is itself dynamically shaped, the runtime nbytes won't match build-time numel * elem_size, so the derived element size and the offset >= numel bounds check are both based on stale geometry. For the intended input_pos use case this is likely a static scalar-ish tensor, so it works in practice — but it's a latent footgun. Consider deriving the element type from the tensor's datatype (as WebGPUTensor/the flatbuffer already carries) rather than back-computing from nbytes, or at least documenting the static-shape assumption.
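
To make the failure mode concrete, the sketch below contrasts the two derivations. The `DType` enum and all three function names are hypothetical; only the `nbytes / numel` back-computation is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical dtype tag; the real tensor metadata carries its own type enum.
enum class DType { Int32, Int64 };

// Element size taken from the declared dtype, independent of shape.
inline std::size_t element_size(DType t) {
  switch (t) {
    case DType::Int32: return 4;
    case DType::Int64: return 8;
  }
  throw std::invalid_argument("unsupported dtype");
}

// Back-computed element size: only correct while the build-time element
// count still matches the runtime shape.
inline std::size_t elem_size_from_nbytes(std::size_t runtime_nbytes,
                                         std::size_t build_numel) {
  if (build_numel == 0) throw std::invalid_argument("empty input tensor");
  return runtime_nbytes / build_numel;
}

// Runtime element count derived from the dtype, usable for bounds checks.
inline std::size_t runtime_numel(std::size_t runtime_nbytes, DType t) {
  return runtime_nbytes / element_size(t);
}
```

For example, an int64 input built with 4 elements but fed 2 elements at runtime carries 16 bytes. Back-computing gives 16 / 4 = 4, which looks like a valid int32 size, while the dtype-based path reports 8-byte elements and a runtime count of 2.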

3. `select_as_symint` selects a single flat element, not a general gather

offset = index * stride reads the element at [0, …, 0, index, 0, …, 0] (all non-selected dims pinned to 0). That's the correct/only sensible interpretation for a scalar input_pos read, but the op name select may suggest more general semantics to a future reader. A one-line comment clarifying that only the leading element along the other dims is read would help.
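
The offset arithmetic can be sketched as follows, assuming a contiguous row-major layout and non-negative `dim` and `index` (both assumptions of this example, not statements about the handler). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stride (in elements) of `dim` for a contiguous row-major tensor:
// the product of all dimension sizes after `dim`.
inline std::size_t contiguous_stride(const std::vector<std::size_t>& dims,
                                     std::size_t dim) {
  if (dim >= dims.size()) throw std::out_of_range("dim out of range");
  std::size_t stride = 1;
  for (std::size_t d = dim + 1; d < dims.size(); ++d) stride *= dims[d];
  return stride;
}

// Flat offset of element [0, ..., 0, index, 0, ..., 0]: every dimension
// other than `dim` is pinned to 0, so only index * stride remains.
inline std::size_t select_offset(const std::vector<std::size_t>& dims,
                                 std::size_t dim, std::size_t index) {
  const std::size_t stride = contiguous_stride(dims, dim);
  if (index >= dims[dim]) throw std::out_of_range("index out of range");
  return index * stride;
}
```

For dims `{2, 3, 4}`, selecting index 2 along dim 1 gives stride 4 and offset 8, which is the flat position of element `[0, 2, 0]`.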

4. Minor

First-execute hook initialization: propagate_resize() only runs hooks for SymInts in dirty_symints_. set_symint only marks dirty when the value changes. On the very first execute, if the runtime value equals the build-time value, no hook fires — so any resize hook's dispatch state must already be correctly initialized at build time. Worth confirming this invariant holds when the consuming SDPA op lands (stacked above).
read_symint/symint_buffer error message: these use symints_.at(id), which throws std::out_of_range with no context. The header comment promises "fail-loud (if id is not a SymInt)"; the generic message is technically fail-loud but less actionable than set_symint's explicit message. Low priority.
Linear input lookup: the nested scan over input_ids_ per source runs every execute. Negligible for the expected handful of sources, just noting it.
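
For the error-message point, one possible shape of a more actionable lookup is sketched below. The container type, the `read_symint_checked` name, and the message format are all assumptions for illustration; the PR's storage for `symints_` may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Looks up a SymInt value and, on a miss, throws a message that names the
// calling accessor and the offending id instead of a bare std::out_of_range.
inline int32_t read_symint_checked(
    const std::unordered_map<int, int32_t>& symints,
    int id,
    const char* caller) {
  auto it = symints.find(id);
  if (it == symints.end()) {
    throw std::runtime_error(std::string(caller) + ": value id " +
                             std::to_string(id) + " is not a SymInt");
  }
  return it->second;
}
```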

Things that look good

Buffer lifetime: SymInt UBOs and owned_uniform_buffers_ are released in the dtor; add_uniform_buffer_bytes keeps memory stats accurate.
Dirty-tracking in set_symint correctly avoids redundant wgpuQueueWriteBuffer calls.
16-byte UBO with zeroed padding and int32 in the first 4 bytes matches std140/uniform alignment expectations.
The op handler correctly validates the output is a SymInt before recording the source.
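
The buffer layout mentioned above (an int32 in the first 4 bytes of a zeroed 16-byte block) can be sketched on the host side as below. The helper names are hypothetical, and the actual upload call is omitted; this only shows how the bytes would be staged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stages a SymInt value as a 16-byte block: the int32 occupies bytes 0..3
// in host byte order and bytes 4..15 stay zero as padding.
inline std::array<uint8_t, 16> pack_symint_ubo(int32_t value) {
  std::array<uint8_t, 16> bytes{};  // value-initialized, so all zeros
  std::memcpy(bytes.data(), &value, sizeof(value));
  return bytes;
}

// Reads the int32 back out of the first 4 bytes of a staged block.
inline int32_t unpack_symint_ubo(const std::array<uint8_t, 16>& bytes) {
  int32_t value = 0;
  std::memcpy(&value, bytes.data(), sizeof(value));
  return value;
}
```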

[ghstack-poisoned]

…ymint

Pull Request resolved: #20085

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a `ValueType::SymInt` backed by a 16-byte `Uniform|CopyDst` buffer, `set_symint`/`read_symint`/`symint_buffer` accessors with dirty-tracking, a `SymIntSource` + `add_symint_source`/`update_symints_from_inputs` host-read path, and an `add_resize_hook`/`propagate_resize`/`dispatch_at` recompute plumbing. `WebGPUBackend::execute` calls `propagate_resize` after refreshing the SymInts from the runtime inputs. The `et_vk.select_as_symint` op handler records `out SymInt = x[index]` along a dim at build time.

This diff has no in-graph consumer yet; the SDPA op (stacked above) reads the SymInt value via `read_symint()` for dynamic `input_pos`. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
ghstack-source-id: 391979584
@exported-using-ghexport

Differential Revision: [D107584280](https://our.internmc.facebook.com/intern/diff/D107584280/)

Update

fb19c0a

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 6, 2026 07:14

meta-cla Bot added the CLA Signed label Jun 6, 2026

JulianCloudNTH closed this Jun 6, 2026

JulianCloudNTH had a problem deploying to cherry-pick-bot June 6, 2026 07:15 — with GitHub Actions Failure

JulianCloudNTH reopened this Jun 9, 2026

Update

39f7fd9

[ghstack-poisoned]

meta-codesync Bot added the meta-exported label Jun 9, 2026

JulianCloudNTH added 6 commits June 8, 2026 22:47

Update

c42e352

[ghstack-poisoned]

Update

fff934e

[ghstack-poisoned]

Update

a247613

[ghstack-poisoned]

Update

8c7bb27

[ghstack-poisoned]

Update

3874835

[ghstack-poisoned]

Update

bb9b3b9

[ghstack-poisoned]

Update

143f05e

[ghstack-poisoned]

JulianCloudNTH mentioned this pull request Jun 9, 2026

[ExecuTorch][WebGPU] GPU timestamp query profiling for SDPA #20167

Open

SS-JIA approved these changes Jun 9, 2026

JulianCloudNTH added 2 commits June 9, 2026 17:17

Update

bd1d8df

[ghstack-poisoned]

Update

a496796

[ghstack-poisoned]

JulianCloudNTH mentioned this pull request Jun 10, 2026

[ExecuTorch][WebGPU] GPU timestamp query profiling (general implementation) #20201

Open

meta-codesync Bot merged commit 3caca1c into gh/JulianCloudNTH/18/base Jun 10, 2026
173 of 180 checks passed

meta-codesync Bot deleted the gh/JulianCloudNTH/18/head branch June 10, 2026 21:26

meta-codesync Bot temporarily deployed to cherry-pick-bot June 10, 2026 21:26 Inactive

pytorchbot mentioned this pull request Jun 10, 2026

[ExecuTorch][WebGPU] SymInt live-scalar mechanism + et_vk.select_as_symint #20206

Merged

meta-codesync[bot] merged 11 commits into gh/JulianCloudNTH/18/base from gh/JulianCloudNTH/18/head
