Skip to content

[ExecuTorch][WebGPU] SymInt live-scalar mechanism + et_vk.select_as_symint#20085

Merged
meta-codesync[bot] merged 11 commits into
gh/JulianCloudNTH/18/basefrom
gh/JulianCloudNTH/18/head
Jun 10, 2026
Merged

[ExecuTorch][WebGPU] SymInt live-scalar mechanism + et_vk.select_as_symint#20085
meta-codesync[bot] merged 11 commits into
gh/JulianCloudNTH/18/basefrom
gh/JulianCloudNTH/18/head

Conversation

@JulianCloudNTH

@JulianCloudNTH JulianCloudNTH commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Stack from ghstack (oldest at bottom):

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a ValueType::SymInt backed by a 16-byte Uniform|CopyDst buffer, set_symint/read_symint/symint_buffer accessors with dirty-tracking, a SymIntSource + add_symint_source/update_symints_from_inputs host-read path, and an add_resize_hook/propagate_resize/dispatch_at recompute plumbing. WebGPUBackend::execute calls propagate_resize after refreshing the SymInts from the runtime inputs. The et_vk.select_as_symint op handler records out SymInt = x[index] along a dim at build time.

This diff has no in-graph consumer yet — the SDPA op (stacked above) reads the SymInt value via read_symint() for dynamic input_pos. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
@exported-using-ghexport

Differential Revision: D107584280

Differential Revision: D107584280

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jun 6, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20085

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job, 15 Pending, 1 Unrelated Failure

As of commit a496796 with merge base f0dff03 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JulianCloudNTH JulianCloudNTH reopened this Jun 9, 2026
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@JulianCloudNTH

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown

Claude finished @JulianCloudNTH's task in 1m 50s —— View job


Code Review

Reviewed the SymInt live-scalar mechanism and et_vk.select_as_symint handler. Overall this is a clean, well-scoped enabler that faithfully mirrors the Vulkan SymInt design, and the lifetime/dirty-tracking plumbing is sound. A few correctness and robustness points worth addressing:

1. Uncaught exceptions in the execute() path (main concern)

update_symints_from_inputs throws std::runtime_error on several conditions ("source tensor is not a graph input", "dim out of range", "index out of range", "unsupported input element size", "empty input tensor"). It is called from WebGPUBackend::execute (WebGPUBackend.cpp:110), which — unlike init()'s build() call (WebGPUBackend.cpp:78-84) — has no try/catch. An uncaught exception will propagate out of the backend execute boundary instead of being converted to an Error. This is especially risky in the Wasm/embedded targets where exception support may be limited or unwind across the runtime boundary.

Recommend wrapping the symint refresh + propagate_resize() in a try/catch that logs and returns Error::Internal (or similar), matching the init() pattern.

Fix this →

2. elem_size derived from build-time dims may be wrong under dynamic shapes

In update_symints_from_inputs (WebGPUGraph.cpp), numel is computed from tensors_[src.input_tensor_id].dims (the static build-time dims), then elem_size = inputs[pos].second / numel. If the source input tensor is itself dynamically shaped, the runtime nbytes won't match build-time numel * elem_size, so the derived element size and the offset >= numel bounds check are both based on stale geometry. For the intended input_pos use case this is likely a static scalar-ish tensor, so it works in practice — but it's a latent footgun. Consider deriving the element type from the tensor's datatype (as WebGPUTensor/the flatbuffer already carries) rather than back-computing from nbytes, or at least documenting the static-shape assumption.

3. select_as_symint selects a single flat element, not a general gather

offset = index * stride reads the element at [0, …, 0, index, 0, …, 0] (all non-selected dims pinned to 0). That's the correct/only sensible interpretation for a scalar input_pos read, but the op name select may suggest more general semantics to a future reader. A one-line comment clarifying that only the leading element along the other dims is read would help.

4. Minor

  • First-execute hook initialization: propagate_resize() only runs hooks for SymInts in dirty_symints_. set_symint only marks dirty when the value changes. On the very first execute, if the runtime value equals the build-time value, no hook fires — so any resize hook's dispatch state must already be correctly initialized at build time. Worth confirming this invariant holds when the consuming SDPA op lands (stacked above).
  • read_symint/symint_buffer error message: these use symints_.at(id), which throws std::out_of_range with no context. The header comment promises "fail-loud (if id is not a SymInt)"; the generic message is technically fail-loud but less actionable than set_symint's explicit message. Low priority.
  • Linear input lookup: the nested scan over input_ids_ per source runs every execute. Negligible for the expected handful of sources, just noting it.

Things that look good

  • Buffer lifetime: SymInt UBOs and owned_uniform_buffers_ are released in the dtor; add_uniform_buffer_bytes keeps memory stats accurate.
  • Dirty-tracking in set_symint correctly avoids redundant wgpuQueueWriteBuffer calls.
  • 16-byte UBO with zeroed padding and int32 in the first 4 bytes matches std140/uniform alignment expectations.
  • The op handler correctly validates the output is a SymInt before recording the source.

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@meta-codesync meta-codesync Bot merged commit 3caca1c into gh/JulianCloudNTH/18/base Jun 10, 2026
173 of 180 checks passed
@meta-codesync meta-codesync Bot deleted the gh/JulianCloudNTH/18/head branch June 10, 2026 21:26
@meta-codesync meta-codesync Bot temporarily deployed to cherry-pick-bot June 10, 2026 21:26 Inactive
JulianCloudNTH added a commit that referenced this pull request Jun 10, 2026
…ymint

Pull Request resolved: #20085

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a `ValueType::SymInt` backed by a 16-byte `Uniform|CopyDst` buffer, `set_symint`/`read_symint`/`symint_buffer` accessors with dirty-tracking, a `SymIntSource` + `add_symint_source`/`update_symints_from_inputs` host-read path, and an `add_resize_hook`/`propagate_resize`/`dispatch_at` recompute plumbing. `WebGPUBackend::execute` calls `propagate_resize` after refreshing the SymInts from the runtime inputs. The `et_vk.select_as_symint` op handler records `out SymInt = x[index]` along a dim at build time.

This diff has no in-graph consumer yet — the SDPA op (stacked above) reads the SymInt value via `read_symint()` for dynamic `input_pos`. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
ghstack-source-id: 391979584
@exported-using-ghexport

Differential Revision: [D107584280](https://our.internmc.facebook.com/intern/diff/D107584280/)
JulianCloudNTH added a commit that referenced this pull request Jun 10, 2026
…ymint

Pull Request resolved: #20085

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a `ValueType::SymInt` backed by a 16-byte `Uniform|CopyDst` buffer, `set_symint`/`read_symint`/`symint_buffer` accessors with dirty-tracking, a `SymIntSource` + `add_symint_source`/`update_symints_from_inputs` host-read path, and an `add_resize_hook`/`propagate_resize`/`dispatch_at` recompute plumbing. `WebGPUBackend::execute` calls `propagate_resize` after refreshing the SymInts from the runtime inputs. The `et_vk.select_as_symint` op handler records `out SymInt = x[index]` along a dim at build time.

This diff has no in-graph consumer yet — the SDPA op (stacked above) reads the SymInt value via `read_symint()` for dynamic `input_pos`. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
ghstack-source-id: 391979584
@exported-using-ghexport

Differential Revision: [D107584280](https://our.internmc.facebook.com/intern/diff/D107584280/)
JulianCloudNTH added a commit that referenced this pull request Jun 10, 2026
…ymint

Pull Request resolved: #20085

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's SymInt = live uniform-buffer design: a `ValueType::SymInt` backed by a 16-byte `Uniform|CopyDst` buffer, `set_symint`/`read_symint`/`symint_buffer` accessors with dirty-tracking, a `SymIntSource` + `add_symint_source`/`update_symints_from_inputs` host-read path, and an `add_resize_hook`/`propagate_resize`/`dispatch_at` recompute plumbing. `WebGPUBackend::execute` calls `propagate_resize` after refreshing the SymInts from the runtime inputs. The `et_vk.select_as_symint` op handler records `out SymInt = x[index]` along a dim at build time.

This diff has no in-graph consumer yet — the SDPA op (stacked above) reads the SymInt value via `read_symint()` for dynamic `input_pos`. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.
ghstack-source-id: 391979584
@exported-using-ghexport

Differential Revision: [D107584280](https://our.internmc.facebook.com/intern/diff/D107584280/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants