[ExecuTorch][WebGPU] Add update_cache op (llama.update_cache) by pytorchbot · Pull Request #20202 · pytorch/executorch

pytorchbot · 2026-06-10T20:43:39Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20083 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/16/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/16/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/16/orig

@diff-train-skip-merge

Pull Request resolved: #20083

Add `llama.update_cache.default`: an in-place KV-cache write. The shader scatters the new K/V (`[1,S,H,D]`) into the cache (`[1,Cmax,H,D]`) at `dst_offset = input_pos*n_heads*head_dim`, bounds-checked against the cache size. The handler validates shape (batch==1, matching n_heads/head_dim) and sizes the 1D dispatch from the device limit via `WebGPUUtils` before allocating. Mirrors the Vulkan `sdpa_kv_cache_update` reference.

The export/delegation test is the follow-up diff stacked directly above.

Authored with assistance from Claude.

ghstack-source-id: 392019030
@exported-using-ghexport

Differential Revision: [D107547308](https://our.internmc.facebook.com/intern/diff/D107547308/)
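For readers who want the scatter spelled out, here is a minimal host-side sketch in Python of what the description above implies. It is not the shader or the handler from this PR: the function names, the flat row-major layout, the choice to skip out-of-range writes, the error on an oversized dispatch, and the workgroup size of 64 are all assumptions made for illustration. The value 65535 is the WebGPU default for `maxComputeWorkgroupsPerDimension`.

```python
def validate_shapes(new_shape, cache_shape):
    """Shape checks named in the description: batch == 1, matching n_heads/head_dim."""
    if len(new_shape) != 4 or len(cache_shape) != 4:
        raise ValueError("expected [1, S, H, D] and [1, Cmax, H, D]")
    if new_shape[0] != 1 or cache_shape[0] != 1:
        raise ValueError("batch must be 1")
    if new_shape[2:] != cache_shape[2:]:
        raise ValueError("n_heads/head_dim mismatch")


def update_cache(cache, new, input_pos, n_heads, head_dim):
    """Scatter flat `new` into flat `cache`, one element per (hypothetical) invocation."""
    dst_offset = input_pos * n_heads * head_dim
    for i, value in enumerate(new):
        dst = dst_offset + i
        # Assumed guard: writes past the end of the cache are skipped.
        if dst < len(cache):
            cache[dst] = value
    return cache


def dispatch_size(numel, workgroup_size=64, max_workgroups=65535):
    """1D workgroup count, checked against a device limit (assumed to be an error)."""
    groups = (numel + workgroup_size - 1) // workgroup_size
    if groups > max_workgroups:
        raise ValueError("dispatch exceeds the per-dimension workgroup limit")
    return groups


# S=2, H=2, D=2 written into a Cmax=4 cache at input_pos=1.
validate_shapes([1, 2, 2, 2], [1, 4, 2, 2])
cache = update_cache([0] * 16, list(range(1, 9)), input_pos=1, n_heads=2, head_dim=2)
```

With `input_pos=1` the offset is `1*2*2 = 4`, so the eight new values occupy flat indices 4 through 11 and the rest of the cache is untouched.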

pytorch-bot · 2026-06-10T20:43:44Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20202

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-10T20:44:28Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Pull Request resolved: #20084

Tests for `llama.update_cache.default`, stacked on the op diff below. `test/ops/sdpa/test_update_cache.py` lowers the op through `VulkanPartitioner` (asserting it delegates to VulkanBackend) and exports per-case `.pte`s; `test/native/test_update_cache.cpp` runs them on-GPU and checks the returned cache against an integer-exact scatter golden.

Coverage mirrors the Vulkan KV-cache test (`VulkanSDPATest`): single-shot writes at varied shapes/offsets, plus a multi-step replay with advancing `input_pos` that threads the returned cache across steps over the same GQA param sets (incl. llama3 head_dim=128). Comparing the cache directly is a stronger check than the Vulkan test, which checks the cache only indirectly via the SDPA output.

Authored with assistance from Claude.

ghstack-source-id: 391979582
@exported-using-ghexport

Differential Revision: [D107547307](https://our.internmc.facebook.com/intern/diff/D107547307/)
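The multi-step replay described above can be sketched as follows. This is an illustrative Python model, not the contents of `test_update_cache.py` or `test_update_cache.cpp`: `scatter_golden`, `replay`, the flat row-major layout, and the integer fill pattern are hypothetical. Integer values are used so the comparison against the golden is exact.

```python
def scatter_golden(cache, new, input_pos, n_heads, head_dim):
    """Reference scatter: write flat `new` at input_pos * n_heads * head_dim."""
    dst_offset = input_pos * n_heads * head_dim
    out = list(cache)
    for i, value in enumerate(new):
        if dst_offset + i < len(out):  # assumed guard against overflow
            out[dst_offset + i] = value
    return out


def replay(c_max, n_heads, head_dim, seq_lens):
    """Thread the returned cache across steps while input_pos advances."""
    cache = [0] * (c_max * n_heads * head_dim)
    input_pos = 0
    next_value = 1
    for seq_len in seq_lens:
        count = seq_len * n_heads * head_dim
        new = list(range(next_value, next_value + count))
        cache = scatter_golden(cache, new, input_pos, n_heads, head_dim)
        input_pos += seq_len      # the next step writes right after this one
        next_value += count
    return cache, input_pos


# A 3-token prefill followed by two single-token decode steps.
final_cache, final_pos = replay(c_max=8, n_heads=2, head_dim=4, seq_lens=[3, 1, 1])
```

Because each step starts where the previous one ended, the first `5*2*4 = 40` entries of the final cache hold the values 1 through 40 in order, and the remaining 24 entries stay zero.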

…ymint

Pull Request resolved: #20085

Adds the dynamic-scalar (SymInt) mechanism to the WebGPU graph as a standalone enabler, ahead of the SDPA op that consumes it. Mirrors the Vulkan delegate's design, in which a SymInt is a live uniform buffer:

- a `ValueType::SymInt` backed by a 16-byte `Uniform|CopyDst` buffer,
- `set_symint`/`read_symint`/`symint_buffer` accessors with dirty-tracking,
- a `SymIntSource` + `add_symint_source`/`update_symints_from_inputs` host-read path, and
- `add_resize_hook`/`propagate_resize`/`dispatch_at` recompute plumbing.

`WebGPUBackend::execute` calls `propagate_resize` after refreshing the SymInts from the runtime inputs. The `et_vk.select_as_symint` op handler records `out SymInt = x[index]` along a dim at build time.

This diff has no in-graph consumer yet. The SDPA op (stacked above) reads the SymInt value via `read_symint()` for dynamic `input_pos`. Building it as its own diff keeps the enabler separate from the op, matching the update_cache → mechanism → SDPA layering.

Authored with assistance from Claude.

ghstack-source-id: 391979584
@exported-using-ghexport

Differential Revision: [D107584280](https://our.internmc.facebook.com/intern/diff/D107584280/)
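A rough Python model of how these pieces might fit together is sketched below. The method names are borrowed from the commit message, but everything else is an assumption: the class `SymIntGraph`, the argument lists, the rule that hooks run only when a value changed, and the `int32` plus padding layout of the 16-byte buffer are hypothetical and may not match the C++ implementation.

```python
import struct


class SymIntGraph:
    """Toy model: named SymInts, dirty-tracking, host-read sources, resize hooks."""

    def __init__(self):
        self._values = {}
        self._dirty = set()
        self._sources = []   # (name, input_index, element_index)
        self._hooks = []

    def set_symint(self, name, value):
        # Mark dirty only on change so unchanged steps skip recomputation.
        if self._values.get(name) != value:
            self._values[name] = value
            self._dirty.add(name)

    def read_symint(self, name):
        return self._values[name]

    def symint_buffer(self, name):
        # Assumed layout: one little-endian int32 padded to 16 bytes.
        return struct.pack("<i12x", self._values[name])

    def add_symint_source(self, name, input_index, element_index):
        self._sources.append((name, input_index, element_index))

    def update_symints_from_inputs(self, inputs):
        # Host-read path: out SymInt = x[index] for each registered source.
        for name, input_index, element_index in self._sources:
            self.set_symint(name, inputs[input_index][element_index])

    def add_resize_hook(self, hook):
        self._hooks.append(hook)

    def propagate_resize(self):
        if not self._dirty:
            return
        for hook in self._hooks:
            hook(self)
        self._dirty.clear()


graph = SymIntGraph()
seen = []
graph.add_symint_source("input_pos", input_index=0, element_index=0)
graph.add_resize_hook(lambda g: seen.append(g.read_symint("input_pos")))

# Mirrors the described execute order: refresh SymInts, then propagate.
for step_inputs in ([[5]], [[5]], [[6]]):
    graph.update_symints_from_inputs(step_inputs)
    graph.propagate_resize()
```

In this model the hook runs on the first and third steps only, since the second step supplies an unchanged `input_pos`.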

pytorchbot requested review from kirklandsign and larryliu0820 as code owners June 10, 2026 20:43

meta-cla Bot added the CLA Signed label Jun 10, 2026

JulianCloudNTH self-requested a review June 10, 2026 21:26

JulianCloudNTH approved these changes Jun 10, 2026

JulianCloudNTH added 3 commits June 10, 2026 14:30

Merge branch 'main' into gh/JulianCloudNTH/16/orig

e856ee3

JulianCloudNTH merged commit 5526971 into main Jun 10, 2026
45 of 46 checks passed

JulianCloudNTH deleted the gh/JulianCloudNTH/16/orig branch June 10, 2026 21:32

JulianCloudNTH merged 4 commits into main from gh/JulianCloudNTH/16/orig

2 participants
