[ExecuTorch][WebGPU] GPU timestamp query profiling (general implementation)#20201
JulianCloudNTH wants to merge 2 commits into
Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20201
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 13 Pending, 1 Unrelated Failure as of commit dbcc312 with merge base 5526971.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review
Claude finished @JulianCloudNTH's task in 2m 2s

Code Review

Overall this is a clean, well-scoped re-port of the Vulkan

Correctness / robustness
1. const double mean_us = kv.second.first / kv.second.second / 1000.0;
2. Single global query pool is not concurrency-safe
3.

Design / minor
4. Heavy include pulled into a widely-used header
5. Redundant context lookup
6.

Questions
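For context on the `mean_us` expression quoted in item 1, here is a minimal sketch of per-kernel averaging. The reviewer's actual concern about that line is not preserved here, so this is illustration only. It assumes `kv` iterates a map from kernel name to a (total nanoseconds, sample count) pair; `KernelTotals`, `add_sample`, and the zero-count guard are hypothetical and not taken from the PR.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical aggregate: kernel name -> (total duration in ns, sample count).
// The layout is an assumption inferred from `kv.second.first / kv.second.second`.
using KernelTotals = std::map<std::string, std::pair<double, std::uint64_t>>;

// Record one measured dispatch duration (in nanoseconds) for a kernel.
inline void add_sample(
    KernelTotals& totals,
    const std::string& kernel,
    std::uint64_t duration_ns) {
  auto& entry = totals[kernel];
  entry.first += static_cast<double>(duration_ns);
  entry.second += 1;
}

// Mean per-dispatch duration in microseconds.
// Returns 0.0 for an unknown kernel or a zero sample count, so the
// division never runs with a zero denominator.
inline double mean_us(const KernelTotals& totals, const std::string& kernel) {
  const auto it = totals.find(kernel);
  if (it == totals.end() || it->second.second == 0) {
    return 0.0;
  }
  return it->second.first / static_cast<double>(it->second.second) / 1000.0;
}
```

Casting the count to `double` before dividing keeps the arithmetic in floating point regardless of how the totals are stored.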
Stack from ghstack (oldest at bottom):
Backend-agnostic GPU-timestamp infrastructure, split out so the general implementation is foundational (below SDPA) while the SDPA-specific dispatch labeling stays above the SDPA op. Composed of:
- `WebGPUQueryPool`, a faithful re-port of Vulkan's `vkapi::QueryPool` (`backends/vulkan/runtime/vk_api/QueryPool.{h,cpp}`): same `ShaderDuration` data model and ticks->ns conversion. Three deviations are forced by the WebGPU API:
  - per-dispatch bracketing via a compute-pass `timestampWrites` descriptor, since there is no mid-encoder `writeTimestamp`;
  - readback via `resolveQuerySet` + buffer map rather than host-side `vkGetQueryPoolResults`;
  - the `TimestampQuery` capability requested as an explicit device feature, fail-open if the adapter lacks it.
- `WebGPUDevice` gains timestamp-feature detection.
- `WebGPUGraph` gains a per-dispatch `kernel_name` label plus `execute()` bracketing of each compute pass when the pool is active.

Opt-in via the `WEBGPU_TIMESTAMP_QUERY` env var; off by default, so the production `execute()` path is byte-identical. The SDPA per-kernel labeling lives in the companion "for SDPA" diff above the SDPA op.

Co-authored with Claude.
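To make the data flow concrete, the following is a rough sketch of the two host-side pieces described above: the environment-variable opt-in and the conversion of resolved begin/end timestamp pairs into per-dispatch durations. It deliberately avoids the WebGPU API. `DurationSketch`, `timestamp_query_enabled`, `extract_durations`, the "unset, empty, or 0 means off" rule, and the `ns_per_tick` parameter are all assumptions for illustration, not the names or behavior in this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the ShaderDuration data model.
struct DurationSketch {
  std::string kernel_name;
  std::uint64_t start_ns;
  std::uint64_t end_ns;
  std::uint64_t duration_ns;
};

// Opt-in check. Assumption: any value other than unset, empty, or "0"
// enables profiling; the real parsing rule may differ.
inline bool timestamp_query_enabled() {
  const char* v = std::getenv("WEBGPU_TIMESTAMP_QUERY");
  return v != nullptr && v[0] != '\0' && std::string(v) != "0";
}

// `resolved` is assumed to hold two timestamps per dispatch (begin, end),
// in dispatch order, as read back from the mapped resolve buffer.
inline std::vector<DurationSketch> extract_durations(
    const std::vector<std::uint64_t>& resolved,
    const std::vector<std::string>& kernel_names,
    double ns_per_tick) {
  std::vector<DurationSketch> out;
  for (std::size_t i = 0;
       i < kernel_names.size() && 2 * i + 1 < resolved.size();
       ++i) {
    const std::uint64_t begin = resolved[2 * i];
    const std::uint64_t end = resolved[2 * i + 1];
    // Clamp to zero if the end timestamp is not after the begin timestamp.
    const std::uint64_t ticks = end >= begin ? end - begin : 0;
    out.push_back(DurationSketch{
        kernel_names[i],
        static_cast<std::uint64_t>(static_cast<double>(begin) * ns_per_tick),
        static_cast<std::uint64_t>(static_cast<double>(end) * ns_per_tick),
        static_cast<std::uint64_t>(static_cast<double>(ticks) * ns_per_tick)});
  }
  return out;
}
```

Pairing timestamps by index relies on one begin/end slot pair per labeled dispatch, which is what bracketing each compute pass would produce under these assumptions.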
@exported-using-ghexport
Differential Revision: D108188287