Vulkan/Qualcomm: temporary resolve textures from AlwaysResolveIntoZeroLevelAndLayer can stay alive too long and trigger OOM #65
Open
cabanier wants to merge 1 commit into
…elAndLayer` can accumulate and OOM

On Qualcomm Vulkan, Dawn's non-zero-array-layer resolve workaround allocates per-pass temporary resolve textures, and their lifetime can extend longer than the submit that used them, causing avoidable memory growth and eventual OOM in workloads like WebXR.

`AlwaysResolveIntoZeroLevelAndLayer` exists because Qualcomm Vulkan has a bug resolving into a non-zero mip/layer of an array texture:
- `src/dawn/native/vulkan/PhysicalDeviceVk.cpp`
- `src/dawn/native/Toggles.cpp`

In that path, `RenderPassWorkaroundsHelper` allocates a temporary single-layer 2D resolve texture, resolves into it, and then copies into the real destination.

The issue is that these temporary resolve textures can live longer than the submit that actually used them. In workloads that hit this path every frame, this causes unnecessary Vulkan image memory growth and can eventually OOM.

I hit this with a WebXR/WebGPU workload on Quest where the XR color target is a 2-layer array texture and the right eye resolve targets `baseArrayLayer = 1`, which reliably triggers this workaround every frame.

There is also a separate Qualcomm workaround for empty resolve passes (`VulkanAddWorkToEmptyResolvePass`), but that is not the memory issue here. The memory issue is the lifetime of the temporary resolve texture used by `AlwaysResolveIntoZeroLevelAndLayer`.

Proposed fix: track these workaround textures on the command buffer and explicitly `Destroy()` them immediately after `vkQueueSubmit`, before the queue serial advances, so their fenced deletion is tied to the submit that used them rather than to later command-buffer/object teardown timing.
👋 Thanks for your contribution! Your PR has been imported to Gerrit.
On Android Qualcomm Vulkan, Dawn enables `AlwaysResolveIntoZeroLevelAndLayer` because resolving into a non-zero array layer is buggy. In workloads that hit this path every frame, the temporary resolve textures can stay alive longer than the submit that used them, which can accumulate enough Vulkan image memory to cause OOM.

Relevant existing comments in Dawn
- `src/dawn/native/vulkan/PhysicalDeviceVk.cpp:988-996`
- `src/dawn/native/Toggles.cpp:60-68`
  - `AlwaysResolveIntoZeroLevelAndLayer` resolves into a temporary single-layer 2D texture first, then copies into the true resolve target.
- `src/dawn/native/Toggles.cpp:705-708`
  - `VulkanAddWorkToEmptyResolvePass` exists to force empty resolve passes to execute on Qualcomm.

Problem
`RenderPassWorkaroundsHelper::Initialize()` allocates a temporary resolve texture whenever the resolve target is a non-zero mip or array layer:
- `src/dawn/native/RenderPassWorkaroundsHelper.cpp`

That texture is then used as the actual resolve target for the render pass, and copied into the intended destination in the pass end callback.
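The condition under which a temporary texture is needed can be sketched roughly as below. This is a simplified illustration, not Dawn's code: `ResolveTargetView` and `NeedsTemporaryResolveTexture` are hypothetical names introduced here, and the real helper operates on Dawn's own texture view types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a resolve target view (not Dawn's type).
struct ResolveTargetView {
    uint32_t baseMipLevel;
    uint32_t baseArrayLayer;
};

// Returns true when the workaround would need a temporary single-layer 2D
// resolve texture: the toggle is enabled and the target is not mip 0 / layer 0.
bool NeedsTemporaryResolveTexture(bool alwaysResolveIntoZeroLevelAndLayer,
                                  const ResolveTargetView& target) {
    if (!alwaysResolveIntoZeroLevelAndLayer) {
        return false;
    }
    return target.baseMipLevel != 0 || target.baseArrayLayer != 0;
}
```

With the toggle enabled, a right-eye target at `baseArrayLayer = 1` takes the temporary-texture path on every pass, while a left-eye target at layer 0 does not.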
The issue is lifetime management:
- … `CommandBufferBase` object after submission

In my repro, this shows up in a WebXR/WebGPU workload on Quest:
- the XR color target is a 2-layer array texture
- the right eye resolve targets `baseArrayLayer = 1`
- this triggers `AlwaysResolveIntoZeroLevelAndLayer` every frame

This is separate from the empty-resolve-pass bug. `VulkanAddWorkToEmptyResolvePass` is needed to make the resolve happen at all on Qualcomm, but the memory growth is caused by the lifetime of the temporary resolve texture used by `AlwaysResolveIntoZeroLevelAndLayer`.

Expected behavior
Temporary resolve textures created solely for the workaround should be released as soon as the submit that uses them has been queued, independent of command buffer wrapper lifetime.
Proposed fix
Track these temporary textures from `CommandEncoder` into `CommandBufferBase`, then in Vulkan:
- … `Queue::SubmitImpl()`
- call `Destroy()` on them immediately after `vkQueueSubmit`

This makes their Vulkan image/memory eligible for release on the correct serial and avoids depending on command buffer destruction / GC timing.
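To make the serial argument concrete, here is a small toy model of fenced deletion. It is only a sketch under stated assumptions: `ToyDevice`, `SubmitAndReleaseTemporaries`, and the byte accounting are hypothetical and do not mirror Dawn's actual classes. The model assumes that destroying a texture tags its memory with the serial of the submit in flight, and that the memory is freed once that serial has completed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

using Serial = uint64_t;
using TextureId = int;

// Toy device with a serial-keyed deferred deletion queue (hypothetical model).
class ToyDevice {
  public:
    // Serial that the next submit will carry.
    Serial PendingSerial() const { return mLastSubmittedSerial + 1; }

    TextureId CreateTexture(uint64_t bytes) {
        TextureId id = mNextId++;
        mLive[id] = bytes;
        mLiveBytes += bytes;
        return id;
    }

    // Schedules the memory for release once the pending serial has completed.
    void DestroyTexture(TextureId id) {
        auto it = mLive.find(id);
        assert(it != mLive.end());
        mPendingDeletes[PendingSerial()].push_back(it->second);
        mLive.erase(it);
    }

    // Called once per submit, after the work has been queued.
    void AdvanceSerial() { ++mLastSubmittedSerial; }

    // Frees everything whose deletion serial is <= completedSerial.
    void Tick(Serial completedSerial) {
        auto end = mPendingDeletes.upper_bound(completedSerial);
        for (auto it = mPendingDeletes.begin(); it != end; ++it) {
            for (uint64_t bytes : it->second) {
                mLiveBytes -= bytes;
            }
        }
        mPendingDeletes.erase(mPendingDeletes.begin(), end);
    }

    // Bytes allocated and not yet freed, including pending deletions.
    uint64_t LiveBytes() const { return mLiveBytes; }

  private:
    Serial mLastSubmittedSerial = 0;
    TextureId mNextId = 0;
    uint64_t mLiveBytes = 0;
    std::unordered_map<TextureId, uint64_t> mLive;
    std::map<Serial, std::vector<uint64_t>> mPendingDeletes;
};

// Stand-in for the submit path: the real vkQueueSubmit call would come first,
// then the tracked temporaries are destroyed before the serial advances.
void SubmitAndReleaseTemporaries(ToyDevice& device,
                                 std::vector<TextureId>& temporaries) {
    for (TextureId id : temporaries) {
        device.DestroyTexture(id);  // tagged with this submit's serial
    }
    temporaries.clear();
    device.AdvanceSerial();
}
```

In this model, each frame's temporary is freed as soon as its own submit completes, so the number of outstanding temporaries is bounded by the number of frames in flight rather than by when command buffer objects happen to be torn down.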