Yield collect-thread spin loops to unblock preempted recorders#27
Merged
jack-berg merged 1 commit intoMay 8, 2026
Conversation
Both spin loops in DeltaSynchronousMetricStorage burned the collect thread's core while waiting for a recorder that may have been preempted mid-section. On constrained CPU (single-CPU containers, pinned cores), the recorder could not be rescheduled until the OS forced a quantum, multiplied per stuck handle. Add Thread.yield() in both loops so the collect thread releases its core. Matches the existing yield in acquireHandleForRecord. Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
jack-berg
approved these changes
May 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on top of open-telemetry#8313 — targets your branch directly so it merges in if/when 8313 lands.
Summary
Two spin loops in
DeltaSynchronousMetricStoragebusy-wait without yielding:AggregatorHolder.lockForCollectAndAwait()— waits for in-flight new-series operationsDeltaAggregatorHandle.awaitRecordersAndUnlock()— waits for in-flight recordersRecorder critical sections are short (a
LongAdder.addfor counters, bucket lookup + add for histograms), so on a roomy multi-core box the spin terminates in hundreds of cycles. But if the OS preempts a recorder mid-section, the collect thread holds its core spinning while the recorder cannot be rescheduled. Failure mode shows up on:cpus=1)Worst case stalls each handle by ~one OS quantum (~10ms), multiplied across all in-flight handles in the collection.
Fix
Add
Thread.yield()in both loops. Matches the existingThread.yield()already used inacquireHandleForRecordfor the holder-swap retry, so the suppression and pattern are consistent.Thread.onSpinWait()would be the textbook choice (cheap PAUSE hint, JIT-friendly), but it is Java 9+. The SDK targets Java 8 via--release 8and animalsniffer would reject it.Thread.yield()is heavier but works on Java 8 and actually deschedules — which is what the constrained-CPU case needs anyway.Test plan
SynchronousInstrumentStressTestcontinues to pass (50× repetitions)