fix: add exclude_list to @trace_class on high-frequency event classes #1041
Addresses a2aproject#1034: the `@trace_class` decorator on `EventQueue` and related classes generates 1500+ spans per LLM streaming session from high-frequency internal methods (`enqueue_event`, `dequeue_event`, `task_done`, etc.).

Added `exclude_list` to `@trace_class` on:

- `EventQueueLegacy`: excludes `enqueue_event`, `dequeue_event`, `task_done`, `is_closed`
- `EventQueueSource`: excludes `enqueue_event`, `dequeue_event`, `task_done`, `is_closed`
- `EventConsumer`: excludes `consume_all`
- `InMemoryQueueManager`: excludes `add`, `get`, `create_or_tap`

This reduces tracing spans from 1500+ to ~53 per session (97% reduction) while preserving useful `RequestHandler`-level traces. The existing `exclude_list` mechanism in `trace_class` is used, so there is no new API surface and there are no breaking changes.
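For readers unfamiliar with the mechanism, here is a minimal sketch of how an `exclude_list` on a class-level tracing decorator can suppress spans for hot-path methods. It is not the real `trace_class` implementation: `trace_class_sketch`, `RECORDED_SPANS`, and `ToyQueue` are hypothetical stand-ins, the methods are synchronous for brevity, and a plain list replaces the tracing backend.

```python
import functools
import inspect

# Stand-in for a tracing backend: each "span" is just a recorded name.
RECORDED_SPANS: list[str] = []


def _wrap(cls_name, name, func):
    """Wrap a method so every call records one span name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        RECORDED_SPANS.append(f'{cls_name}.{name}')
        return func(*args, **kwargs)
    return wrapper


def trace_class_sketch(exclude_list=None):
    """Hypothetical, simplified class decorator (not the a2a trace_class)."""
    excluded = set(exclude_list or [])

    def decorator(cls):
        for name, member in list(vars(cls).items()):
            # Leave private/dunder names and excluded methods unwrapped.
            if name.startswith('_') or name in excluded:
                continue
            if not inspect.isfunction(member):
                continue
            setattr(cls, name, _wrap(cls.__name__, name, member))
        return cls

    return decorator


@trace_class_sketch(exclude_list=['enqueue_event', 'dequeue_event'])
class ToyQueue:
    """Hypothetical queue with two hot-path methods and one rare method."""

    def __init__(self):
        self._items = []

    def enqueue_event(self, event):
        self._items.append(event)

    def dequeue_event(self):
        return self._items.pop(0)

    def close(self):
        self._items.clear()


queue = ToyQueue()
for i in range(100):  # simulate a streaming session's hot path
    queue.enqueue_event(i)
    queue.dequeue_event()
queue.close()

print(RECORDED_SPANS)  # only the non-excluded call is recorded
```

In this toy run the 200 hot-path calls produce no spans and only `ToyQueue.close` is recorded, which mirrors the intent of the change at a much smaller scale.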
Code Review
This pull request updates the @trace_class decorators across several event-related classes to exclude high-frequency or redundant methods from telemetry tracing, thereby reducing noise in the server spans. Specifically, EventConsumer, EventQueueLegacy, EventQueueSource, and InMemoryQueueManager now have explicit exclude_list parameters. A suggestion was made to also exclude the tap method in InMemoryQueueManager to maintain consistency with the other excluded management operations.

    -@trace_class(kind=SpanKind.SERVER)
    +@trace_class(
    +    kind=SpanKind.SERVER,
    +    exclude_list=['add', 'get', 'create_or_tap'],

The tap method is missing from the exclude_list. To be consistent with the exclusion of add, get, and create_or_tap, and to reduce redundant spans for management operations that are already covered by the underlying queue's traces, tap should also be excluded.

    -    exclude_list=['add', 'get', 'create_or_tap'],
    +    exclude_list=['add', 'get', 'tap', 'create_or_tap'],

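A gap like the missing `tap` could be caught with a small check that lists which public methods an `exclude_list` still leaves traced. The sketch below is illustrative only: `still_traced` and `ToyQueueManager` are hypothetical names, and the method set on the stand-in class is an assumption rather than a copy of `InMemoryQueueManager`.

```python
import inspect


def still_traced(cls, exclude_list):
    """Return the public methods defined on cls that are not excluded."""
    public = [
        name
        for name, member in vars(cls).items()
        if inspect.isfunction(member) and not name.startswith('_')
    ]
    return sorted(set(public) - set(exclude_list))


class ToyQueueManager:
    """Hypothetical stand-in; method bodies are intentionally empty."""

    def add(self, task_id, queue): ...

    def get(self, task_id): ...

    def tap(self, task_id): ...

    def close(self, task_id): ...

    def create_or_tap(self, task_id): ...


current = still_traced(ToyQueueManager, ['add', 'get', 'create_or_tap'])
suggested = still_traced(ToyQueueManager, ['add', 'get', 'tap', 'create_or_tap'])
print(current)
print(suggested)
```

With the current list, `tap` shows up among the methods that are still traced; with the suggested list it does not. Whether any remaining method should stay traced is a separate decision for the author.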
Summary

Fixes #1034: `@trace_class(kind=SpanKind.SERVER)` on `EventQueue` and related classes generates 1500+ spans per LLM streaming session from high-frequency internal methods (`enqueue_event`, `dequeue_event`, `task_done`, etc.).

Changes

Added `exclude_list` to `@trace_class` on 4 classes in `a2a/server/events/`:

| Class | File | Excluded methods |
| --- | --- | --- |
| `EventQueueLegacy` | `event_queue.py` | `enqueue_event`, `dequeue_event`, `task_done`, `is_closed` |
| `EventQueueSource` | `event_queue_v2.py` | `enqueue_event`, `dequeue_event`, `task_done`, `is_closed` |
| `EventConsumer` | `event_consumer.py` | `consume_all` |
| `InMemoryQueueManager` | `in_memory_queue_manager.py` | `add`, `get`, `create_or_tap` |

Impact

- Reduces tracing spans from 1500+ to ~53 per session (97% reduction)
- Preserves useful `RequestHandler`-level traces (`DefaultRequestHandler`, `JSONRPCHandler`, `RESTHandler`)
- Uses the existing `exclude_list` parameter in `trace_class`: no new API surface or breaking changes

Testing

- `pytest -k 'event_queue or event_consumer or queue_manager or telemetry'`