Add comprehensive queue health monitoring and observability#3
Merged
offendingcommit merged 11 commits into `upstream-sync` on Apr 25, 2026
Conversation
The surprisal observation fetch passed a list directly as the filter
value ({"level": [...]}), which generated invalid SQL (level = ARRAY)
instead of level IN (...). Use the {"in": [...]} operator syntax.
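The shape of the fix can be sketched as follows. `filter_to_sql` is a hypothetical helper for illustration, not the project's actual query builder:

```python
def filter_to_sql(column: str, value: object) -> str:
    """Hypothetical sketch of how a filter value maps to a SQL predicate."""
    if isinstance(value, dict) and "in" in value:
        # Operator form: {"in": [...]} produces a membership test.
        placeholders = ", ".join(repr(v) for v in value["in"])
        return f"{column} IN ({placeholders})"
    if isinstance(value, list):
        # The buggy path this commit avoids: a bare list becomes an ARRAY
        # equality (level = ARRAY[...]), which is not a membership test.
        return f"{column} = ARRAY[{', '.join(repr(v) for v in value)}]"
    return f"{column} = {value!r}"
```
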
… and Grafana dashboards

Add Docker infrastructure for local development with LM Studio as LLM provider, Prometheus metrics collection with custom histograms, Traefik reverse proxy configuration, and Grafana dashboard provisioning. Update SDK session handling and deriver queue management for improved reliability.
…hinking models

- Add `cf` provider (Cloudflare AI Gateway) to SupportedProviders and initialize an AsyncOpenAI client pointed at CF_GATEWAY_BASE_URL
- Route OpenAI embeddings through CF Gateway when LLM_OPENAI_BASE_URL is set
- Convert tools to OpenAI format for the `cf` provider (was missing from the provider list)
- Extract thought_signature from OpenAI-compat tool call responses and re-include it when formatting assistant messages for multi-turn replay; fixes 400 INVALID_ARGUMENT from Gemini thinking models via CF Gateway
- Preserve thought_signature in the `_format_assistant_tool_message` else branch
- Increase the DERIVER_MAX_INPUT_TOKENS upper bound (23000 → 200000) to allow higher limits via config
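The thought_signature round-trip can be sketched like this. Field names and the message shape are assumptions based on the commit description, not the fork's actual code:

```python
from typing import Any


def format_assistant_tool_message(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: preserve thought_signature across multi-turn replay.

    Gemini thinking models behind an OpenAI-compatible gateway reject replayed
    tool calls whose thought_signature is missing, so the signature captured
    from the original response must be re-included verbatim.
    """
    message: dict[str, Any] = {
        "role": "assistant",
        "tool_calls": [
            {
                "id": tool_call["id"],
                "type": "function",
                "function": tool_call["function"],
            }
        ],
    }
    # Re-include the signature extracted from the original response, if any.
    signature = tool_call.get("thought_signature")
    if signature is not None:
        message["tool_calls"][0]["thought_signature"] = signature
    return message
```
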
When CF_GATEWAY_AUTH_TOKEN is set, inject cf-aig-authorization header into the custom client so CF Gateway-proxied custom providers (e.g. custom-ollama) authenticate correctly at the gateway layer.
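The injection amounts to adding one header when the token is configured; Cloudflare's AI Gateway expects `cf-aig-authorization: Bearer <token>` on proxied requests. A minimal sketch (the helper name is hypothetical):

```python
def gateway_headers(env: dict[str, str]) -> dict[str, str]:
    """Sketch: extra headers for a CF Gateway-proxied custom client.

    Returns the cf-aig-authorization header only when CF_GATEWAY_AUTH_TOKEN
    is set, so non-gateway deployments are unaffected.
    """
    token = env.get("CF_GATEWAY_AUTH_TOKEN")
    if not token:
        return {}
    return {"cf-aig-authorization": f"Bearer {token}"}
```
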
…rides

Adds DEDUCTION_PROVIDER/INDUCTION_PROVIDER and matching THINKING_BUDGET_TOKENS settings so the deduction and induction specialists can route to a different provider than the main DREAM config. Also propagates thinking_budget_tokens into the LLM call and documents the CF Gateway / Gemini thought_signature gotchas in CLAUDE.md.
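The override-with-fallback pattern can be sketched as below; the helper and its lookup scheme are illustrative assumptions, not the fork's settings code:

```python
def specialist_provider(env: dict[str, str], specialist: str, default: str) -> str:
    """Hypothetical sketch: per-specialist provider override with fallback.

    Looks up e.g. DEDUCTION_PROVIDER for the "deduction" specialist and falls
    back to the main config's provider when the override is unset.
    """
    return env.get(f"{specialist.upper()}_PROVIDER", default)
```
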
Allows deployments (e.g. the infra chart) to configure CORS origins via a comma-separated CORS_ORIGINS env var instead of relying on the hardcoded list. Falls back to the previous defaults when unset.
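The parsing described above is roughly the following; the default list here is a placeholder, not the project's actual hardcoded origins:

```python
DEFAULT_CORS_ORIGINS = ["http://localhost:3000"]  # placeholder for the previous hardcoded list


def cors_origins(env: dict[str, str]) -> list[str]:
    """Sketch: parse a comma-separated CORS_ORIGINS env var.

    Whitespace around entries is tolerated; an unset or empty variable
    falls back to the previous defaults.
    """
    raw = env.get("CORS_ORIGINS", "")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    return origins or DEFAULT_CORS_ORIGINS
```
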
Resolve conflicts between fork-only commits (CF Gateway auth, Gemini thought_signature fix, LM Studio/Prometheus/Traefik stack, dreamer specialist overrides) and upstream's new src/llm/ transport-based abstraction that replaces src/utils/clients.py.

Port decisions:

- Dropped fork's cf / custom / vllm / groq providers; superseded by the new ModelConfig base_url/api_key override mechanism.
- Kept OPENAI_BASE_URL and CF_GATEWAY_AUTH_TOKEN on LLMSettings and wired them into src/llm/registry (default + override OpenAI clients) and src/embedding_client so CF AI Gateway routing survives the refactor.
- Ported thought_signature extraction into OpenAIBackend and replay into OpenAIHistoryAdapter so Gemini thinking models via the CF OpenAI-compat route can do multi-turn tool loops without 400ing.
- Dropped fork's DEDUCTION_PROVIDER / INDUCTION_PROVIDER and matching THINKING_BUDGET_TOKENS fields; upstream's per-specialist DEDUCTION_MODEL_CONFIG / INDUCTION_MODEL_CONFIG (full ConfiguredModelSettings) is a strict superset.
- Kept fork's traefik+prometheus+grafana docker-compose stack; kept upstream's broader docker/ COPY in the Dockerfile.
basedpyright with reportMissingTypeArgument rejected the bare `dict` types in the mock fake_post used by the SDK message-batching test, failing Static Analysis on PR #3. Add `dict[str, Any]` annotations and an explicit return type so CI stays green.
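The annotation change looks roughly like this; `fake_post` here is a simplified stand-in for the test's mock, not the real one:

```python
from typing import Any


# A bare `dict` parameter trips basedpyright's reportMissingTypeArgument rule;
# parameterizing it (and annotating the return type) keeps Static Analysis green.
def fake_post(url: str, json: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the SDK message-batching test's mock."""
    return {"url": url, "received": len(json.get("messages", []))}
```
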
basedpyright's default exit code is non-zero whenever any diagnostics are reported, so the 8 warnings introduced by the fork-only commits were failing the Static Analysis job on PR #3 even though there were no errors.

- src/deriver/queue_manager.py: drop the `item.created_at is not None` guards. created_at is `Mapped[datetime.datetime]` (non-nullable), so the checks were always True and basedpyright flagged them as reportUnnecessaryComparison.
- tests/sdk/test_session.py: factor out the shared mock-response body into a single helper and give the per-branch closures distinct names. This clears reportRedeclaration on `calls` / `fake_post` and lets the `# pyright: ignore` comments target the actual warning (reportPrivateUsage on `_http` / `_async_http_client`) instead of the irrelevant reportAttributeAccessIssue that was flagged as an unnecessary ignore.
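The queue_manager change can be illustrated with a stand-in model; the dataclass and function here are hypothetical, only the non-nullable `created_at` mirrors the commit:

```python
import datetime
from dataclasses import dataclass


@dataclass
class QueueItem:
    """Stand-in for the ORM model; created_at is non-nullable."""
    created_at: datetime.datetime


def item_age_seconds(item: QueueItem, now: datetime.datetime) -> float:
    # Before the fix this arithmetic sat behind `if item.created_at is not None:`.
    # For a non-nullable column that guard is always True, which basedpyright
    # reports as reportUnnecessaryComparison, so the guard was dropped.
    return (now - item.created_at).total_seconds()
```
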
Summary
This PR adds comprehensive Prometheus metrics and Grafana dashboards for monitoring queue health, session activity, and API performance. It also includes infrastructure improvements for local development with Traefik routing and message batching optimizations.
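The message-batching optimization amounts to chunking oversized lists before sending; a minimal sketch (only the `MAX_MESSAGES_PER_BATCH = 100` constant is from the PR, the helper is illustrative):

```python
MAX_MESSAGES_PER_BATCH = 100  # constant this PR adds to the SDK


def batch_messages(messages: list[dict]) -> list[list[dict]]:
    """Sketch of the batching add_messages() now performs: split a large
    message list into chunks no bigger than the per-request limit."""
    return [
        messages[i : i + MAX_MESSAGES_PER_BATCH]
        for i in range(0, len(messages), MAX_MESSAGES_PER_BATCH)
    ]
```
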
Key Changes
Observability & Metrics
New Prometheus metrics for queue monitoring:
- `deriver_queue_depth` - Queue depth by workspace, task type, and state (pending/in_progress)
- `deriver_queue_oldest_age_seconds` - Age of the oldest pending/in_progress items
- `deriver_queue_error_backlog` - Count of errored items retained in the queue
- `deriver_queue_errors_total` - Total queue processing errors
- `deriver_queue_item_latency_seconds` - Histogram of item latency from enqueue to terminal state
- `deriver_active_workers` - Current active worker count
- `api_request_duration_seconds` - API request latency histogram
- `sessions_active`, `session_last_message_age_seconds`, `session_queue_depth`, `session_queue_oldest_age_seconds`
- `deriver_queue_items_enqueued`, `session_context_requests`, `session_search_requests`

Queue health refresh loop in `QueueManager.refresh_queue_health_metrics()`
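The refresh loop's general shape is a periodic poll-and-set; this sketch is an assumption about the pattern, not the QueueManager's actual implementation (it is bounded here only so the example terminates):

```python
import asyncio
from typing import Awaitable, Callable


async def refresh_queue_health_metrics(
    get_depth: Callable[[], Awaitable[int]],
    set_gauge: Callable[[int], None],
    interval_s: float,
    max_iterations: int,
) -> None:
    """Hypothetical sketch: poll queue depth and update a gauge on an interval."""
    for _ in range(max_iterations):
        set_gauge(await get_depth())  # e.g. feed deriver_queue_depth
        await asyncio.sleep(interval_s)
```
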
Two new Grafana dashboards:

- `honcho-overview.json` - High-level system metrics (API requests, throughput, message creation)
- `honcho-queue-health.json` - Detailed queue monitoring (depth, latency, error rates, worker status)

Infrastructure & Development
Traefik reverse proxy integration:
- `traefik` service added to docker-compose for request routing
- `docker/traefik/dynamic.yml` routing configuration

Docker improvements:
SDK & API Enhancements
Message batching in Python SDK:
- `MAX_MESSAGES_PER_BATCH = 100` constant
- `add_messages()` methods now batch large message lists

Metrics recording in API routes:
- `record_api_request_duration()` for latency tracking
- `record_messages_created()` now includes a `session_name` label

Configuration & Documentation
Implementation Details
- `NamespacedGauge` and `NamespacedHistogram` classes to automatically inject namespace labels

https://claude.ai/code/session_01RUjFSXFxCVzym2GV5Ydkx9
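The namespace-injection idea can be sketched with a stdlib-only stand-in; the real classes presumably wrap prometheus_client metrics, and this structure is an assumption:

```python
class NamespacedGauge:
    """Hypothetical sketch of a gauge that injects a namespace label.

    Callers set values with their own labels; the namespace label is merged
    in automatically so no call site can forget it.
    """

    def __init__(self, name: str, namespace: str) -> None:
        self.name = name
        self.namespace = namespace
        self.samples: dict[tuple[tuple[str, str], ...], float] = {}

    def set(self, value: float, **labels: str) -> None:
        # Merge the namespace label into whatever the caller passed.
        merged = {"namespace": self.namespace, **labels}
        self.samples[tuple(sorted(merged.items()))] = value
```
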