chore(sync): merge upstream/main (42 commits) #4
Merged
offendingcommit merged 50 commits into main on May 4, 2026
Conversation
New integration guide for the sillytavern-honcho extension covering install, global config, context architecture, enrichment modes, and troubleshooting. Added to v3 integrations nav.
…#495)
* chore: add .worktrees/ to .gitignore
* feat(examples): add Zo Computer memory skill integration
* fix(examples): address CodeRabbit review on Zo skill integration
  - Fix version inconsistency: SKILL.md matches pyproject.toml (>=2.1.0)
  - Move client.py into tools/ package and use relative imports
  - Add assistant_id parameter to save_memory() for consistency with get_context()
  - Use UUID-based IDs in tests to prevent state leakage between runs
  - Add pytest.mark.skipif guard on integration tests (requires HONCHO_API_KEY)
  - Fix import ordering, move pytest to module level, sort __all__ alphabetically
  - Fix markdown blank lines around fenced code blocks (MD031)
  - Add rate limit delay fixture to avoid hitting Honcho free tier limits
* fix(examples): validate HONCHO_API_KEY early in client initialization
* docs(examples): note cross-peer memory behavior in shared workspaces
* docs(examples): fix save_memory and query_memory signatures in README
* docs(examples): fix markdown linting issues in README
* docs(examples): add assistant_id parameter to save_memory example in SKILL.md
---------
Co-authored-by: Luba Kaper <lubakaper@lubas-air.mynetworksettings.com>
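A minimal sketch of the early key validation and the save_memory signature these commits describe; the client usage and function names here are assumptions for illustration, not the actual skill code:

```python
import os

from honcho import Honcho  # Honcho SDK; constructor shape assumed


def get_client() -> Honcho:
    # Validate HONCHO_API_KEY early so a missing key fails at client init,
    # not midway through a tool call.
    api_key = os.environ.get("HONCHO_API_KEY")
    if not api_key:
        raise RuntimeError("HONCHO_API_KEY is not set; the Zo memory skill requires it")
    return Honcho(api_key=api_key)


def save_memory(content: str, user_id: str, assistant_id: str | None = None) -> None:
    # assistant_id mirrors get_context() so both tools address the same peer pair.
    client = get_client()
    ...  # write `content` as a message from user_id, observed by assistant_id
```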
…fig guide (plastic-labs#510)
* fix: inconsistencies in docs, health endpoint, troubleshooting guide
* fix: (docs) maintain consistency on postgres db name
* chore: (docs) update v2 contributing docs with updated db paths
* docs: overhaul self-hosting docs for provider-agnostic setup
  - .env.template: lead with provider options (custom, vllm, google, anthropic, openai, groq) instead of baking in vendor-specific keys. All provider/model settings commented out so the server fails fast until configured. Separate endpoint config from per-feature provider+model from tuning knobs.
  - docker-compose.yml.example: fix healthcheck -d honcho -> -d postgres to match POSTGRES_DB=postgres.
  - config.toml.example: reorder and document LLM key section with OpenRouter and vLLM examples.
  - self-hosting.mdx: replace multi-vendor key table with provider options table. Add examples for OpenRouter, vLLM/Ollama, and direct vendor keys. Remove duplicated key lists from Docker/manual setup sections.
  - configuration.mdx: replace scattered provider docs with provider types table. Fix Docker Compose snippet to match actual compose file. Note code defaults as fallback, not recommended path.
  - troubleshooting.mdx: add alternative provider issues section (custom provider config, model name format, Docker localhost, structured output failures).
* docs: add Docker build troubleshooting for permission errors
  - Document BuildKit requirement (RUN --mount syntax)
  - AppArmor/SELinux blocking Docker builds on Linux
  - Volume mount UID mismatch between host and container app user
  - Note in self-hosting docs that the Docker path builds from source
* docs: reframe self-hosting as contributor/dev path, point to cloud service
* Revert "docs: reframe self-hosting as contributor/dev path, point to cloud service" (reverts commit 3e766eb)
* docs: add production compose, model guidance, thinking budget docs
  - Add docker-compose.prod.yml for VM/server deployment: no source mounts, restart policies, 127.0.0.1-bound ports, cache enabled
  - Add model tier guidance and community quick-start link to self-hosting
  - Document THINKING_BUDGET_TOKENS gotcha for non-Anthropic providers
  - Add reverse proxy examples (Caddy + nginx) to production section
  - Add backup/restore commands to production considerations
* docs: simplify self-hosting to single provider, restructure config guide
  Self-hosting page now defaults to one OpenAI-compatible endpoint with one model for all features. Moved model tiers, alternative providers, and per-feature tuning into the configuration guide. Eliminated duplicate config priority sections, dev/prod split, and redundant TOML examples.
* docs: merge compose files, restore provider/model to feature sections in .env.template
  Single docker-compose.yml.example with dev sections commented out. Moved PROVIDER and MODEL back alongside each feature in .env.template so settings stay colocated with their module. Updated self-hosting docs to reference the single compose file.
* fix: broken anchor links, redundant migration step, minor inconsistencies
  Fix 4 broken internal links (#llm-provider-setup, #llm-api-keys, #which-api-keys-do-i-need, #alternative-providers) to point to correct headings. Remove redundant Docker migration step (entrypoint already runs alembic). Fix cache URL missing ?suppress=true in reference config. Fix uv install command to use the official method.
* docs: env template ready to use, simplify self-hosting flow
  .env.template now has provider/model lines uncommented with placeholder values; the user just sets endpoint, key, and model name. Thinking budgets default to 0 for non-Anthropic providers. Self-hosting page: removed the 30-line env var wall; LLM setup now points to the template. Merged duplicate verify sections. Removed api_key from SDK examples (auth off by default).
* docs: reorder next steps, configuration guide first
* fix: default embedding provider to openrouter for single-endpoint setup
  Without this, embeddings default to openai, which requires a separate LLM_OPENAI_API_KEY. Setting to openrouter routes embeddings through the same OpenAI-compatible endpoint as everything else.
* fix: review issues — hermes page, thinking budget, production wording
  Hermes integration page: replaced inline Docker/manual setup with a link to the self-hosting guide, added elkimek community link. Removed old env var names (OPENAI_API_KEY without LLM_ prefix). Troubleshooting: removed "or 1" from thinking budget guidance. Self-hosting: softened "production-ready" to "production-oriented" since auth is disabled by default.
* docs: model examples in template, expanded LLM setup, better verify flow
  .env.template: added "e.g. google/gemini-2.5-flash" hints next to model placeholders so users know the expected format. Self-hosting: expanded LLM Setup to show the 3 things users need to set (endpoint, key, model name) with a find-replace tip. Added build time note, deriver log check, and a real smoke test (create workspace) to the verify section. Health check now notes it doesn't verify DB/LLM.
* fix: smoke test uses v3 API path, not v1
* docs: clarify deriver metrics port vs Prometheus host port
* fix: remove deprecated memoryMode from hermes config example
* docs: update hermes page to match current memory provider config
  Updated config to match hermes-agent docs: removed apiKey (not needed for self-hosted), added hermes memory setup CLI command, added config fields table (recallMode, writeFrequency, sessionStrategy, etc.). Better verification tests: store-and-recall across sessions, direct tool-calling test. Links to upstream hermes docs for the full field list.
* fix: invalid THINKING_BUDGET_TOKENS=0 and missing docker/ in image
  Comment out THINKING_BUDGET_TOKENS=0 in .env.template; the deriver, summary, and dream validators require gt=0. Dialectic levels also commented out since non-thinking models don't need the override. Add COPY for the docker/ directory in the Dockerfile so entrypoint.sh is available when docker-compose.yml.example references it.
* chore: additional troubleshooting step
---------
Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
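For orientation, a hypothetical excerpt of the resulting single-endpoint flow; the variable names below are placeholders, and the authoritative names live in .env.template:

```
# Hypothetical single-endpoint excerpt: set endpoint, key, and model name.
LLM_PROVIDER_BASE_URL=https://openrouter.ai/api/v1
LLM_PROVIDER_API_KEY=sk-or-...
DERIVER_MODEL=google/gemini-2.5-flash   # "e.g." hint format from the template
```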
* fix: further remove extraneous transactions
* fix: (search) use two-phase function to reduce unneeded transactions
* fix: refactor agent search to perform external operations before opening a transaction
* fix: reduce scope of queue manager transaction
* fix: (bench) add concurrency to test bench
* fix: address review findings for search dedup, webhook idempotency, and bench throttling
* Fix leakage in non-session-scoped chat call (plastic-labs#526)
* fix: (search) reduce scope for peer-based searches
* fix: tests
* fix: (test) address coderabbit comment
* fix: drop db param from deliver_webhook
---------
Co-authored-by: Rajat Ahuja <rahuja445@gmail.com>
* chore: (docs) update changelogs and version numbers
* chore: remove extraneous dep on mintlify
* Simplify Paperclip integration instructions — clarified instructions for local Honcho setup and removed unnecessary details
* Update docs.json
* Update links in Paperclip integration guide
* Revise memory initialization instructions in Paperclip guide — updated instructions for initializing memory and removed the optional checks section
…ic-labs#530) The HEALTHCHECK directive probes an HTTP endpoint that only the API serves. The deriver service reuses this image but is a background queue worker with no HTTP server — the probe can never succeed, so Docker permanently marks the deriver container as unhealthy. Remove the HEALTHCHECK from the shared image. Service-level health checks belong in each service's own configuration (e.g. Kubernetes readiness/liveness probes on the API Deployment only). Closes plastic-labs#521 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation (plastic-labs#459)
* fix: add JSON repair for truncated LLM responses across all providers and Gemini thinking budget support
  LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver with 14k+ occurrences in production. The vLLM path already had repair logic, but it was gated on provider=="vllm", unreachable when routing through litellm as a custom provider.
  - Extract shared _repair_response_model_json() helper for all providers
  - Catch LengthFinishReasonError in OpenAI/custom parse() path and repair truncated JSON
  - Add repair fallback to Anthropic and Gemini response_model paths
  - Add repair fallback to Groq response_model path
  - Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
  - Add 14 tests covering repair paths for all providers and Gemini thinking budget
  Fixes HONCHO-YC
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: live llm integration tests
* feat: consistent model config protocol
* fix: migrate the remaining app callers off the legacy llm_settings path
* fix: docs and regression tests
* fix: refactor llm runtime path to model-config-only API
* fix: refactor config to nested model-config source of truth
* fix: refactor llm streaming and tool dispatch through backends
* fix: cut over llm config to nested model_config only
* fix: collapse vllm and custom into openai_compatible transport
* feat: refactor llm config to explicit transports and bare model ids
* feat: (embed) add configurability for embedding model
* fix: tests for embedding provider
* fix: address review comments
* fix: (llm) remove Groq backend and per-vendor base URLs
* chore: move llm tests
* fix: (llm) address review findings — config regressions, backend bugs, dead code
* fix: address backend silly errors
* chore: (docs) update configuration and self-hosting guides
* chore: fix tests
* fix: address code rabbit comments
* fix: add validation to the dream settings
* fix: further address code rabbit comments
* fix: address Code Rabbit comments
* fix: another round of code rabbit
* fix: address Code Rabbit nits
* fix: tests
* refactor: rename thinking validator to reflect transport scope
  _validate_anthropic_thinking_minimum only enforces the >=1024 rule for Anthropic and no-ops for other transports, so the name was misleading now that it's shared across ConfiguredModelSettings, FallbackModelSettings, and ModelConfig. Renamed to _validate_thinking_constraints with a docstring clarifying per-transport behavior. No logic change.
* fix(config): drop transport-specific thinking params when env override changes transport
  _fill_defaults_for_nested_field previously preserved the default MODEL_CONFIG's thinking_budget_tokens/thinking_effort across a transport override. This leaked Gemini-family defaults (e.g. thinking_budget_tokens=1024) into OpenAI-transport overrides, and the OpenAI backend then correctly rejected the unsupported param at call time (OpenAI uses reasoning.effort, not a token budget). The helper now strips thinking_budget_tokens and thinking_effort from the default dict when the env override supplies a transport different from the default's. Explicit thinking params in the override are preserved.
* fix(config): apply thinking-param strip to dialectic level merge too
  DialecticSettings._merge_level_defaults does its own inline MODEL_CONFIG merge (parallel to _fill_defaults_for_nested_field), so the previous fix missed dialectic-level overrides. E.g. flipping DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT from gemini (default) to openai still leaked the default thinking_budget_tokens=0 into the openai config, which the OpenAI backend then rejected at call time. The level-merge path now applies the same 'strip transport-specific thinking params when transport changes' rule as the generic helper. Added a regression test exercising the merge validator directly.
* refactor(llm): wire ModelConfig knobs through, prune clients.py migration leftovers
  Three connected fixes to finish carving the LLM stack out of src/utils/clients.py and into src/llm/:
  1. Propagate ModelConfig tuning knobs into backend calls. honcho_llm_call_inner built extra_params from only {json_mode, verbosity}, silently dropping top_p, top_k, frequency_penalty, presence_penalty, seed, and operator-supplied provider_params from any ModelConfig. Thread the selected config through ProviderSelection and merge build_config_extra_params(selected_config) into extra_params; per-call kwargs still win over provider_params defaults. Makes _build_config_extra_params public as build_config_extra_params so clients.py and request_builder.py share one translation. Adds TestModelConfigExtraParamsPropagation covering OpenAI/Anthropic knob propagation, provider_params passthrough, and per-call override precedence.
  2. Drop dead extract_openai_* duplicates in clients.py. extract_openai_reasoning_content, extract_openai_reasoning_details, and extract_openai_cache_tokens had no callers outside their own definitions — the live implementations live in src/llm/backends/openai.py. -103 lines from clients.py.
  3. Unify on ModelTransport, delete SupportedProviders. The "google" vs "gemini" split forced a _provider_for_model_config translation shim in two places. Replace all SupportedProviders usages with ModelTransport, rename CLIENTS["google"] → CLIENTS["gemini"], update provider branches + LLMError labels + reasoning-trace entries accordingly. Trace JSONL now writes "provider": "gemini" instead of "google" — consistent with the broader env-var rename cutover.
  Also tidies up pre-existing basedpyright findings in tests/llm/test_model_config.py (pydantic before-validator dict inputs + descriptor-proxy call). ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 153/153 pass across tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py, tests/llm/, tests/dialectic/, tests/deriver/.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(llm): finish the src/utils/clients.py → src/llm/ migration
  honcho_llm_call_inner now delegates to request_builder.execute_completion and execute_stream instead of re-implementing backend call scaffolding inline. The new _effective_config_for_call helper carries per-call kwargs (temperature, stop_seqs, thinking_budget_tokens, reasoning_effort) onto the selected ModelConfig — or synthesizes a minimal config for the test-only callers that pass provider+model directly. max_output_tokens is zeroed on the effective config to preserve the current "per-call max_tokens wins" semantic; honoring ModelConfig.max_output_tokens is a separable correctness concern.
  Side effect of routing through the new path: ConfiguredModelSettings' thinking_budget_tokens validator now fires on synthesized configs. test_anthropic_thinking_budget was asserting that a sub-1024 budget propagated to Anthropic — bumped to 1024 to match what Anthropic actually accepts.
  Unified client construction. Promoted the cached client factories in src/llm/__init__.py (get_anthropic_client, get_openai_client, get_gemini_client, get_{anthropic,openai,gemini}_override_client) to public API and added them to __all__. Promoted credentials._default_transport_api_key → default_transport_api_key. Deleted the duplicate _build_client and _default_credentials_for_provider from clients.py; _client_for_model_config now falls through to the public factories. CLIENTS dict and _get_backend_for_provider stay as the mockable seam for the ~50 patch.dict(CLIENTS, {...}) test call sites.
  Wired operator-configurable Gemini cached-content reuse end-to-end. PromptCachePolicy moved from src/llm/caching.py into src/config.py so ModelConfig can reference it as a field without a circular import; caching.py re-exports the name for existing imports. Added cache_policy: PromptCachePolicy | None on ConfiguredModelSettings, FallbackModelSettings, ResolvedFallbackConfig, and ModelConfig. resolve_model_config, _resolve_fallback_config, and _select_model_config_for_attempt copy the field through. honcho_llm_call_inner passes effective_config.cache_policy into execute_completion / execute_stream, so operators opt in via e.g. DERIVER_MODEL_CONFIG__CACHE_POLICY__MODE=gemini_cached_content and the selection actually fires instead of sitting on a dead path. New regression test test_cache_policy_reaches_gemini_backend asserts the PromptCachePolicy object reaches the Gemini backend's extra_params.
  ruff + basedpyright: clean. Tests: 154/154 pass across tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py, tests/llm/, tests/dialectic/, tests/deriver/.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(llm): move all LLM orchestration into src/llm/ and delete clients.py
  The 1624-line src/utils/clients.py has been carved up into focused modules under src/llm/ and deleted. There is now one golden path for LLM orchestration and no dual entrypoint.
  New module layout under src/llm/:
  - __init__.py — thin stable re-export surface
  - api.py — public honcho_llm_call with retry + fallback + tool loop delegation
  - executor.py — honcho_llm_call_inner (single-call executor); bridges to request_builder.execute_completion / execute_stream
  - tool_loop.py — execute_tool_loop + stream_final_response, plus assistant-tool-message and tool-result formatting
  - runtime.py — AttemptPlan dataclass (replaces the loose ProviderSelection NamedTuple), effective_config_for_call, plan_attempt, per-retry temperature bump, attempt ContextVar
  - registry.py — single owner of CLIENTS dict + cached default and override SDK-client factories + backend/history-adapter selection + high-level get_backend(config)
  - conversation.py — count_message_tokens, tool-aware message grouping, truncate_messages_to_fit
  - types.py — HonchoLLMCallResponse, HonchoLLMCallStreamChunk, StreamingResponseWithMetadata, IterationData, IterationCallback, ReasoningEffortType, VerbosityType, ProviderClient
  - request_builder.py — low-level request assembly (ModelConfig → backend complete/stream); no longer owns credential resolution
  - credentials.py — default_transport_api_key, resolve_credentials
  - caching.py — gemini_cache_store; re-exports PromptCachePolicy from src.config
  - backend.py — Protocol + normalized result types
  - history_adapters.py — provider-specific assistant/tool message shapes
  - structured_output.py
  - backends/ — AnthropicBackend, OpenAIBackend, GeminiBackend
  handle_streaming_response had no production callers; it is deleted. The three tests that used it now drive honcho_llm_call_inner(stream=True, client_override=...) directly, which exercises the same code path the public API uses.
  Dead credential passthrough removed. The ProviderBackend Protocol and all three concrete backends no longer accept api_key / api_base — those are baked into the underlying SDK client at registry construction time and were being del'd everywhere they appeared. request_builder also stops resolving and forwarding them.
  Client construction is unified. The cached default-client factories (get_anthropic_client, get_openai_client, get_gemini_client) and override factories (get_*_override_client) are promoted to public API; the module-level CLIENTS dict populates from them and remains the patch.dict(CLIENTS, {...}) mocking seam tests rely on. Old duplicate helpers (_build_client, _default_credentials_for_provider) are gone. default_transport_api_key is promoted to public.
  Application imports now come from src.llm (dreamer, dialectic, deriver, summarizer, telemetry-adjacent tests). No code imports from src.utils.clients anywhere in the repo.
  ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 1013/1013 pass across the entire non-infra test suite (excluding tests/unified, tests/bench, tests/live_llm, tests/alembic).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(llm): sanitize tool schemas for Gemini's function_declarations validator
  Gemini's native-transport function-declarations validator accepts a narrow subset of JSON-Schema / OpenAPI: type, format, description, nullable, enum, properties, required, items, minItems, maxItems, minimum, maximum, title. Anything else — additionalProperties, allOf, if/then/else, $ref, anyOf, oneOf, $defs, patternProperties — triggers an INVALID_ARGUMENT 400 at call time. Our agent tool schemas in src/utils/agent_tools.py use several of those (additionalProperties: false, allOf + if/then conditionals) because they were authored for OpenAI strict-mode + Anthropic, which need the richer vocabulary. GeminiBackend._convert_tools was passing them straight through.
  Add _sanitize_schema(): walks the parameters tree and drops unsupported keywords while preserving semantics for the keywords that hold user data (properties maps field-name → sub-schema; required / enum are lists of literals; items is a single sub-schema). Other backends are untouched and continue to receive the full strict schemas.
  Regression tests:
  - test_gemini_sanitize_schema_strips_unsupported_keywords: confirms additionalProperties, allOf + if/then, and $defs are stripped at nested levels while legitimate fields survive.
  - test_gemini_convert_tools_sanitizes_parameters_schema: end-to-end _convert_tools output has no forbidden keys.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: fix tool calling syntax for gemini
* refactor(llm): normalize defaults, widen OpenAI reasoning-model routing
* chore: fix test
* fix(llm): address post-migration review feedback
* fix(llm): gemini robustness + dreamer specialist ergonomics
* chore: address review comments
* chore: (docs) unreleased changelog addition
* chore: (docs) merge commit changes
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Erosika <eri@plasticlabs.ai>
* feat: adding honcho-cli package
* feat: adding more support for command-level flags, also including workarounds for getting raw SDK info
* feat: adding peer config
* feat: adding setup commands
* chore: setting up package dependencies for cli
* feat: promote init/doctor to top-level + polish wizard
* feat: make init --yes fall back to existing config
* chore: updating documentation
* chore: updating tagline
* feat: structurally updating recommended settings for CLI
* fix: style
* fix: removing redundant describe method
* fix: delete key generation commands and fix session ID
* fix: removing defaults and changing config write path
* chore: paginating conclusions
* chore: require workspace
* fix: polish command surfaces — scoping, validation, perf, consistency
* chore: removing session message
* fix: CLI output shape, destructive-confirm previews, skip needless round-trips
* chore: CLI polish — peer inspect config, drop dead helper, doc/help consistency
* chore: update readme
* chore: updating tests
* chore: doc updates
* fix: config command
* chore: unused code
* fix: doctor command
* fix: removing quiet tag and fixing session key ordering
* fix: config commands and session id command
* fix: removing message_count
* fix: branding circular dependency
* fix: refactor lazy imports to use common.py correctly
* fix: removing all lazy imports
* chore: cr fixes
* fix: config, env, flag setup
* chore: updating skill
* feat: adding workspace, session, and message create
* fix: init now supports local honcho
* chore: cr
* feat(cli): CLI surface polish — reasoning flag, peer-scoped messages, help sync
  Add --reasoning/-r to peer chat (minimal..max), -p peer filter to message list with newest-first ordering, and a curated welcome panel with getting-started/memory/commands sections. Sync the welcome panel and group help strings with the actual registered commands — drop phantom 'session clone', add the 4 missing peer commands and 7 missing session commands, fix conclusion/message/workspace group docstrings that claimed commands that don't exist.
* feat(cli): themed, unified help system with pattern/example
  Replace the hand-rolled welcome with a layered system:
  - Theme typer.rich_utils (dim borders, brand color) so every --help inherits the voice.
  - HonchoTyperGroup subclass renders a curated 3-panel welcome (getting started / memory / commands) with recipes Typer can't auto-generate.
  - Unify the front door: bare 'honcho', 'honcho --help', and 'honcho help' all render the same welcome via one code path; sub-groups and leaf commands still get Typer's themed renderer.
  - Replace Click's 'Usage: …' line with pattern/example rows at every sub-group and leaf command, so the help voice stays consistent from top to leaves.
* refactor(cli): address review — typed exceptions, chmod 600, tighter redaction, class-based help, tests
  - Replace module-level monkey-patch of TyperGroup/TyperCommand.get_usage with HonchoTyperGroup applied via cls= on every sub-Typer. Lives in a new _help.py module to avoid circular imports. No longer leaks behavior changes into other Typer users in the same process.
  - _test_connection dispatches on the SDK's typed exceptions (AuthenticationError, ConnectionError, TimeoutError, APIError) instead of substring-matching error messages.
  - Config.save() now chmods ~/.honcho/config.json to 0o600 after write so the plaintext API key isn't world-readable on multi-user hosts.
  - Tighten api_key redaction to '***<last4>' (was 'header...last4'), matching setup._redact for consistency. Short keys fully masked.
  - Add test_validation.py covering safe IDs, unsafe chars, path traversal, and empty input. Update test_config.py redaction cases and add 0o600 permission assertion. Fix stale patch paths in test_commands.py that pointed at honcho_cli.main instead of the command modules where get_client is actually imported.
* feat(cli): add options panel to welcome menu
  Append a fourth panel listing the global flags (-w/-p/-s, --json, --version, --help) with their env-var counterparts. Discoverable from bare 'honcho' without needing to hunt for --help.
* chore(cli): drop --version from welcome options panel
* feat(cli): add pixel-honcho icon to banner
  Prepend a 13-char ASCII rendering of honcho-pixel.svg to the HONCHO wordmark. Uses Unicode half-blocks to pack 12 pixel rows into 6 text rows, faithfully preserving the SVG outline (two eye dots, mouth slit, tapering foot). Appears in bare 'honcho', 'honcho --help', 'honcho --version', and 'honcho init'.
* fix: polish Honcho CLI welcome panel and error messages
* fix: honcho workspace inspect speed
* chore: minor fix to session pagination
* fix: removing NDJSON output
* chore: consolidating honcho CLI's dual argv grammar onto Pattern A (command-first)
* chore: clean up imports
* fix: four `-s` consistency fixes applied
* chore: minor changes to memory rows
* fix: changing package name to honcho-cli
* fix: removing pixel face
---------
Co-authored-by: Erosika <eri@plasticlabs.ai>
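A sketch of the config hardening described in that review commit; the paths and function shapes are assumptions, not the actual honcho-cli code:

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".honcho" / "config.json"


def redact(api_key: str) -> str:
    # '***<last4>' redaction; short keys are fully masked.
    return f"***{api_key[-4:]}" if len(api_key) > 8 else "***"


def save_config(data: dict[str, str]) -> None:
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(json.dumps(data, indent=2))
    # The plaintext API key lives here, so drop group/world read bits
    # right after writing on multi-user hosts.
    os.chmod(CONFIG_PATH, 0o600)
```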
…lastic-labs#575) The MCP Worker hardcoded https://api.honcho.dev for every request, forcing anyone running a self-hosted Honcho instance to patch the source before deploying their own Worker alongside it. Route the baseUrl through the Worker env so operators can set HONCHO_API_URL (via .dev.vars for local development or wrangler secret for deployed Workers) and point the Worker at their instance. The variable is intentionally not exposed as a request header: that would let public clients steer traffic to internal URLs, which is a latency and security regression. When HONCHO_API_URL is unset, the Worker falls back to https://api.honcho.dev, so existing deployments are unaffected. Closes plastic-labs#508
…patible providers (plastic-labs#586)
* fix: wrap single embed() input in an array for OpenAI-compatible providers
* Fix input format in embedding test assertion
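The fix in miniature, assuming the OpenAI SDK client: some OpenAI-compatible servers reject a bare string for `input`, so a single text is normalized to a one-element list before the call.

```python
from openai import AsyncOpenAI


async def embed(
    client: AsyncOpenAI, model: str, text: str | list[str]
) -> list[list[float]]:
    # Always send a list: bare strings break some OpenAI-compatible servers.
    inputs = [text] if isinstance(text, str) else text
    resp = await client.embeddings.create(model=model, input=inputs)
    return [item.embedding for item in resp.data]
```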
* fix: catch InternalServerError from turbopuffer
* fix: remove unused VectorUpsertResult
* fix: downgrade vector store sync errors to warnings
* fix: remove upsert_with_retry
* fix: (vector) add silent path and explicit path for vector db server errors
---------
Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
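A sketch of the downgrade: server-side failures from the vector store log a warning instead of failing the write path. The commit names turbopuffer's InternalServerError; the exception type and call shape below are stand-ins.

```python
import logging

logger = logging.getLogger(__name__)


def sync_to_vector_store(do_upsert, payload) -> None:
    try:
        do_upsert(payload)
    except Exception as exc:  # stand-in for turbopuffer's InternalServerError
        # Silent path: vector sync is best-effort; the primary store stays
        # authoritative, so log and continue instead of failing the request.
        logger.warning("vector store sync failed, continuing without it: %s", exc)
```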
* docs: adding cli doc
* docs: adding generated script and content and github workflow
* chore: removing workflow
* fix: (docs) re-format and add details to cli-reference docs
---------
Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
* fix: moving cli skills to root
* chore: updating cli readme
* chore: updating language
* chore: updating docs
Applies eight review findings from the DEV-1482 integration review. All scoped to docs/v3/guides/integrations/sillytavern.mdx; no code changes.
- DOC-3: curl -fsSL in install command (fails loud on 4xx/5xx)
- DOC-4: Note now reflects installer auto-config + manual fallback
- DOC-6: LLM-backend prerequisite callout at top of Quick Start
- DOC-14: restart step warns about live-session clobbering
- DOC-5: Global Config intro names resolution order + precedence; disambiguates "sillytavern" workspace vs hosts.sillytavern key
- DOC-7: new Peer Observability subsection (asymmetric default)
- DOC-2: route count in Architecture diagram 7 → 9
- DOC-8: troubleshooting row for "plugin on disk, drawer absent"
Findings index + rationale: plastic-labs/sillytavern-honcho#3
- Clarify installer step 4 — the plugin seeds config.json if absent
- 'Puzzle piece' -> 'three-cubes' for the Extensions icon (current ST UI)
- API key step notes the UI-overrides-config precedence explicitly
- 'Honcho workspace ID' -> 'default Honcho workspace ID (configurable)'
- Add Note after Context-modes table — Context only is session-scoped and returns empty until enough messages accumulate; Reasoning is the better default for fresh peers
- Next Steps gains two cards: Install SillyTavern (upstream docs) and the Claude Code setup skill (skills/setup/SKILL.md)
Follow-ups tracked separately — tool rename (observation -> conclusion, matching the /conclusion endpoint), architecture Excalidraw.
…ig (plastic-labs#587)
* Update deriver.py
* Simplify model configuration in deriver.py — removed stop_sequences from the model configuration
- Add Prerequisites section with SillyTavern install link + Node >= 18 requirement (was buried in Next Steps; users hit the install step with no awareness ST needed to exist first).
- Expand restart step into a callout: restart required for server-plugin reload, not for client-side edits.
- Configure step now documents the three editable inputs (API key, Workspace ID, Your peer name) and where each saves.
- Fix 'three-cubes icon' -> 'puzzle piece icon'.
- Installer step list fleshed out: 6 steps (was 4), including config.yaml bootstrap and enableServerPlugins flip. Dropped the false claim that the plugin seeds a minimal ~/.honcho/config.json on first run.
- Global Config section rewritten: resolution order now generalized to apiKey / workspace / peerName (was apiKey-only); documents panel write-back to hosts.sillytavern.*; dropped aiPeer references (it's a telemetry-only field, not user-facing).
- Add a Disable / Enable global config subsection covering the opt-out toggle and the Inherit / Push local / Cancel diff dialog.
- Troubleshooting: two new rows (stale peer name on new chat, cancelled diff dialog).
The plugin also writes to a root-level `sessions` map (ST dir → last Honcho session ID), not only to `hosts.sillytavern.*`. The earlier phrasing overstated the isolation claim.
honcho_save_observation is not registered in the extension — only honcho_query_memory and honcho_search_history exist in code.
* docs: adding opencode
* docs: align opencode guide with latest plugin changes
* chore: updating language
---------
Co-authored-by: adavyas <adavyasharma@gmail.com>
- New Group Chats subsection: documents per-character peer routing (each group member gets their own peer, not a collapsed group-<id> peer) and lazy peer registration for characters joining mid-chat.
- Session Naming: documents the freeze-on-first-assign invariant (changing the naming mode doesn't reroute existing chats) and the Reset button for explicit session rollover.
- Tool table: add honcho_save_conclusion — prior fix undercounted (2 -> 3 tools). The extension registers all three.
…stic-labs#581) The Surprisal module passes `{"level": levels}` directly to `get_all_documents()`, but `apply_filter()` expects operator syntax: `{"level": {"in": levels}}`. Without the `in` operator, the filter is silently ignored, causing `_fetch_level_observations()` to return 0 results. This makes the entire Surprisal phase of the Dream cycle a no-op. Fixes plastic-labs#559
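The fix in miniature; the keyword argument name in the final call is assumed:

```python
# apply_filter() treats a bare value as equality and silently ignores a bare
# list, so level filters must use explicit operator syntax.
levels = ["explicit", "deductive"]

broken_filter = {"level": levels}            # silently ignored -> 0 results
correct_filter = {"level": {"in": levels}}   # matches documents at any listed level

# e.g. get_all_documents(..., filters=correct_filter)  # keyword name assumed
```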
…ats last, drop event flow
* docs: adding opencode
* docs: align opencode guide with latest plugin changes
* chore: updating language
* docs: remove interview command from opencode guide
---------
Co-authored-by: ajspig <dragon@monstercode.com>
…, surface other knobs
docs: add SillyTavern to integrations
* docs: update opencode install command
* docs: use native opencode plugin install
…s#615)
* fix(deriver): ignore blank observations before embedding
* Address PR review on observation normalization
* Harden mock await arg access in tests
* Unify blank observation filtering across tool paths
* Move soft-delete query test back to fixture class
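A tiny sketch of the guard as a free function; the real filtering lives in the deriver and tool paths:

```python
def filter_blank_observations(observations: list[str]) -> list[str]:
    # Drop empty or whitespace-only observations before they reach embedding;
    # embedding blank strings wastes calls and pollutes the vector store.
    return [text.strip() for text in observations if text and text.strip()]
```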
* fix(dreamer): threshold and time-guard semantics
Finding 2: filter count_stmt on documents.level == 'explicit' in
check_and_schedule_dream. Dreamer-created levels (deductive, inductive,
contradiction) are consolidation output, not input, and would otherwise
inflate the threshold count and create a feedback loop.
Finding 3 (code-level): relocate last_dream_at write from enqueue_dream
(enqueue.py) to process_dream (orchestrator.py), inside the
'if result is not None' block. Duplicate enqueues can no longer reset
the 8-hour time guard clock. Failed/never-run dreams don't advance it.
Success criteria: lenient (any non-null DreamResult counts). Pending
Vineeth confirmation — will adjust to strict/middle if requested.
Tests pending in follow-up commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
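A sketch of both findings under assumed SQLAlchemy names (a documents table and a metadata writer); illustrative only, not the actual scheduler code:

```python
from datetime import datetime, timezone

from sqlalchemy import func, select


def explicit_count_stmt(documents, collection_id):
    # Finding 2: only 'explicit' observations count toward the dream threshold.
    # Dreamer-created levels are consolidation output, not input, and counting
    # them would create a feedback loop.
    return (
        select(func.count())
        .select_from(documents)
        .where(documents.c.collection_id == collection_id)
        .where(documents.c.level == "explicit")
    )


def on_dream_complete(result, write_last_dream_at):
    # Finding 3: the 8-hour time-guard clock advances only when consolidation
    # actually happened; failed or never-run dreams leave it untouched.
    if result is not None:
        write_last_dream_at(datetime.now(timezone.utc))
```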
* test(dreamer): threshold filter + last_dream_at relocation regression tests
Tests for Finding 2 and Finding 3 (code-level):
- TestThresholdFilter (tests/dreamer/test_dream_scheduler.py):
* Mixed levels below explicit threshold: 30 explicit + 40 deductive
+ 10 inductive → no trigger (core regression, buggy count would trigger)
* Explicit-only at threshold: 60 explicit → triggers
* Contradiction excluded: 100 contradiction + 10 explicit → no trigger
(confirms positive == "explicit" filter excludes all dreamer output)
- TestEnqueueDreamMetadataShape (tests/deriver/test_enqueue_dream.py):
* AsyncMock-patched update_collection_internal_metadata verifies
enqueue writes last_dream_document_count but NOT last_dream_at
- TestLastDreamAtCompletionWrite (tests/dreamer/test_dreamer_integration.py):
* Happy path: run_dream returns DreamResult → last_dream_at written
* Failure path: run_dream returns None → last_dream_at absent
* Exception path: run_dream raises → last_dream_at absent,
process_dream swallows exception (queue-processed semantics preserved)
Docstring on check_and_schedule_dream tightened: "document threshold"
-> "explicit-observation threshold" to reflect filter semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dreamer): preserve last_dream_document_count in completion write
CodeRabbit caught this: update_collection_internal_metadata uses a
top-level JSONB `||` merge, so passing {"dream": {"last_dream_at": ...}}
replaces the entire "dream" subkey and drops last_dream_document_count
that was written by enqueue_dream.
Symptom: after every completed dream, the baseline drops to 0. Next
check_and_schedule_dream reads documents_since_last_dream as
current_count - 0 = current_count, so any collection with >= 50
explicit observations can re-trigger immediately once the 8h guard
expires, even with no new raw material.
Fix: read-modify-write. Fetch current collection, merge last_dream_at
into the existing "dream" dict, write the merged dict back. Preserves
sibling keys (current: last_dream_document_count; future-proof for
telemetry fields that might land in PR 4).
Regression test added to tests/dreamer/test_dreamer_integration.py:
pre-seeds {"dream": {"last_dream_document_count": 42}}, runs
process_dream, asserts both last_dream_at is written AND
last_dream_document_count == 42 is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
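A sketch of the read-modify-write under assumed ORM names; the point is merging into the existing "dream" dict rather than replacing the whole subkey:

```python
from datetime import datetime, timezone


async def write_last_dream_at(session, collection) -> None:
    # A top-level JSONB `||` merge would replace the whole "dream" subkey and
    # drop last_dream_document_count, so merge into the existing dict instead.
    meta = dict(collection.internal_metadata or {})
    dream = dict(meta.get("dream") or {})       # keep sibling keys intact
    dream["last_dream_at"] = datetime.now(timezone.utc).isoformat()
    meta["dream"] = dream
    collection.internal_metadata = meta         # reassign so the ORM sees the change
    await session.flush()
```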
* fix(dreamer): address CodeRabbit feedback on b89997c
- enqueue.py: read-modify-write preserves last_dream_at when writing baseline
- dream_scheduler.py: explicit-level filter on execute_dream count query
- test fixture: pin DOCUMENT_THRESHOLD and ENABLED_TYPES for stability
- integration test: timezone-aware assertion on last_dream_at
Regression test added for enqueue sibling-drop (symmetric to c8fe40a).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dreamer): session lookup symmetry + row lock on dream metadata RMW
- dream_scheduler.py: explicit-level filter on execute_dream session lookup
(baseline and session pick must agree on the same document set)
- crud.collection.get_collection: optional with_for_update flag for callers
that need serialized read-modify-write on internal_metadata
- enqueue.py + orchestrator.py: pass with_for_update=True on the RMW reads
to close the TOCTOU between concurrent enqueue and completion writes
Follow-up filed for jsonb_set-based nested updates (docs/factory/backlog/).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
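A sketch of the lock flag under assumed model names; with_for_update serializes concurrent enqueue and completion writers on the same row, which is what closes the TOCTOU:

```python
from sqlalchemy import select


async def get_row(session, model, row_id, *, with_for_update: bool = False):
    # Callers doing a read-modify-write on internal_metadata pass
    # with_for_update=True to take a row lock (SELECT ... FOR UPDATE).
    stmt = select(model).where(model.id == row_id)
    if with_for_update:
        stmt = stmt.with_for_update()
    return (await session.execute(stmt)).scalar_one()
```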
* fix(dreamer): explicit-only count on manual schedule_dream route
The third caller of enqueue_dream — POST /workspaces/{id}/schedule_dream —
was passing an all-levels document count as the baseline, breaking symmetry
with check_and_schedule_dream and execute_dream after Loop 2's filter fixes.
Filter the manual route's count to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(dreamer): document explicit-only invariant on enqueue_dream.document_count
Loop 3 follow-up on d76627a. The parameter's semantic tightened across Loop
2 (check_and_schedule_dream, execute_dream) and Loop 3 (schedule_dream route)
to "explicit-level count, used as the baseline," but the signature still read
"Current document count for metadata update." The next caller would have no
way to know from the function contract.
Docstring now spells out: (1) the value is explicit-only, (2) it's written
as last_dream_document_count, (3) it's the baseline that
check_and_schedule_dream subtracts from to compute
documents_since_last_dream, (4) passing a count that includes non-explicit
levels (deductive, inductive, contradiction) inflates the baseline and
suppresses the next scheduled dream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(dreamer): rename current_document_count → current_explicit_count
Loop 3 follow-up on d4e10e3. After Loop 2's filter landed, the local in
check_and_schedule_dream held an explicit-only count but was still named
current_document_count — asymmetric with execute_dream's current_explicit_count
(line 201) and contradicting the filter on line 269 that produces the value.
Pure rename: three occurrences (definition at 271, subtraction at 274, log
extra key at 282). No test references. Naming-as-invariant alignment with
d76627a (query filters), d4e10e3 (parameter docstring), and Loop 1's local
rename in execute_dream.
The persisted JSONB key last_dream_document_count is the one remaining
drift-layer; filed as plastic-claudebook backlog item for a separate PR
with an intentional migration path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dreamer): atomic guard-pair write + in-flight stampede defense
Loop 4 response to Vineeth's CHANGES_REQUESTED on PR plastic-labs#573.
The pre-Loop-4 enqueue-time write of last_dream_document_count was serving
double duty: rate limiter AND stampede latch. By arming the 8h guard the
moment a dream entered the pipeline, it implicitly blocked a second dream
from being scheduled during the in-flight window. Loop 3 relocated the
last_dream_at write to completion without moving its sibling baseline,
splitting the semantic pair and exposing the latch role that had lived
only in Vineeth's head.
Invariant (now pinned to check_and_schedule_dream's docstring): from the
moment a dream is scheduled until it completes or fails, no second dream
may be enqueued for the same (workspace, observer, observed) — and the
baseline count advances only when consolidation actually happened.
Changes:
- enqueue_dream: remove the last_dream_document_count write entirely and
drop the document_count parameter. enqueue no longer touches dream
metadata; the implicit stampede latch is replaced by an explicit
queue-backed defense.
- process_dream: extend the existing row-locked RMW to write both guard
fields atomically. Current explicit-doc count is recomputed inside the
locked block (not carried on DreamPayload) so the pair reflects the
actual consolidation moment.
- check_and_schedule_dream: query QueueItem for pending dreams on this
collection's work_unit_keys (mirrors uq_queue_dream_pending_work_unit_key)
before arming a timer. Uses queue state as source of truth rather than
reflecting it into metadata.
- Tests: two new coherence tests under TestGuardPairCoherence —
test_pending_queue_item_blocks_second_schedule walks the stampede timeline,
test_silent_failure_allows_retry_on_same_corpus verifies failed dreams
don't consume the baseline. Existing tests updated to the new contract.
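A sketch of the queue-backed stampede check under an assumed QueueItem schema; queue state, not metadata, is the source of truth for "a dream is already in flight":

```python
from sqlalchemy import exists, select


async def dream_already_pending(session, queue_item_model, work_unit_key: str) -> bool:
    # Before arming a dream timer, check for a pending QueueItem on this
    # work unit (mirrors uq_queue_dream_pending_work_unit_key). Column names
    # here are assumptions.
    stmt = select(
        exists().where(
            queue_item_model.work_unit_key == work_unit_key,
            queue_item_model.processed.is_(False),
        )
    )
    return bool((await session.execute(stmt)).scalar())
```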
* chore(dreamer): trim comment slop from loop-4 atomic pair work
Compress three verbose comments added in d24958d — the invariant itself
is captured in check_and_schedule_dream's docstring, so the inline
narrative restates what the code already says.
- dream_scheduler.py defense C block: 5 lines → 2
- orchestrator.py atomic pair write: 4 lines → 1
- enqueue.py docstring paragraph: 5 lines → 2
Net: +5/-14. Follows Eri's eef27be precedent on sillytavern-honcho PR plastic-labs#7.
---------
Co-authored-by: lilyplasticlabs <lily@plasticlabs.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adopts upstream's full LLM client refactor (PR plastic-labs#459: src/utils/clients.py deleted in favor of the new src/llm/ package with per-backend handlers, ConfiguredModelSettings, and ModelTransport). Conflict resolutions were taken upstream-side via -X theirs, and our customizations are re-applied adjacent to the new structure rather than as parallel forks.
Notable upstream changes pulled in:
- LLM client refactor: src/llm/{api,backend,executor,runtime,tool_loop,...} with src/llm/backends/{anthropic,gemini,openai}.py
- ConfiguredModelSettings + ModelConfig replace per-component model fields
- New honcho-cli package, Zo Computer / Paperclip / SillyTavern / opencode integration docs
- Surprisal filter format fix (plastic-labs#581) — converged with our 4e7f136
- Many smaller fixes: dreamer thresholds, deriver blank-observation guard, vector sync retry budget, embed() string-input fix, etc.
Adjacent re-applications (deployment-critical):
- src/config.py: re-add LLM.CF_GATEWAY_AUTH_TOKEN
- src/llm/registry.py: inject cf-aig-authorization header in get_*_override_client factories when base_url targets a CF gateway
- src/embedding_client.py: same header injection on openai/gemini branches
Dropped (now redundant or replaceable):
- Per-specialist DEDUCTION_PROVIDER / INDUCTION_PROVIDER / *_THINKING_BUDGET_TOKENS overrides — covered by upstream's DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT etc. env vars
- get_provider() / get_thinking_budget() methods on BaseSpecialist — superseded by ConfiguredModelSettings on each specialist's MODEL_CONFIG
- src/utils/types.SupportedProviders — replaced by ModelTransport
- Custom Traefik service block in docker-compose.yml.example — Traefik configs remain in docker/traefik/ for users who want to wire it up
- Our 4e7f136 surprisal fix — identical to upstream's plastic-labs#581
Deployment notes for re-keying .env:
- LLM_CF_GATEWAY_API_KEY / LLM_CF_GATEWAY_BASE_URL / LLM_OPENAI_BASE_URL / LLM_OPENAI_COMPATIBLE_* / LLM_VLLM_* are no-ops now (extra='ignore'). Use per-component MODEL_CONFIG__BASE_URL / MODEL_CONFIG__API_KEY env vars (e.g. DREAM_DEDUCTION_MODEL_CONFIG__BASE_URL=...).
- LLM_CF_GATEWAY_AUTH_TOKEN remains as the single global needed for the cf-aig-authorization header.
Verification: ruff check src/ ✓, basedpyright src/ ✓ (0 errors).
…am merge
These edits should have been folded into the merge commit (a901f34) but were left uncommitted — pushing now to actually deliver CF Gateway support and clean up leftovers from the -X theirs auto-resolution.
src/config.py
- Add LLMSettings.CF_GATEWAY_AUTH_TOKEN (single global needed for the cf-aig-authorization header on any provider override client whose base_url targets a CF gateway URL).
src/llm/registry.py
- Inject cf-aig-authorization header in get_openai_override_client, get_anthropic_override_client, and get_gemini_override_client when base_url contains 'gateway.ai.cloudflare.com' AND LLM.CF_GATEWAY_AUTH_TOKEN is set. Rides on the existing openai/anthropic/gemini transports — no parallel CF backend.
src/embedding_client.py
- Mirror the same header injection on the openai/gemini branches so embeddings through CF Gateway authenticate correctly. The helper is duplicated locally so the embedding client doesn't depend on the LLM runtime registry module.
src/dreamer/specialists.py
- Drop get_provider() / get_thinking_budget() override methods on BaseSpecialist + the per-specialist references to settings.DREAM.DEDUCTION_PROVIDER / INDUCTION_PROVIDER / *_THINKING_BUDGET_TOKENS. Those settings fields no longer exist upstream — the same functionality is reachable via DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT etc.
- Drop the orphan thinking_budget_tokens=llm_settings.THINKING_BUDGET_TOKENS arg on the honcho_llm_call site that survived the auto-merge — the value now lives on model_config, which is already passed.
src/main.py
- ruff isort fix (autofixed) — uuid/time import order.
Verification: ruff check src/ ✓, basedpyright src/ ✓ (0 errors).
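A sketch of the header helper as described above; the settings wiring in the comment is illustrative:

```python
def _cf_gateway_headers(base_url: str | None, auth_token: str | None) -> dict[str, str]:
    # Inject cf-aig-authorization only when the client actually targets a
    # Cloudflare AI Gateway URL and the global token is configured.
    if base_url and auth_token and "gateway.ai.cloudflare.com" in base_url:
        return {"cf-aig-authorization": f"Bearer {auth_token}"}
    return {}

# e.g. inside get_openai_override_client(...):
#   headers = _cf_gateway_headers(base_url, settings.LLM.CF_GATEWAY_AUTH_TOKEN)
#   client = AsyncOpenAI(base_url=base_url, api_key=api_key, default_headers=headers)
```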
Telemetry-only signal: True when the loop exited via the max-iterations synthesis path rather than the model deciding to stop. Distinguishes "model didn't converge" from natural termination so downstream observability can label the two cases differently. No emitter changes — flag is set but no consumer reads it yet.
Adds six new metrics + recorder methods on the existing PrometheusMetrics singleton; no callers yet, so this commit is purely declarative.
Series:
- llm_calls / llm_call_duration_seconds — counter + histogram per call, labeled by feature × provider × model × outcome.
- llm_tokens — input/output/cache_read/cache_creation per feature × provider × model.
- llm_tool_calls — per-tool invocation outcome inside the tool loop.
- llm_iterations — histogram of iterations consumed per call/outcome.
- llm_backup_used — counts failovers from primary to backup provider.
Cardinality-bounded: feature × provider × model × outcome ≈ 1.7k series cap. Deliberately no workspace_name label here — these answer "is this model effective for this feature", not "is workspace X slow". LLMCallOutcome enum exported from src.telemetry.prometheus so callers can reference the canonical values without importing from the metrics module directly.
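A declarative sketch of the six series using prometheus_client; the names and label sets follow the text above, while the bucket choices and token_type label are assumptions:

```python
from prometheus_client import Counter, Histogram

LLM_CALLS = Counter(
    "llm_calls", "LLM calls by outcome",
    ["feature", "provider", "model", "outcome"],
)
LLM_CALL_DURATION = Histogram(
    "llm_call_duration_seconds", "Wall-clock duration per LLM call",
    ["feature", "provider", "model", "outcome"],
)
LLM_TOKENS = Counter(
    "llm_tokens", "Tokens by type (input/output/cache_read/cache_creation)",
    ["feature", "provider", "model", "token_type"],
)
LLM_TOOL_CALLS = Counter(
    "llm_tool_calls", "Per-tool invocation outcomes inside the tool loop",
    ["feature", "tool", "outcome"],
)
LLM_ITERATIONS = Histogram(
    "llm_iterations", "Tool-loop iterations consumed per call",
    ["feature", "outcome"], buckets=(1, 2, 3, 5, 8, 13),
)
LLM_BACKUP_USED = Counter(
    "llm_backup_used", "Failovers from primary to backup provider",
    ["feature", "provider", "model"],
)
```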
Introduces src/telemetry/llm_call_metrics.py — a context-manager-based wrapper that turns one LLM call into one set of Prometheus samples and one logfmt log line.
Surface:
- observe_llm_call(...) — context manager yielding a mutable _CallState the caller populates over the call's lifetime.
- finalize_success(...) — populate state from a successful response and pick the outcome bucket (success / success_after_retry / success_via_backup).
- mark_max_iterations(...) — flip the state to error_max_iterations when the tool loop exited via the synthesis path.
- normalize_feature_label(...) — maps the caller's track_name/trace_name to a low-cardinality Prom label (e.g. "Dreamer/deduction" -> dream_deduction).
No callers wired in yet — this commit is the helper module on its own so the diff stays reviewable. Wiring into honcho_llm_call and the tool loop lands in subsequent commits. Errors raised inside the wrapped call are classified into outcome buckets (timeout / validation / other) and re-raised; the wrapper never swallows or transforms exceptions.
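A usage-shaped sketch of that surface; the real module differs in detail (it emits Prometheus samples rather than printing):

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class _CallState:
    feature: str
    outcome: str = "error_other"
    extra: dict = field(default_factory=dict)


@contextmanager
def observe_llm_call(feature: str):
    state = _CallState(feature=feature)
    start = time.monotonic()
    try:
        yield state          # caller mutates state over the call's lifetime
    except TimeoutError:
        state.outcome = "error_timeout"
        raise                # classify, never swallow or transform exceptions
    finally:
        duration = time.monotonic() - start
        # one set of samples + one logfmt line; print stands in for both here
        print(f"llm_call feature={state.feature} outcome={state.outcome} "
              f"duration={duration:.3f}")
```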
Adds prometheus_metrics.record_llm_tool_call() calls in both the success and error branches of execute_tool_loop's per-tool dispatch. Threads track_name / trace_name through the function signature so the emitted metric carries the same feature label that the call-level metrics will use. Both new params default to None (current callers don't pass them yet), so feature label resolves to "unknown" until honcho_llm_call is wired in the next commit. Metric emission is wrapped in PrometheusMetrics' sentry-captured error handler — a metric bug can never break a real tool call.
Wraps the body of honcho_llm_call (both tool-less and tool-loop paths) in observe_llm_call(...) so every invocation produces one set of Prometheus samples and one logfmt log line.
Captures the AttemptPlan that produced the most-recent (and on success, the winning) call via a `last_plan` cell updated inside _get_attempt_plan, so the recorded provider/model is the one that actually answered — primary on early attempts, backup on the final retry. This makes backup-on-final-attempt observable directly from llm_calls / llm_tokens without parsing logs.
Passes track_name and trace_name through to execute_tool_loop so its per-tool counter (added in the previous commit) carries the same feature label as the call-level metrics. When the tool loop returns response.hit_max_iterations=True, the call's outcome is overridden to error_max_iterations via mark_max_iterations so dashboards can split "model didn't converge" from clean success without the tool loop having to know about outcome semantics.
Streaming responses don't carry token counts at the entry point — the recorded call still emits, but token counters skip those rows (record_llm_tokens silently no-ops on count<=0). Acceptable partial signal until the streaming refactor surfaces tokens earlier.
ruff + basedpyright clean. End-to-end smoke verified all six series fire correctly across success, success_via_backup, error_max_iterations, error_timeout, and tool-call paths.
…ity-metrics feat(telemetry): per-LLM-call metrics, structured logs, and tool tracking
offendingcommit added a commit that referenced this pull request on May 4, 2026
…architecture
The April-16 gotchas section was stale after the upstream sync (PR #4):
- Legacy 'cf' / 'custom' provider tags removed — replaced by the ModelTransport literal (anthropic/openai/gemini) and per-component MODEL_CONFIG__* env vars
- 'deriver/summary must stay on cf' rule no longer applies — the native gemini backend (src/llm/backends/gemini.py) honors response_format=json_schema
- thoughtSignature multi-iteration workaround obsolete — preserved natively in src/llm/history_adapters.py + src/llm/executor.py
- LM Studio section: env var names switched from LLM_OPENAI_COMPATIBLE_* globals to MODEL_CONFIG__OVERRIDES__BASE_URL / __API_KEY per component
Adds a note that CF Gateway integration is now app-level (cf-aig-authorization auto-injected by src/llm/registry.py and src/embedding_client.py based on base_url pattern matching) rather than deployment-level URL routing.
Ollama Cloud structured-output limitation kept — that's still a real upstream constraint, just rephrased for the new transport model.
Summary
Syncs our `main` with `plastic-labs/honcho` upstream main (42 commits behind). Adopts upstream's full LLM client refactor (PR plastic-labs#459) wholesale and re-applies our deployment-critical CF Gateway support adjacent to the new architecture so future syncs stay near-mechanical.
What's pulled from upstream
- `src/utils/clients.py` deleted; replaced by the `src/llm/` package with per-backend handlers (`backends/anthropic.py`, `backends/gemini.py`, `backends/openai.py`), `ConfiguredModelSettings`, the `ModelTransport` literal, and shared `request_builder` / `executor` / `tool_loop` modules.
- New `honcho-cli/` package (feat: adding honcho-cli package plastic-labs/honcho#424).
- Surprisal filter format fix (converged with our 4e7f136), dreamer threshold/time-guard (fix(dreamer): threshold and time-guard semantics plastic-labs/honcho#573), deriver blank-observation guard (fix(deriver): ignore blank observations before embedding plastic-labs/honcho#615), embed() string-input fix (fix: embed() sends string input instead of array, breaking OpenAI-compatible providers plastic-labs/honcho#586), turbopuffer error handling (handle turbopuffer server errors plastic-labs/honcho#561), vector sync retry budget (fix: make vector sync wait 10min between retries plastic-labs/honcho#604), and others.
What we re-applied (deployment-critical, adjacent to upstream)
- `LLMSettings.CF_GATEWAY_AUTH_TOKEN` (single global needed for the `cf-aig-authorization` header).
- Inject the `cf-aig-authorization` header in `get_openai_override_client` / `get_anthropic_override_client` / `get_gemini_override_client` when `base_url` targets `gateway.ai.cloudflare.com`. Routes all CF-gateway-bound clients through the auth header — no parallel backend.
- The `_cf_gateway_headers()` helper is duplicated across `src/llm/registry.py` and `src/embedding_client.py` so the embedding client doesn't depend on the LLM runtime registry.
What we dropped (now redundant or replaceable)
- Per-specialist overrides (`DEDUCTION_PROVIDER` / `INDUCTION_PROVIDER` / `*_THINKING_BUDGET_TOKENS` config fields + `get_provider()` / `get_thinking_budget()` methods on BaseSpecialist) — fully replaceable via upstream's `DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT` / `__THINKING_BUDGET_TOKENS` env vars on `ConfiguredModelSettings`.
- `src/utils/types.SupportedProviders` — replaced by upstream's `ModelTransport` literal in `src/config.py`.
- Custom Traefik service block in `docker-compose.yml.example` — the example file now matches upstream defaults; configs in `docker/traefik/` remain for users who want to wire it up.
- Our `4e7f136` surprisal filter fix — byte-identical to upstream's fix(surprisal): use correct filter format for level observations plastic-labs/honcho#581, naturally converged.
Deployment migration notes — re-key the .env before deploying
Old vars are silently no-ops because of `extra='ignore'` on `LLMSettings`. Update the deployment .env to use upstream's per-component pattern:

| Old var | New var |
| --- | --- |
| `LLM_OPENAI_BASE_URL=...` | `<COMPONENT>_MODEL_CONFIG__BASE_URL=...` (e.g. `DIALECTIC_MODEL_CONFIG__BASE_URL`) |
| `LLM_CF_GATEWAY_API_KEY=...` | `<COMPONENT>_MODEL_CONFIG__API_KEY=...` |
| `LLM_CF_GATEWAY_BASE_URL=...` | `<COMPONENT>_MODEL_CONFIG__BASE_URL=...` (CF gateway URL) |
| `LLM_OPENAI_COMPATIBLE_*`, `LLM_VLLM_*`, `LLM_GROQ_API_KEY` | `<COMPONENT>_MODEL_CONFIG__*` |
| `LLM_CF_GATEWAY_AUTH_TOKEN=...` | unchanged; still the single global for the cf-aig-authorization header |
| `DREAM_DEDUCTION_PROVIDER=anthropic` | `DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=anthropic` |
| `DREAM_DEDUCTION_THINKING_BUDGET_TOKENS=2048` | `DREAM_DEDUCTION_MODEL_CONFIG__THINKING_BUDGET_TOKENS=2048` |
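Why the double-underscore names work: pydantic-settings splits env vars on the nested delimiter, so `DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT` lands on a nested model field. A minimal stand-in for the settings shape (field names here are assumptions, not the upstream classes):

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class ModelConfig(BaseModel):
    transport: str = "anthropic"
    base_url: str | None = None
    api_key: str | None = None
    thinking_budget_tokens: int | None = None


class DreamDeductionSettings(BaseSettings):
    # extra="ignore" is why stale vars become silent no-ops.
    model_config = SettingsConfigDict(
        env_prefix="DREAM_DEDUCTION_", env_nested_delimiter="__", extra="ignore"
    )
    MODEL_CONFIG: ModelConfig = ModelConfig()

# DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=openai -> settings.MODEL_CONFIG.transport
```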
docker-compose.yml(untracked) is unaffected by the example file change.Test plan
- `uv run ruff check src/` — passes
- `uv run basedpyright src/` — 0 errors, 2 pre-existing warnings unrelated to this merge
- `from src.llm import registry; from src.dreamer import specialists; ...` — imports OK
- With the re-keyed .env, hit `/v1/peers/{id}/chat` to verify the CF Gateway path still works end-to-end