chore: sync upstream plastic-labs/honcho main#2
Closed

offendingcommit wants to merge 26 commits into main

Conversation
…#495)

* chore: add .worktrees/ to .gitignore
* feat(examples): add Zo Computer memory skill integration
* fix(examples): address CodeRabbit review on Zo skill integration
  - Fix version inconsistency: SKILL.md matches pyproject.toml (>=2.1.0)
  - Move client.py into tools/ package and use relative imports
  - Add assistant_id parameter to save_memory() for consistency with get_context() (usage sketched below)
  - Use UUID-based IDs in tests to prevent state leakage between runs
  - Add pytest.mark.skipif guard on integration tests (requires HONCHO_API_KEY)
  - Fix import ordering, move pytest to module level, sort __all__ alphabetically
  - Fix markdown blank lines around fenced code blocks (MD031)
  - Add rate limit delay fixture to avoid hitting Honcho free tier limits
* fix(examples): validate HONCHO_API_KEY early in client initialization
* docs(examples): note cross-peer memory behavior in shared workspaces
* docs(examples): fix save_memory and query_memory signatures in README
* docs(examples): fix markdown linting issues in README
* docs(examples): add assistant_id parameter to save_memory example in SKILL.md

Co-authored-by: Luba Kaper <lubakaper@lubas-air.mynetworksettings.com>
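A hedged usage sketch of the skill helpers named above. Only `save_memory()`, `get_context()`, and the `assistant_id` parameter appear in the commits; the import path follows the "client.py moved into tools/" note, and everything else is a guess.

```python
# Hypothetical usage of the Zo memory skill helpers; the import layout and
# the first positional argument are assumptions, not the example's real API.
from tools.client import save_memory, get_context  # assumed module path

save_memory("User prefers concise answers", assistant_id="zo")
print(get_context(assistant_id="zo"))
```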
…fig guide (plastic-labs#510)

* fix: inconsistencies in docs, health endpoint, troubleshooting guide
* fix(docs): maintain consistency on the postgres db name
* chore(docs): update v2 contributing docs with updated db paths
* docs: overhaul self-hosting docs for provider-agnostic setup
  - .env.template: lead with provider options (custom, vllm, google, anthropic, openai, groq) instead of baking in vendor-specific keys. All provider/model settings commented out so the server fails fast until configured. Separate endpoint config from per-feature provider+model from tuning knobs.
  - docker-compose.yml.example: fix healthcheck `-d honcho` -> `-d postgres` to match POSTGRES_DB=postgres.
  - config.toml.example: reorder and document the LLM key section with OpenRouter and vLLM examples.
  - self-hosting.mdx: replace the multi-vendor key table with a provider options table. Add examples for OpenRouter, vLLM/Ollama, and direct vendor keys. Remove duplicated key lists from the Docker/manual setup sections.
  - configuration.mdx: replace scattered provider docs with a provider types table. Fix the Docker Compose snippet to match the actual compose file. Note code defaults as a fallback, not the recommended path.
  - troubleshooting.mdx: add an alternative-provider issues section (custom provider config, model name format, Docker localhost, structured output failures).
* docs: add Docker build troubleshooting for permission errors
  - Document the BuildKit requirement (RUN --mount syntax)
  - AppArmor/SELinux blocking Docker builds on Linux
  - Volume mount UID mismatch between the host and the container app user
  - Note in the self-hosting docs that the Docker path builds from source
* docs: reframe self-hosting as contributor/dev path, point to cloud service
* Revert "docs: reframe self-hosting as contributor/dev path, point to cloud service" (reverts commit 3e766eb)
* docs: add production compose, model guidance, thinking budget docs
  - Add docker-compose.prod.yml for VM/server deployment: no source mounts, restart policies, 127.0.0.1-bound ports, cache enabled
  - Add model tier guidance and a community quick-start link to self-hosting
  - Document the THINKING_BUDGET_TOKENS gotcha for non-Anthropic providers
  - Add reverse proxy examples (Caddy + nginx) to the production section
  - Add backup/restore commands to production considerations
* docs: simplify self-hosting to a single provider, restructure the config guide. The self-hosting page now defaults to one OpenAI-compatible endpoint with one model for all features. Moved model tiers, alternative providers, and per-feature tuning into the configuration guide. Eliminated duplicate config-priority sections, the dev/prod split, and redundant TOML examples.
* docs: merge compose files, restore provider/model to feature sections in .env.template. Single docker-compose.yml.example with dev sections commented out. Moved PROVIDER and MODEL back alongside each feature in .env.template so settings stay colocated with their module. Updated the self-hosting docs to reference the single compose file.
* fix: broken anchor links, redundant migration step, minor inconsistencies. Fix 4 broken internal links (#llm-provider-setup, #llm-api-keys, #which-api-keys-do-i-need, #alternative-providers) to point to the correct headings. Remove the redundant Docker migration step (the entrypoint already runs alembic). Fix the cache URL missing ?suppress=true in the reference config. Fix the uv install command to use the official method.
* docs: env template ready to use, simplify self-hosting flow. .env.template now has provider/model lines uncommented with placeholder values — the user just sets the endpoint, key, and model name. Thinking budgets default to 0 for non-Anthropic providers. Self-hosting page: removed the 30-line env var wall; LLM setup now points to the template. Merged duplicate verify sections. Removed api_key from the SDK examples (auth is off by default).
* docs: reorder next steps, configuration guide first
* fix: default the embedding provider to openrouter for the single-endpoint setup. Without this, embeddings default to openai, which requires a separate LLM_OPENAI_API_KEY. Setting it to openrouter routes embeddings through the same OpenAI-compatible endpoint as everything else.
* fix: review issues — hermes page, thinking budget, production wording. Hermes integration page: replaced inline Docker/manual setup with a link to the self-hosting guide, added the elkimek community link. Removed old env var names (OPENAI_API_KEY without the LLM_ prefix). Troubleshooting: removed "or 1" from the thinking budget guidance. Self-hosting: softened "production-ready" to "production-oriented" since auth is disabled by default.
* docs: model examples in the template, expanded LLM setup, better verify flow. .env.template: added "e.g. google/gemini-2.5-flash" hints next to the model placeholders so users know the expected format. Self-hosting: expanded LLM Setup to show the 3 things users need to set — endpoint, key, model name — with a find-replace tip (sketched below). Added a build time note, a deriver log check, and a real smoke test (create a workspace) to the verify section. The health check now notes that it doesn't verify the DB/LLM.
* fix: smoke test uses the v3 API path, not v1
* docs: clarify the deriver metrics port vs the Prometheus host port
* fix: remove deprecated memoryMode from the hermes config example
* docs: update the hermes page to match the current memory provider config. Updated the config to match the hermes-agent docs: removed apiKey (not needed for self-hosted), added the hermes memory setup CLI command, added a config fields table (recallMode, writeFrequency, sessionStrategy, etc.). Better verification tests: store-and-recall across sessions, a direct tool-calling test. Links to the upstream hermes docs for the full field list.
* fix: invalid THINKING_BUDGET_TOKENS=0 and missing docker/ in the image. Comment out THINKING_BUDGET_TOKENS=0 in .env.template — the deriver, summary, and dream validators require gt=0. The dialectic levels are also commented out since non-thinking models don't need the override. Add a COPY for the docker/ directory in the Dockerfile so entrypoint.sh is available when docker-compose.yml.example references it.
* chore: additional troubleshooting step

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
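To make the "three things to set" concrete, a hedged sketch of what a filled-in env file might look like. The variable names here are illustrative placeholders, not the template's actual keys — the source of truth is .env.template.

```ini
; Illustrative only — real key names live in .env.template.
LLM_BASE_URL=https://openrouter.ai/api/v1      ; 1. OpenAI-compatible endpoint
LLM_API_KEY=sk-or-...                          ; 2. API key for that endpoint
DERIVER_MODEL=google/gemini-2.5-flash          ; 3. model name ("e.g." hint format from the template)
```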
* fix: further remove extraneous transactions
* fix(search): use a 2-phase function to reduce unneeded transactions
* fix: refactor agent search to perform external operations before opening a transaction
* fix: reduce the scope of the queue manager transaction
* fix(bench): add concurrency to the test bench
* fix: address review findings for search dedup, webhook idempotency, and bench throttling
* Fix leakage in non-session-scoped chat call (plastic-labs#526)
* fix(search): reduce scope for peer-based searches
* fix: tests
* fix(test): address CodeRabbit comment
* fix: drop the db param from deliver_webhook

Co-authored-by: Rajat Ahuja <rahuja445@gmail.com>
* chore(docs): update changelogs and version numbers
* chore: remove extraneous dep on mintlify
* Simplify Paperclip integration instructions: clarified instructions for local Honcho setup and removed unnecessary details
* Update docs.json
* Update links in the Paperclip integration guide
* Revise memory initialization instructions in the Paperclip guide: updated instructions for initializing memory and removed the optional checks section
…ic-labs#530) The HEALTHCHECK directive probes an HTTP endpoint that only the API serves. The deriver service reuses this image but is a background queue worker with no HTTP server — the probe can never succeed, so Docker permanently marks the deriver container as unhealthy. Remove the HEALTHCHECK from the shared image. Service-level health checks belong in each service's own configuration (e.g. Kubernetes readiness/liveness probes on the API Deployment only). Closes plastic-labs#521 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
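A minimal sketch of where the probe belongs instead, assuming a compose file along the lines of the repo's docker-compose.yml.example — the service names, port, `/health` path, and deriver command are assumptions here. Only the API service gets a probe; the deriver, which has no HTTP server, gets none.

```yaml
services:
  api:
    image: honcho:latest
    healthcheck:                        # probe lives on the service, not the image
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      retries: 3
  deriver:
    image: honcho:latest                # same image, deliberately no healthcheck:
    command: python -m src.deriver      # a queue worker with no HTTP endpoint
```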
…ation (plastic-labs#459)

* fix: add JSON repair for truncated LLM responses across all providers, plus Gemini thinking budget support. LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver, with 14k+ occurrences in production. The vLLM path already had repair logic, but it was gated on provider == "vllm" and unreachable when routing through litellm as a custom provider.
  - Extract a shared _repair_response_model_json() helper for all providers (see the sketch after this entry)
  - Catch LengthFinishReasonError in the OpenAI/custom parse() path and repair truncated JSON
  - Add a repair fallback to the Anthropic and Gemini response_model paths
  - Add a repair fallback to the Groq response_model path
  - Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
  - Add 14 tests covering the repair paths for all providers and the Gemini thinking budget
  Fixes HONCHO-YC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: live LLM integration tests
* feat: consistent model config protocol
* fix: migrate the remaining app callers off the legacy llm_settings path
* fix: docs and regression tests
* fix: refactor the llm runtime path to a model-config-only API
* fix: refactor config to a nested model-config source of truth
* fix: refactor llm streaming and tool dispatch through backends
* fix: cut over llm config to nested model_config only
* fix: collapse vllm and custom into the openai_compatible transport
* feat: refactor llm config to explicit transports and bare model ids
* feat(embed): add configurability for the embedding model
* fix: tests for the embedding provider
* fix: address review comments
* fix(llm): remove the Groq backend and per-vendor base URLs
* chore: move llm tests
* fix(llm): address review findings — config regressions, backend bugs, dead code
* fix: address silly backend errors
* chore(docs): update the configuration and self-hosting guides
* chore: fix tests
* fix: address CodeRabbit comments
* fix: add validation to the dream settings
* fix: further address CodeRabbit comments
* fix: address CodeRabbit comments
* fix: another round of CodeRabbit
* fix: address CodeRabbit nits
* fix: tests
* refactor: rename thinking validator to reflect transport scope. _validate_anthropic_thinking_minimum only enforces the >=1024 rule for Anthropic and no-ops for other transports, so the name was misleading now that it's shared across ConfiguredModelSettings, FallbackModelSettings, and ModelConfig. Renamed to _validate_thinking_constraints with a docstring clarifying per-transport behavior. No logic change.
* fix(config): drop transport-specific thinking params when an env override changes the transport. _fill_defaults_for_nested_field previously preserved the default MODEL_CONFIG's thinking_budget_tokens/thinking_effort across a transport override. This leaked Gemini-family defaults (e.g. thinking_budget_tokens=1024) into OpenAI-transport overrides, and the OpenAI backend then correctly rejected the unsupported param at call time (OpenAI uses reasoning.effort, not a token budget). The helper now strips thinking_budget_tokens and thinking_effort from the default dict when the env override supplies a transport different from the default's. Explicit thinking params in the override are preserved.
* fix(config): apply the thinking-param strip to the dialectic level merge too. DialecticSettings._merge_level_defaults does its own inline MODEL_CONFIG merge (parallel to _fill_defaults_for_nested_field), so the previous fix missed dialectic-level overrides. E.g. flipping DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT from gemini (the default) to openai still leaked the default thinking_budget_tokens=0 into the openai config, which the OpenAI backend then rejected at call time. The level-merge path now applies the same "strip transport-specific thinking params when the transport changes" rule as the generic helper. Added a regression test exercising the merge validator directly.
* refactor(llm): wire ModelConfig knobs through, prune clients.py migration leftovers. Three connected fixes to finish carving the LLM stack out of src/utils/clients.py and into src/llm/:
  1. Propagate ModelConfig tuning knobs into backend calls. honcho_llm_call_inner built extra_params from only {json_mode, verbosity}, silently dropping top_p, top_k, frequency_penalty, presence_penalty, seed, and operator-supplied provider_params from any ModelConfig. Thread the selected config through ProviderSelection and merge build_config_extra_params(selected_config) into extra_params; per-call kwargs still win over provider_params defaults. Makes _build_config_extra_params public as build_config_extra_params so clients.py and request_builder.py share one translation. Adds TestModelConfigExtraParamsPropagation covering OpenAI/Anthropic knob propagation, provider_params passthrough, and per-call override precedence.
  2. Drop dead extract_openai_* duplicates in clients.py. extract_openai_reasoning_content, extract_openai_reasoning_details, and extract_openai_cache_tokens had no callers outside their own definitions — the live implementations live in src/llm/backends/openai.py. -103 lines from clients.py.
  3. Unify on ModelTransport, delete SupportedProviders. The "google" vs "gemini" split forced a _provider_for_model_config translation shim in two places. Replace all SupportedProviders usages with ModelTransport, rename CLIENTS["google"] → CLIENTS["gemini"], and update provider branches, LLMError labels, and reasoning-trace entries accordingly. Trace JSONL now writes "provider": "gemini" instead of "google" — consistent with the broader env-var rename cutover.
  Also tidies up pre-existing basedpyright findings in tests/llm/test_model_config.py (pydantic before-validator dict inputs + descriptor-proxy call). ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 153/153 pass across tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py, tests/llm/, tests/dialectic/, tests/deriver/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(llm): finish the src/utils/clients.py → src/llm/ migration. honcho_llm_call_inner now delegates to request_builder.execute_completion and execute_stream instead of re-implementing backend call scaffolding inline. The new _effective_config_for_call helper carries per-call kwargs (temperature, stop_seqs, thinking_budget_tokens, reasoning_effort) onto the selected ModelConfig — or synthesizes a minimal config for the test-only callers that pass provider+model directly. max_output_tokens is zeroed on the effective config to preserve the current "per-call max_tokens wins" semantic; honoring ModelConfig.max_output_tokens is a separable correctness concern. A side effect of routing through the new path: ConfiguredModelSettings' thinking_budget_tokens validator now fires on synthesized configs, so test_anthropic_thinking_budget — which asserted that a sub-1024 budget propagated to Anthropic — was bumped to 1024 to match what Anthropic actually accepts. Unified client construction: promoted the cached client factories in src/llm/__init__.py (get_anthropic_client, get_openai_client, get_gemini_client, get_{anthropic,openai,gemini}_override_client) to public API and added them to __all__. Promoted credentials._default_transport_api_key → default_transport_api_key. Deleted the duplicate _build_client and _default_credentials_for_provider from clients.py; _client_for_model_config now falls through to the public factories. The CLIENTS dict and _get_backend_for_provider stay as the mockable seam for the ~50 patch.dict(CLIENTS, {...}) test call sites. Wired operator-configurable Gemini cached-content reuse end-to-end: PromptCachePolicy moved from src/llm/caching.py into src/config.py so ModelConfig can reference it as a field without a circular import; caching.py re-exports the name for existing imports. Added cache_policy: PromptCachePolicy | None on ConfiguredModelSettings, FallbackModelSettings, ResolvedFallbackConfig, and ModelConfig. resolve_model_config, _resolve_fallback_config, and _select_model_config_for_attempt copy the field through. honcho_llm_call_inner passes effective_config.cache_policy into execute_completion / execute_stream, so operators opt in via e.g. DERIVER_MODEL_CONFIG__CACHE_POLICY__MODE=gemini_cached_content and the selection actually fires instead of sitting on a dead path. New regression test test_cache_policy_reaches_gemini_backend asserts the PromptCachePolicy object reaches the Gemini backend's extra_params. ruff + basedpyright: clean. Tests: 154/154 pass across tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py, tests/llm/, tests/dialectic/, tests/deriver/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(llm): move all LLM orchestration into src/llm/ and delete clients.py. The 1624-line src/utils/clients.py has been carved up into focused modules under src/llm/ and deleted. There is now one golden path for LLM orchestration and no dual entrypoint. New module layout under src/llm/:
  - __init__.py — thin stable re-export surface
  - api.py — public honcho_llm_call with retry + fallback + tool loop delegation
  - executor.py — honcho_llm_call_inner (single-call executor); bridges to request_builder.execute_completion / execute_stream
  - tool_loop.py — execute_tool_loop + stream_final_response, plus assistant-tool-message and tool-result formatting
  - runtime.py — AttemptPlan dataclass (replaces the loose ProviderSelection NamedTuple), effective_config_for_call, plan_attempt, per-retry temperature bump, attempt ContextVar
  - registry.py — single owner of the CLIENTS dict + cached default and override SDK-client factories + backend/history-adapter selection + high-level get_backend(config)
  - conversation.py — count_message_tokens, tool-aware message grouping, truncate_messages_to_fit
  - types.py — HonchoLLMCallResponse, HonchoLLMCallStreamChunk, StreamingResponseWithMetadata, IterationData, IterationCallback, ReasoningEffortType, VerbosityType, ProviderClient
  - request_builder.py — low-level request assembly (ModelConfig → backend complete/stream); no longer owns credential resolution
  - credentials.py — default_transport_api_key, resolve_credentials
  - caching.py — gemini_cache_store; re-exports PromptCachePolicy from src.config
  - backend.py — Protocol + normalized result types
  - history_adapters.py — provider-specific assistant/tool message shapes
  - structured_output.py
  - backends/ — AnthropicBackend, OpenAIBackend, GeminiBackend
  handle_streaming_response had no production callers; it is deleted. The three tests that used it now drive honcho_llm_call_inner(stream=True, client_override=...) directly, which exercises the same code path the public API uses. Dead credential passthrough removed: the ProviderBackend Protocol and all three concrete backends no longer accept api_key / api_base — those are baked into the underlying SDK client at registry construction time and were being del'd everywhere they appeared. request_builder also stops resolving and forwarding them. Client construction is unified: the cached default-client factories (get_anthropic_client, get_openai_client, get_gemini_client) and override factories (get_*_override_client) are promoted to public API; the module-level CLIENTS dict populates from them and remains the patch.dict(CLIENTS, {...}) mocking seam tests rely on. The old duplicate helpers (_build_client, _default_credentials_for_provider) are gone. default_transport_api_key is promoted to public. Application imports now come from src.llm (dreamer, dialectic, deriver, summarizer, telemetry-adjacent tests). No code imports from src.utils.clients anywhere in the repo. ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 1013/1013 pass across the entire non-infra test suite (excluding tests/unified, tests/bench, tests/live_llm, tests/alembic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(llm): sanitize tool schemas for Gemini's function_declarations validator. Gemini's native-transport function-declarations validator accepts a narrow subset of JSON-Schema / OpenAPI: type, format, description, nullable, enum, properties, required, items, minItems, maxItems, minimum, maximum, title. Anything else — additionalProperties, allOf, if/then/else, $ref, anyOf, oneOf, $defs, patternProperties — triggers an INVALID_ARGUMENT 400 at call time. Our agent tool schemas in src/utils/agent_tools.py use several of those (additionalProperties: false, allOf + if/then conditionals) because they were authored for OpenAI strict mode + Anthropic, which need the richer vocabulary. GeminiBackend._convert_tools was passing them straight through. Add _sanitize_schema(): walks the parameters tree and drops unsupported keywords while preserving semantics for the keywords that hold user data (properties maps field-name → sub-schema; required / enum are lists of literals; items is a single sub-schema). Other backends are untouched and continue to receive the full strict schemas. Regression tests: test_gemini_sanitize_schema_strips_unsupported_keywords confirms additionalProperties, allOf + if/then, and $defs are stripped at nested levels while legitimate fields survive; test_gemini_convert_tools_sanitizes_parameters_schema checks end-to-end that _convert_tools output has no forbidden keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: fix tool-calling syntax for Gemini
* refactor(llm): normalize defaults, widen OpenAI reasoning-model routing
* chore: fix test
* fix(llm): address post-migration review feedback
* fix(llm): Gemini robustness + dreamer specialist ergonomics
* chore: address review comments
* chore(docs): unreleased changelog addition
* chore(docs): merge commit changes

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Erosika <eri@plasticlabs.ai>
* feat: add honcho-cli package
* feat: add more support for command-level flags, including workarounds for getting raw SDK info
* feat: add peer config
* feat: add setup commands
* chore: set up package dependencies for the CLI
* feat: promote init/doctor to top level + polish the wizard
* feat: make init --yes fall back to the existing config
* chore: update documentation
* chore: update tagline
* feat: structurally update recommended settings for the CLI
* fix: style
* fix: remove redundant describe method
* fix: delete key generation commands and fix session ID
* fix: remove defaults and change the config write path
* chore: paginate conclusions
* chore: require workspace
* fix: polish command surfaces — scoping, validation, perf, consistency
* chore: remove session message
* fix: CLI output shape, destructive-confirm previews, skip needless round-trips
* chore: CLI polish — peer inspect config, drop dead helper, doc/help consistency
* chore: update readme
* chore: update tests
* chore: doc updates
* fix: config command
* chore: unused code
* fix: doctor command
* fix: remove quiet tag and fix session key ordering
* fix: config commands and session id command
* fix: remove message_count
* fix: branding circular dependency
* fix: refactor lazy imports to use common.py correctly
* fix: remove all lazy imports
* chore: CR fixes
* fix: config, env, flag setup
* chore: update skill
* feat: add workspace, session, and message create
* fix: init now supports local Honcho
* chore: CR
* feat(cli): CLI surface polish — reasoning flag, peer-scoped messages, help sync. Add --reasoning/-r to peer chat (minimal..max), a -p peer filter to message list with newest-first ordering, and a curated welcome panel with getting-started/memory/commands sections. Sync the welcome panel and group help strings with the actual registered commands — drop the phantom 'session clone', add the 4 missing peer commands and 7 missing session commands, fix conclusion/message/workspace group docstrings that claimed commands that don't exist.
* feat(cli): themed, unified help system with pattern/example. Replace the hand-rolled welcome with a layered system:
  - Theme typer.rich_utils (dim borders, brand color) so every --help inherits the voice.
  - A HonchoTyperGroup subclass renders a curated 3-panel welcome (getting started / memory / commands) with recipes Typer can't auto-generate.
  - Unify the front door: bare 'honcho', 'honcho --help', and 'honcho help' all render the same welcome via one code path; sub-groups and leaf commands still get Typer's themed renderer.
  - Replace Click's 'Usage: …' line with pattern/example rows at every sub-group and leaf command, so the help voice stays consistent from top to leaves.
* refactor(cli): address review — typed exceptions, chmod 600, tighter redaction, class-based help, tests
  - Replace the module-level monkey-patch of TyperGroup/TyperCommand.get_usage with HonchoTyperGroup applied via cls= on every sub-Typer. Lives in a new _help.py module to avoid circular imports. No longer leaks behavior changes into other Typer users in the same process.
  - _test_connection dispatches on the SDK's typed exceptions (AuthenticationError, ConnectionError, TimeoutError, APIError) instead of substring-matching error messages.
  - Config.save() now chmods ~/.honcho/config.json to 0o600 after write so the plaintext API key isn't world-readable on multi-user hosts (sketched below).
  - Tighten api_key redaction to '***<last4>' (was 'header...last4'), matching setup._redact for consistency. Short keys are fully masked.
  - Add test_validation.py covering safe IDs, unsafe chars, path traversal, and empty input. Update test_config.py redaction cases and add a 0o600 permission assertion. Fix stale patch paths in test_commands.py that pointed at honcho_cli.main instead of the command modules where get_client is actually imported.
* feat(cli): add an options panel to the welcome menu. Append a fourth panel listing the global flags (-w/-p/-s, --json, --version, --help) with their env-var counterparts. Discoverable from bare 'honcho' without needing to hunt for --help.
* chore(cli): drop --version from the welcome options panel
* feat(cli): add the pixel-honcho icon to the banner. Prepend a 13-char ASCII rendering of honcho-pixel.svg to the HONCHO wordmark. Uses Unicode half-blocks to pack 12 pixel rows into 6 text rows, faithfully preserving the SVG outline (two eye dots, mouth slit, tapering foot). Appears in bare 'honcho', 'honcho --help', 'honcho --version', and 'honcho init'.
* fix: polish the Honcho CLI welcome panel and error messages
* fix: honcho workspace inspect speed
* chore: minor fix to session pagination
* fix: remove NDJSON output
* chore: consolidate the Honcho CLI's dual argv grammar onto Pattern A (command-first)
* chore: clean up imports
* fix: four `-s` consistency fixes applied
* chore: minor changes to memory rows
* fix: change the package name to honcho-cli
* fix: remove the pixel face

Co-authored-by: Erosika <eri@plasticlabs.ai>
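Two of those review items are easy to picture in code. A hedged sketch, assuming a JSON config at ~/.honcho/config.json as the commits describe — the helper names and the short-key threshold are mine, not honcho-cli's:

```python
import json
from pathlib import Path
from typing import Any

CONFIG_PATH = Path.home() / ".honcho" / "config.json"


def save_config(data: dict[str, Any]) -> None:
    """Persist config, then lock it down: the API key is stored in plaintext."""
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(json.dumps(data, indent=2))
    CONFIG_PATH.chmod(0o600)  # owner read/write only, per the review fix


def redact(api_key: str) -> str:
    """'***<last4>' style redaction; short keys are fully masked."""
    if len(api_key) <= 8:  # exact threshold is an assumption
        return "***"
    return f"***{api_key[-4:]}"
```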
…lastic-labs#575) The MCP Worker hardcoded https://api.honcho.dev for every request, forcing anyone running a self-hosted Honcho instance to patch the source before deploying their own Worker alongside it. Route the baseUrl through the Worker env so operators can set HONCHO_API_URL (via .dev.vars for local development or wrangler secret for deployed Workers) and point the Worker at their instance. The variable is intentionally not exposed as a request header: that would let public clients steer traffic to internal URLs, which is a latency and security regression. When HONCHO_API_URL is unset, the Worker falls back to https://api.honcho.dev, so existing deployments are unaffected. Closes plastic-labs#508
…patible providers (plastic-labs#586)

* fix: wrap single embed() input in an array for OpenAI-compatible provider compatibility
* Fix the input format in the embedding test assertion
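The fix itself is a one-line normalization. A sketch under the assumption of an OpenAI-style embeddings call (the helper name and the commented call are illustrative):

```python
from typing import Sequence


def normalize_embed_input(text: str | Sequence[str]) -> list[str]:
    """Some OpenAI-compatible servers reject a bare string for `input`,
    so always send a list, even for a single item."""
    return [text] if isinstance(text, str) else list(text)


# client.embeddings.create(model=..., input=normalize_embed_input("hello"))
```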
* fix: catch InternalServerError from turbopuffer
* fix: remove unused VectorUpsertResult
* fix: downgrade vector store sync errors to warnings
* fix: remove upsert_with_retry
* fix(vector): add a silent path and an explicit path for vector db server errors

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
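A hedged sketch of the "silent path and explicit path" idea, assuming the relational store remains the source of truth so a failed vector sync should not fail the request. The exception's import path and the function shape are guesses; only the InternalServerError name comes from the commits.

```python
import logging

from turbopuffer import InternalServerError  # import path is an assumption

logger = logging.getLogger(__name__)


def sync_vectors(upsert, *, strict: bool = False) -> None:
    """Silent path: log a warning and move on. Explicit path: re-raise."""
    try:
        upsert()
    except InternalServerError as exc:
        if strict:
            raise  # explicit path for callers that must know the sync failed
        logger.warning("vector store upsert failed, continuing: %s", exc)
```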
* docs: add CLI doc
* docs: add generated script and content and a GitHub workflow
* chore: remove the workflow
* fix(docs): re-format and add details to the cli-reference docs

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
* fix: move CLI skills to root
* chore: update the CLI readme
* chore: update language
* chore: update docs
…ig (plastic-labs#587)

* Update deriver.py
* Simplify model configuration in deriver.py — removed stop_sequences from the model configuration
* docs: add OpenCode guide
* docs: align the OpenCode guide with the latest plugin changes
* chore: update language

Co-authored-by: adavyas <adavyasharma@gmail.com>
…stic-labs#581) The Surprisal module passes `{"level": levels}` directly to `get_all_documents()`, but `apply_filter()` expects operator syntax: `{"level": {"in": levels}}`. Without the `in` operator, the filter is silently ignored, causing `_fetch_level_observations()` to return 0 results. This makes the entire Surprisal phase of the Dream cycle a no-op. Fixes plastic-labs#559
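The fix is a one-line change to the filter shape (the surrounding names are taken from the commit message):

```python
# Before: silently ignored by apply_filter() — no operator, so no effect:
filters = {"level": levels}

# After: explicit operator syntax that apply_filter() actually understands:
filters = {"level": {"in": levels}}

# documents = await get_all_documents(..., filters=filters)  # illustrative call
```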
* docs: add OpenCode guide
* docs: align the OpenCode guide with the latest plugin changes
* chore: update language
* docs: remove the interview command from the OpenCode guide

Co-authored-by: ajspig <dragon@monstercode.com>
Resolve conflicts between fork-only commits (CF Gateway auth, Gemini thought_signature fix, LM Studio/Prometheus/Traefik stack, dreamer specialist overrides) and upstream's new src/llm/ transport-based abstraction that replaces src/utils/clients.py.

Port decisions:

- Dropped the fork's cf / custom / vllm / groq providers — superseded by the new ModelConfig base_url/api_key override mechanism.
- Kept OPENAI_BASE_URL and CF_GATEWAY_AUTH_TOKEN on LLMSettings and wired them into src/llm/registry (default + override OpenAI clients) and src/embedding_client so CF AI Gateway routing survives the refactor.
- Ported thought_signature extraction into OpenAIBackend and replay into OpenAIHistoryAdapter so Gemini thinking models reached via the CF OpenAI-compat route can do multi-turn tool loops without 400ing (a hedged sketch follows below).
- Dropped the fork's DEDUCTION_PROVIDER / INDUCTION_PROVIDER and matching THINKING_BUDGET_TOKENS fields — upstream's per-specialist DEDUCTION_MODEL_CONFIG / INDUCTION_MODEL_CONFIG (full ConfiguredModelSettings) is a strict superset.
- Kept the fork's traefik+prometheus+grafana docker-compose stack; kept upstream's broader docker/ COPY in the Dockerfile.
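A hedged sketch of the thought_signature round-trip described in the third bullet. The field locations and names here are assumptions; the real logic lives in OpenAIBackend and OpenAIHistoryAdapter.

```python
from typing import Any


def extract_thought_signature(message: Any) -> str | None:
    """Pull a Gemini thought signature off an OpenAI-compat response message.
    Where the provider surfaces it (extra fields vs. attributes) is assumed."""
    extra = getattr(message, "model_extra", None) or {}
    return extra.get("thought_signature")


def replay_thought_signature(
    assistant_msg: dict[str, Any], sig: str | None
) -> dict[str, Any]:
    """Re-attach the stored signature when replaying history, so a Gemini
    thinking model can continue a multi-turn tool loop instead of 400ing."""
    if sig is not None:
        assistant_msg["thought_signature"] = sig
    return assistant_msg
```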
basedpyright with reportMissingTypeArgument rejected the bare `dict` types in the mock fake_post used by the SDK message-batching test, failing Static Analysis on PR #3. Add `dict[str, Any]` annotations and an explicit return type so CI stays green.
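Concretely, the shape of the fix (the surrounding test is paraphrased; the mock body is illustrative):

```python
from typing import Any

# Before: bare `dict` trips reportMissingTypeArgument under basedpyright.
# def fake_post(url, json):
#     ...

# After: parameterized annotations plus an explicit return type.
def fake_post(url: str, json: dict[str, Any]) -> dict[str, Any]:
    """Mock POST used by the message-batching test (sketch)."""
    return {"ok": True, "echo": json}
```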
basedpyright's default exit code is non-zero whenever any diagnostics are reported, so the 8 warnings introduced by the fork-only commits were failing the Static Analysis job on PR #3 even though there were no errors.

- src/deriver/queue_manager.py: drop the `item.created_at is not None` guards. created_at is `Mapped[datetime.datetime]` (non-nullable), so the checks were always True and basedpyright flagged them as reportUnnecessaryComparison.
- tests/sdk/test_session.py: factor out the shared mock-response body into a single helper and give the per-branch closures distinct names. This clears reportRedeclaration on `calls` / `fake_post` and lets the `# pyright: ignore` comments target the actual warning (reportPrivateUsage on `_http` / `_async_http_client`) instead of the irrelevant reportAttributeAccessIssue that was flagged as an unnecessary ignore.
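For the first item, the pattern basedpyright objects to, sketched in isolation (assuming a timezone-aware column value):

```python
import datetime


def age_seconds(created_at: datetime.datetime) -> float:
    # Before: `if created_at is not None:` guarded this — but the column is
    # declared Mapped[datetime.datetime] (non-nullable), so the guard is
    # always True and basedpyright flags reportUnnecessaryComparison.
    # After: just use the value.
    now = datetime.datetime.now(datetime.timezone.utc)
    return (now - created_at).total_seconds()
```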
offendingcommit (Owner, Author):

Closing in favor of #4, which captures everything this PR had plus the most recent 21 upstream commits AND adds the deployment-critical CF Gateway header injection adjacent to upstream's new src/llm/ architecture (without that, every CF-gateway-bound call would silently fail auth).
Summary

- Syncs plastic-labs/honcho main into this fork's main
- Brings in the new src/llm/ backend package (replaces src/utils/clients.py), the new honcho-cli, docs updates, and all other upstream work

Notes

- Fork-only customizations previously lived in src/utils/clients.py and will need to be re-ported onto the new src/llm/ abstraction.

Test plan

- src/llm/backends
- uv run pytest tests/
- uv run ruff check src/ and uv run basedpyright