chore(sync): merge upstream/main (42 commits) #4

Merged
offendingcommit merged 50 commits into main from sync/upstream-2026-05-03 on May 4, 2026

Conversation

@offendingcommit
Owner

Summary

Syncs our main with plastic-labs/honcho upstream main (42 commits behind). Adopts upstream's full LLM client refactor (PR plastic-labs#459) wholesale and re-applies our deployment-critical CF Gateway support adjacent to the new architecture so future syncs stay near-mechanical.

What's pulled from upstream

  • Full LLM client refactor (plastic-labs#459): src/utils/clients.py deleted in favor of the new src/llm/ package with per-backend handlers, ConfiguredModelSettings, and ModelTransport.
  • New honcho-cli package plus Zo Computer, Paperclip, SillyTavern, and opencode integration docs.
  • Surprisal filter format fix (plastic-labs#581), converged with our 4e7f136.
  • Many smaller fixes: dreamer thresholds, deriver blank-observation guard, vector sync retry budget, embed() string-input fix, etc.

What we re-applied (deployment-critical, adjacent to upstream)

  • src/config.py: re-add LLMSettings.CF_GATEWAY_AUTH_TOKEN (single global needed for cf-aig-authorization header).
  • src/llm/registry.py: inject cf-aig-authorization header in get_openai_override_client / get_anthropic_override_client / get_gemini_override_client when base_url targets gateway.ai.cloudflare.com. Routes all CF-gateway-bound clients through the auth header — no parallel backend.
  • src/embedding_client.py: same header injection on the openai/gemini branches so embeddings through CF Gateway authenticate correctly.

The _cf_gateway_headers() helper is duplicated across src/llm/registry.py and src/embedding_client.py so the embedding client doesn't depend on the LLM runtime registry.
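
A minimal sketch of the helper's shape, assuming the settings path settings.LLM.CF_GATEWAY_AUTH_TOKEN and Cloudflare's "Bearer <token>" format for the cf-aig-authorization header; the actual implementations in registry.py and embedding_client.py may differ:

```python
# Illustrative sketch only; the real helper is duplicated in src/llm/registry.py
# and src/embedding_client.py, and the settings path / Bearer prefix here are
# assumptions taken from this PR description.
from src.config import settings  # assumed import path

_CF_GATEWAY_HOST = "gateway.ai.cloudflare.com"


def _cf_gateway_headers(base_url: str | None) -> dict[str, str]:
    """Return the cf-aig-authorization header when base_url targets a CF Gateway."""
    token = settings.LLM.CF_GATEWAY_AUTH_TOKEN
    if not token or not base_url or _CF_GATEWAY_HOST not in base_url:
        return {}
    return {"cf-aig-authorization": f"Bearer {token}"}
```

Each override-client factory would then fold these headers into the SDK client it constructs, for example via the default_headers kwarg that the OpenAI and Anthropic Python clients accept; the exact wiring is a registry.py implementation detail.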

What we dropped (now redundant or replaceable)

  • Per-specialist override surface (DEDUCTION_PROVIDER / INDUCTION_PROVIDER / *_THINKING_BUDGET_TOKENS config fields + get_provider() / get_thinking_budget() methods on BaseSpecialist) — fully replaceable via upstream's DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT / __THINKING_BUDGET_TOKENS env vars on ConfiguredModelSettings.
  • src/utils/types.SupportedProviders — replaced by upstream's ModelTransport literal in src/config.py.
  • Custom Traefik service block in docker-compose.yml.example — example file now matches upstream defaults; configs in docker/traefik/ remain for users who want to wire it up.
  • Our 4e7f136 surprisal filter fix — byte-identical to upstream's fix(surprisal): use correct filter format for level observations plastic-labs/honcho#581, naturally converged.

Deployment migration notes — re-key the .env before deploying

Old vars are silently no-ops because of extra='ignore' on LLMSettings. Update the deployment .env to use upstream's per-component pattern:

Old → New
LLM_OPENAI_BASE_URL=... → <COMPONENT>_MODEL_CONFIG__BASE_URL=... (e.g. DIALECTIC_MODEL_CONFIG__BASE_URL)
LLM_CF_GATEWAY_API_KEY=... → <COMPONENT>_MODEL_CONFIG__API_KEY=...
LLM_CF_GATEWAY_BASE_URL=... → <COMPONENT>_MODEL_CONFIG__BASE_URL=... (CF gateway URL)
LLM_OPENAI_COMPATIBLE_*, LLM_VLLM_*, LLM_GROQ_API_KEY → <COMPONENT>_MODEL_CONFIG__*
LLM_CF_GATEWAY_AUTH_TOKEN=... → unchanged (still the only header global)
DREAM_DEDUCTION_PROVIDER=anthropic → DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT=anthropic
DREAM_DEDUCTION_THINKING_BUDGET_TOKENS=2048 → DREAM_DEDUCTION_MODEL_CONFIG__THINKING_BUDGET_TOKENS=2048
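
For reference, a minimal pydantic-settings sketch of how the double-underscore names resolve into nested fields. The class and field names below (ModelConfigSketch, DialecticSettingsSketch) are illustrative stand-ins, not upstream's actual ConfiguredModelSettings / DialecticSettings:

```python
# Illustrative only: demonstrates the env_nested_delimiter="__" mechanism that
# makes DIALECTIC_MODEL_CONFIG__BASE_URL land on a nested MODEL_CONFIG field.
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class ModelConfigSketch(BaseModel):
    TRANSPORT: str = "openai"
    BASE_URL: str | None = None
    API_KEY: str | None = None
    THINKING_BUDGET_TOKENS: int | None = None


class DialecticSettingsSketch(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DIALECTIC_", env_nested_delimiter="__")

    MODEL_CONFIG: ModelConfigSketch = ModelConfigSketch()


# With DIALECTIC_MODEL_CONFIG__BASE_URL=https://gateway.ai.cloudflare.com/... and
# DIALECTIC_MODEL_CONFIG__API_KEY=... exported, the values land on:
settings = DialecticSettingsSketch()
print(settings.MODEL_CONFIG.BASE_URL, settings.MODEL_CONFIG.API_KEY)
```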

Local docker-compose.yml (untracked) is unaffected by the example file change.

Test plan

  • uv run ruff check src/ — passes
  • uv run basedpyright src/ — 0 errors, 2 pre-existing warnings unrelated to this merge
  • All conflict markers resolved; merge state clean
  • Smoke test: from src.llm import registry; from src.dreamer import specialists; ... imports OK
  • Full pytest suite (recommend running in CI)
  • Smoke test deploy: bring up local Docker stack with re-keyed .env, hit /v1/peers/{id}/chat to verify CF Gateway path still works end-to-end

erosika and others added 30 commits March 17, 2026 15:18
New integration guide for the sillytavern-honcho extension covering
install, global config, context architecture, enrichment modes, and
troubleshooting. Added to v3 integrations nav.
…#495)

* chore: add .worktrees/ to .gitignore

* feat(examples): add Zo Computer memory skill integration

* feat(examples): add Zo Computer memory skill integration

* fix(examples): address CodeRabbit review on Zo skill integration

  - Fix version inconsistency: SKILL.md matches pyproject.toml (>=2.1.0)
  - Move client.py into tools/ package and use relative imports
  - Add assistant_id parameter to save_memory() for consistency with get_context()
  - Use UUID-based IDs in tests to prevent state leakage between runs
  - Add pytest.mark.skipif guard on integration tests (requires HONCHO_API_KEY)
  - Fix import ordering, move pytest to module level, sort __all__ alphabetically
  - Fix markdown blank lines around fenced code blocks (MD031)
  - Add rate limit delay fixture to avoid hitting Honcho free tier limits

* fix(examples): validate HONCHO_API_KEY early in client initialization

* docs(examples): note cross-peer memory behavior in shared workspaces

* docs(examples): fix save_memory and query_memory signatures in README

* docs(examples): fix markdown linting issues in README

* docs(examples): add assistant_id parameter to save_memory example in
  SKILL.md

---------

Co-authored-by: Luba Kaper <lubakaper@lubas-air.mynetworksettings.com>
…fig guide (plastic-labs#510)

* fix: Inconsistencies in Docs, health endpoint, troubleshooting guide

* fix: (docs) maintain consistency on postgres db name

* chore: (docs) update v2 contributing docs with updated db paths

* docs: overhaul self-hosting docs for provider-agnostic setup

- .env.template: lead with provider options (custom, vllm, google,
  anthropic, openai, groq) instead of baking in vendor-specific keys.
  All provider/model settings commented out so server fails fast until
  configured. Separate endpoint config from per-feature provider+model
  from tuning knobs.
- docker-compose.yml.example: fix healthcheck -d honcho -> -d postgres
  to match POSTGRES_DB=postgres.
- config.toml.example: reorder and document LLM key section with
  OpenRouter and vLLM examples.
- self-hosting.mdx: replace multi-vendor key table with provider options
  table. Add examples for OpenRouter, vLLM/Ollama, and direct vendor
  keys. Remove duplicated key lists from Docker/manual setup sections.
- configuration.mdx: replace scattered provider docs with provider types
  table. Fix Docker Compose snippet to match actual compose file. Note
  code defaults as fallback, not recommended path.
- troubleshooting.mdx: add alternative provider issues section (custom
  provider config, model name format, Docker localhost, structured
  output failures).

* docs: add Docker build troubleshooting for permission errors

- Document BuildKit requirement (RUN --mount syntax)
- AppArmor/SELinux blocking Docker builds on Linux
- Volume mount UID mismatch between host and container app user
- Note in self-hosting docs that Docker path builds from source

* docs: reframe self-hosting as contributor/dev path, point to cloud service

* Revert "docs: reframe self-hosting as contributor/dev path, point to cloud service"

This reverts commit 3e766eb.

* docs: add production compose, model guidance, thinking budget docs

- Add docker-compose.prod.yml for VM/server deployment: no source
  mounts, restart policies, 127.0.0.1-bound ports, cache enabled
- Add model tier guidance and community quick-start link to self-hosting
- Document THINKING_BUDGET_TOKENS gotcha for non-Anthropic providers
- Add reverse proxy examples (Caddy + nginx) to production section
- Add backup/restore commands to production considerations

* docs: simplify self-hosting to single provider, restructure config guide

Self-hosting page now defaults to one OpenAI-compatible endpoint
with one model for all features. Moved model tiers, alternative
providers, and per-feature tuning into the configuration guide.
Eliminated duplicate config priority sections, dev/prod split,
and redundant TOML examples.

* docs: merge compose files, restore provider/model to feature sections in .env.template

Single docker-compose.yml.example with dev sections commented out.
Moved PROVIDER and MODEL back alongside each feature in .env.template
so settings stay colocated with their module. Updated self-hosting
docs to reference single compose file.

* fix: broken anchor links, redundant migration step, minor inconsistencies

Fix 4 broken internal links (#llm-provider-setup, #llm-api-keys,
#which-api-keys-do-i-need, #alternative-providers) to point to
correct headings. Remove redundant Docker migration step (entrypoint
already runs alembic). Fix cache URL missing ?suppress=true in
reference config. Fix uv install command to use official method.

* docs: env template ready to use, simplify self-hosting flow

.env.template now has provider/model lines uncommented with
placeholder values — user just sets endpoint, key, and model name.
Thinking budgets default to 0 for non-Anthropic providers.

Self-hosting page: removed 30-line env var wall, LLM setup now
points to the template. Merged duplicate verify sections.
Removed api_key from SDK examples (auth off by default).

* docs: reorder next steps, configuration guide first

* fix: default embedding provider to openrouter for single-endpoint setup

Without this, embeddings default to openai which requires a separate
LLM_OPENAI_API_KEY. Setting to openrouter routes embeddings through
the same OpenAI-compatible endpoint as everything else.

* fix: review issues — hermes page, thinking budget, production wording

Hermes integration page: replaced inline Docker/manual setup with
link to self-hosting guide, added elkimek community link. Removed
old env var names (OPENAI_API_KEY without LLM_ prefix).

Troubleshooting: removed "or 1" from thinking budget guidance.
Self-hosting: softened "production-ready" to "production-oriented"
since auth is disabled by default.

* docs: model examples in template, expanded LLM setup, better verify flow

.env.template: added "e.g. google/gemini-2.5-flash" hints next to
model placeholders so users know the expected format.

Self-hosting: expanded LLM Setup to show the 3 things users need to
set (endpoint, key, model name) with find-replace tip. Added build
time note, deriver log check, and real smoke test (create workspace)
to verify section. Health check now notes it doesn't verify DB/LLM.

* fix: smoke test uses v3 API path, not v1

* docs: clarify deriver metrics port vs Prometheus host port

* fix: remove deprecated memoryMode from hermes config example

* docs: update hermes page to match current memory provider config

Updated config to match hermes-agent docs: removed apiKey (not needed
for self-hosted), added hermes memory setup CLI command, added config
fields table (recallMode, writeFrequency, sessionStrategy, etc.).

Better verification tests: store-and-recall across sessions, direct
tool calling test. Links to upstream hermes docs for full field list.

* fix: invalid THINKING_BUDGET_TOKENS=0 and missing docker/ in image

Comment out THINKING_BUDGET_TOKENS=0 in .env.template — deriver,
summary, and dream validators require gt=0. Dialectic levels also
commented out since non-thinking models don't need the override.

Add COPY for docker/ directory in Dockerfile so entrypoint.sh is
available when docker-compose.yml.example references it.

* chore: Additional troubleshooting step

---------

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
* fix: further remove extraneous transactions

* fix: (search) use 2-phase function to reduce unneeded transaction

* fix: refactor agent search to perform external operations before making a transaction

* fix: reduce scope of queue manager transaction

* fix: (bench) add concurrency to test bench

* fix: address review findings for search dedup, webhook idempotency, and bench throttling

* Fix Leakage in non-session-scoped chat call (plastic-labs#526)

* fix: (search) reduce scope for peer based searches

* fix: tests

* fix: (test) address coderabbit comment

* fix: drop db param from deliver_webhook

---------

Co-authored-by: Rajat Ahuja <rahuja445@gmail.com>
* chore: (docs) Update changelogs and version numbers

* chore: remove extraneous dep on mintlify
* Simplify Paperclip integration instructions

Clarified instructions for local Honcho setup and removed unnecessary details.


* Update docs.json

* Update links in Paperclip integration guide

* Revise memory initialization instructions in Paperclip guide

Updated instructions for initializing memory and removed optional checks section.
…ic-labs#530)

The HEALTHCHECK directive probes an HTTP endpoint that only the API
serves. The deriver service reuses this image but is a background queue
worker with no HTTP server — the probe can never succeed, so Docker
permanently marks the deriver container as unhealthy.

Remove the HEALTHCHECK from the shared image. Service-level health
checks belong in each service's own configuration (e.g. Kubernetes
readiness/liveness probes on the API Deployment only).

Closes plastic-labs#521

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation (plastic-labs#459)

* fix: Add JSON repair for truncated LLM responses across all providers and Gemini thinking budget support

LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver
with 14k+ occurrences in production. The vLLM path already had repair logic but it was gated on
provider=="vllm", unreachable when routing through litellm as a custom provider.

- Extract shared _repair_response_model_json() helper for all providers
- Catch LengthFinishReasonError in OpenAI/custom parse() path and repair truncated JSON
- Add repair fallback to Anthropic and Gemini response_model paths
- Add repair fallback to Groq response_model path
- Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
- Add 14 tests covering repair paths for all providers and Gemini thinking budget

Fixes HONCHO-YC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: live llm integration tests

* feat: Consistent Model Config Protocol

* fix: migrate the remaining app callers off the legacy llm_settings path

* fix: Docs and regression tests

* fix: refactor llm runtime path to model-config-only API

* fix: refactor config to nested model-config source of truth

* fix: refactor llm streaming and tool dispatch through backends

* fix: cut over llm config to nested model_config only

* fix: collapse vllm and custom into openai_compatible transport

* feat: refactor llm config to explicit transports and bare model ids

* feat: (embed) Add configurability for embedding model

* fix: tests for embedding provider

* fix: Address Review Comments

* fix: (llm) remove Groq backend and per-vendor base URLs

* chore: move llm tests

* fix: (llm) address review findings — config regressions, backend bugs, dead code

* fix: address backend end silly errors

* chore: (docs) update configuration and self-hosting guides

* chore: fix tests

* fix: address code rabbit comments

* fix: add validation to the dream settings

* fix: further address code rabbit comments

* fix: Address Code Rabbit Comments

* fix: Another round of code rabbit

* fix: Address Code Rabbit Nits

* fix: tests

* refactor: rename thinking validator to reflect transport scope

_validate_anthropic_thinking_minimum only enforces the >=1024 rule for
Anthropic and no-ops for other transports, so the name was misleading
now that it's shared across ConfiguredModelSettings, FallbackModelSettings,
and ModelConfig. Renamed to _validate_thinking_constraints with a docstring
clarifying per-transport behavior. No logic change.

* fix(config): drop transport-specific thinking params when env override changes transport

_fill_defaults_for_nested_field previously preserved the default MODEL_CONFIG's
thinking_budget_tokens/thinking_effort across a transport override. This leaked
Gemini-family defaults (e.g. thinking_budget_tokens=1024) into OpenAI-transport
overrides, and the OpenAI backend then correctly rejected the unsupported param
at call time (OpenAI uses reasoning.effort, not a token budget).

The helper now strips thinking_budget_tokens and thinking_effort from the
default dict when the env override supplies a transport different from the
default's. Explicit thinking params in the override are preserved.

* fix(config): apply thinking-param strip to dialectic level merge too

DialecticSettings._merge_level_defaults does its own inline MODEL_CONFIG
merge (parallel to _fill_defaults_for_nested_field), so the previous fix
missed dialectic-level overrides. E.g. flipping
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT from gemini (default)
to openai still leaked the default thinking_budget_tokens=0 into the
openai config, which the OpenAI backend then rejected at call time.

The level-merge path now applies the same 'strip transport-specific
thinking params when transport changes' rule as the generic helper.
Added a regression test exercising the merge validator directly.

* refactor(llm): wire ModelConfig knobs through, prune clients.py migration leftovers

Three connected fixes to finish carving the LLM stack out of src/utils/clients.py
and into src/llm/:

1. Propagate ModelConfig tuning knobs into backend calls.
   honcho_llm_call_inner built extra_params from only {json_mode, verbosity},
   silently dropping top_p, top_k, frequency_penalty, presence_penalty, seed,
   and operator-supplied provider_params from any ModelConfig. Thread the
   selected config through ProviderSelection and merge
   build_config_extra_params(selected_config) into extra_params; per-call
   kwargs still win over provider_params defaults. Makes
   _build_config_extra_params public as build_config_extra_params so
   clients.py and request_builder.py share one translation. Adds
   TestModelConfigExtraParamsPropagation covering OpenAI/Anthropic knob
   propagation, provider_params passthrough, and per-call override
   precedence.

2. Drop dead extract_openai_* duplicates in clients.py.
   extract_openai_reasoning_content, extract_openai_reasoning_details, and
   extract_openai_cache_tokens had no callers outside their own definitions
   — the live implementations live in src/llm/backends/openai.py. -103
   lines from clients.py.

3. Unify on ModelTransport, delete SupportedProviders.
   The "google" vs "gemini" split forced a _provider_for_model_config
   translation shim in two places. Replace all SupportedProviders usages
   with ModelTransport, rename CLIENTS["google"] → CLIENTS["gemini"],
   update provider branches + LLMError labels + reasoning-trace entries
   accordingly. Trace JSONL now writes "provider": "gemini" instead of
   "google" — consistent with the broader env-var rename cutover.

Also tidies up pre-existing basedpyright findings in tests/llm/test_model_config.py
(pydantic before-validator dict inputs + descriptor-proxy call).

ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 153/153 pass across
tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py,
tests/llm/, tests/dialectic/, tests/deriver/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(llm): finish the src/utils/clients.py → src/llm/ migration

honcho_llm_call_inner now delegates to request_builder.execute_completion
and execute_stream instead of re-implementing backend call scaffolding
inline. The new _effective_config_for_call helper carries per-call kwargs
(temperature, stop_seqs, thinking_budget_tokens, reasoning_effort) onto
the selected ModelConfig — or synthesizes a minimal config for the
test-only callers that pass provider+model directly. max_output_tokens
is zeroed on the effective config to preserve the current
"per-call max_tokens wins" semantic; honoring ModelConfig.max_output_tokens
is a separable correctness concern.

Side effect of routing through the new path: ConfiguredModelSettings'
thinking_budget_tokens validator now fires on synthesized configs.
test_anthropic_thinking_budget was asserting that a sub-1024 budget
propagated to Anthropic — bumped to 1024 to match what Anthropic actually
accepts.

Unified client construction. Promoted the cached client factories in
src/llm/__init__.py (get_anthropic_client, get_openai_client,
get_gemini_client, get_{anthropic,openai,gemini}_override_client) to
public API and added them to __all__. Promoted
credentials._default_transport_api_key → default_transport_api_key.
Deleted the duplicate _build_client and _default_credentials_for_provider
from clients.py; _client_for_model_config now falls through to the
public factories. CLIENTS dict and _get_backend_for_provider stay as the
mockable seam for the ~50 patch.dict(CLIENTS, {...}) test call sites.

Wired operator-configurable Gemini cached-content reuse end-to-end.
PromptCachePolicy moved from src/llm/caching.py into src/config.py so
ModelConfig can reference it as a field without a circular import;
caching.py re-exports the name for existing imports. Added
cache_policy: PromptCachePolicy | None on ConfiguredModelSettings,
FallbackModelSettings, ResolvedFallbackConfig, and ModelConfig.
resolve_model_config, _resolve_fallback_config, and
_select_model_config_for_attempt copy the field through.
honcho_llm_call_inner passes effective_config.cache_policy into
execute_completion / execute_stream, so operators opt in via
e.g. DERIVER_MODEL_CONFIG__CACHE_POLICY__MODE=gemini_cached_content
and the selection actually fires instead of sitting on a dead path.

New regression test test_cache_policy_reaches_gemini_backend asserts the
PromptCachePolicy object reaches the Gemini backend's extra_params.

ruff + basedpyright: clean. Tests: 154/154 pass across
tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py,
tests/llm/, tests/dialectic/, tests/deriver/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(llm): move all LLM orchestration into src/llm/ and delete clients.py

The 1624-line src/utils/clients.py has been carved up into focused modules
under src/llm/ and deleted. There is now one golden path for LLM
orchestration and no dual entrypoint.

New module layout:

  src/llm/
    __init__.py       thin stable re-export surface
    api.py            public honcho_llm_call with retry + fallback + tool
                      loop delegation
    executor.py       honcho_llm_call_inner (single-call executor); bridges
                      to request_builder.execute_completion / execute_stream
    tool_loop.py      execute_tool_loop + stream_final_response, plus
                      assistant-tool-message and tool-result formatting
    runtime.py        AttemptPlan dataclass (replaces the loose
                      ProviderSelection NamedTuple), effective_config_for_call,
                      plan_attempt, per-retry temperature bump, attempt
                      ContextVar
    registry.py       single owner of CLIENTS dict + cached default and
                      override SDK-client factories + backend/history-adapter
                      selection + high-level get_backend(config)
    conversation.py   count_message_tokens, tool-aware message grouping,
                      truncate_messages_to_fit
    types.py          HonchoLLMCallResponse, HonchoLLMCallStreamChunk,
                      StreamingResponseWithMetadata, IterationData,
                      IterationCallback, ReasoningEffortType, VerbosityType,
                      ProviderClient
    request_builder.py low-level request assembly (ModelConfig → backend
                      complete/stream); no longer owns credential resolution
    credentials.py    default_transport_api_key, resolve_credentials
    caching.py        gemini_cache_store; re-exports PromptCachePolicy
                      from src.config
    backend.py        Protocol + normalized result types
    history_adapters.py provider-specific assistant/tool message shapes
    structured_output.py
    backends/         AnthropicBackend, OpenAIBackend, GeminiBackend

handle_streaming_response had no production callers; it is deleted. The
three tests that used it now drive honcho_llm_call_inner(stream=True,
client_override=...) directly, which exercises the same code path the
public API uses.

Dead credential passthrough removed. The ProviderBackend Protocol and
all three concrete backends no longer accept api_key / api_base — those
are baked into the underlying SDK client at registry construction time
and were being del'd everywhere they appeared. request_builder also
stops resolving and forwarding them.

Client construction is unified. The cached default-client factories
(get_anthropic_client, get_openai_client, get_gemini_client) and override
factories (get_*_override_client) are promoted to public API; the
module-level CLIENTS dict populates from them and remains the
patch.dict(CLIENTS, {...}) mocking seam tests rely on. Old duplicate
helpers (_build_client, _default_credentials_for_provider) are gone.
default_transport_api_key is promoted to public.

Application imports now come from src.llm (dreamer, dialectic, deriver,
summarizer, telemetry-adjacent tests). No code imports from
src.utils.clients anywhere in the repo.

ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 1013/1013 pass
across the entire non-infra test suite (excluding tests/unified,
tests/bench, tests/live_llm, tests/alembic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(llm): sanitize tool schemas for Gemini's function_declarations validator

Gemini's native-transport function-declarations validator accepts a narrow
subset of JSON-Schema / OpenAPI: type, format, description, nullable, enum,
properties, required, items, minItems, maxItems, minimum, maximum, title.
Anything else — additionalProperties, allOf, if/then/else, $ref, anyOf,
oneOf, $defs, patternProperties — triggers an INVALID_ARGUMENT 400 at call
time.

Our agent tool schemas in src/utils/agent_tools.py use several of those
(additionalProperties: false, allOf + if/then conditionals) because they
were authored for OpenAI strict-mode + Anthropic, which need the richer
vocabulary. GeminiBackend._convert_tools was passing them straight through.

Add _sanitize_schema(): walks the parameters tree and drops unsupported
keywords while preserving semantics for the keywords that hold user data
(properties maps field-name → sub-schema; required / enum are lists of
literals; items is a single sub-schema). Other backends are untouched and
continue to receive the full strict schemas.

Regression tests:
- test_gemini_sanitize_schema_strips_unsupported_keywords: confirms
  additionalProperties, allOf + if/then, and $defs are stripped at nested
  levels while legitimate fields survive.
- test_gemini_convert_tools_sanitizes_parameters_schema: end-to-end
  _convert_tools output has no forbidden keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
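
For orientation, a minimal sketch of the sanitizer shape described above, assuming a recursive dict walk; the real _sanitize_schema in src/llm/backends/gemini.py may differ in structure, and the allow-list below is copied from the commit text rather than from the code:

```python
# Illustrative sketch only, not the repo's implementation.
from typing import Any

_GEMINI_ALLOWED_KEYS = {
    "type", "format", "description", "nullable", "enum", "properties",
    "required", "items", "minItems", "maxItems", "minimum", "maximum", "title",
}


def _sanitize_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Drop JSON-Schema keywords Gemini's function_declarations validator rejects."""
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if key not in _GEMINI_ALLOWED_KEYS:
            continue  # e.g. additionalProperties, allOf/if/then, $ref, $defs, anyOf
        if key == "properties" and isinstance(value, dict):
            cleaned[key] = {name: _sanitize_schema(sub) for name, sub in value.items()}
        elif key == "items" and isinstance(value, dict):
            cleaned[key] = _sanitize_schema(value)
        else:
            cleaned[key] = value  # required / enum are lists of literals, kept as-is
    return cleaned
```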

* fix: fix tool calling syntax for gemini

* refactor(llm): normalize defaults, widen OpenAI reasoning-model routing

* chore: fix test

* fix(llm): address post-migration review feedback

* fix(llm): gemini robustness + dreamer specialist ergonomics

* chore: address review comments

* chore: (docs) unreleased changelog addition

* chore: (docs) merge commit changes

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Erosika <eri@plasticlabs.ai>
* feat: adding honcho-cli package

* feat: adding more support for command-level flags, also including workarounds for getting raw SDK info

* feat: adding peer config

* feat: adding setup commands

* chore: setting up package dependencies for cli

* feat: promote init/doctor to top-level + polish wizard

* feat: make init --yes fall back to existing config

* chore: updating documentation

* chore: updating tagline

* feat: structurally updating recommended settings for CLI

* fix: style

* fix: removing redundant describe method

* fix: delete key generation commands and fixing session ID

* fix: removing defaults and changing config write path.

* chore: paginating conclusions

* chore: require workspace

* fix: polish command surfaces — scoping, validation, perf, consistency

* chore: removing session message

* fix: CLI output shape, destructive-confirm previews, skip needless round-trips

* chore: CLI polish — peer inspect config, drop dead helper, doc/help consistency

* chore: update readme

* chore: updating tests

* chore: doc updates

* fix: config command

* chore: unused code

* fix: doctor command

* fix: removing quiet tag and fixing session key ordering

* fix: config commands and session id command

* fix: removing message_count

* fix: branding circular dependency

* fix: refactor lazy imports to use common.py correctly.

* fix: removing all lazy imports

* chore: cr fixes

* fix: config, env, flag setup

* chore: updating skill

* feat: adding workspace, session, and message create

* fix: init now supports local honcho

* chore: cr

* feat(cli): CLI surface polish — reasoning flag, peer-scoped messages, help sync

Add --reasoning/-r to peer chat (minimal..max), -p peer filter to
message list with newest-first ordering, and a curated welcome panel
with getting-started/memory/commands sections.

Sync the welcome panel and group help strings with the actual
registered commands — drop phantom 'session clone', add the 4 missing
peer commands and 7 missing session commands, fix conclusion/message/
workspace group docstrings that claimed commands that don't exist.

* feat(cli): themed, unified help system with pattern/example

Replace the hand-rolled welcome with a layered system:

- Theme typer.rich_utils (dim borders, brand color) so every --help
  inherits the voice.
- HonchoTyperGroup subclass renders a curated 3-panel welcome
  (getting started / memory / commands) with recipes Typer can't
  auto-generate.
- Unify the front door: bare 'honcho', 'honcho --help', and
  'honcho help' all render the same welcome via one code path;
  sub-groups and leaf commands still get Typer's themed renderer.
- Replace Click's 'Usage: …' line with pattern/example rows at every
  sub-group and leaf command, so the help voice stays consistent from
  top to leaves.

* refactor(cli): address review — typed exceptions, chmod 600, tighter redaction, class-based help, tests

- Replace module-level monkey-patch of TyperGroup/TyperCommand.get_usage
  with HonchoTyperGroup applied via cls= on every sub-Typer. Lives in
  a new _help.py module to avoid circular imports. No longer leaks
  behavior changes into other Typer users in the same process.
- _test_connection dispatches on the SDK's typed exceptions
  (AuthenticationError, ConnectionError, TimeoutError, APIError)
  instead of substring-matching error messages.
- Config.save() now chmods ~/.honcho/config.json to 0o600 after write
  so the plaintext API key isn't world-readable on multi-user hosts.
- Tighten api_key redaction to '***<last4>' (was 'header...last4'),
  matching setup._redact for consistency. Short keys fully masked.
- Add test_validation.py covering safe IDs, unsafe chars, path
  traversal, and empty input. Update test_config.py redaction cases
  and add 0o600 permission assertion. Fix stale patch paths in
  test_commands.py that pointed at honcho_cli.main instead of the
  command modules where get_client is actually imported.

* feat(cli): add options panel to welcome menu

Append a fourth panel listing the global flags (-w/-p/-s, --json,
--version, --help) with their env-var counterparts. Discoverable
from bare 'honcho' without needing to hunt for --help.

* chore(cli): drop --version from welcome options panel

* feat(cli): add pixel-honcho icon to banner

Prepend a 13-char ASCII rendering of honcho-pixel.svg to the HONCHO
wordmark. Uses Unicode half-blocks to pack 12 pixel rows into 6 text
rows, faithfully preserving the SVG outline (two eye dots, mouth slit,
tapering foot). Appears in bare 'honcho', 'honcho --help', 'honcho
--version', and 'honcho init'.

* fix: polish Honcho CLI welcome panel and error messages

* fix: honcho workspace inspect speed

* chore: minor fix to session pagination

* fix: removing NDJSON output

* chore: consolidating honcho CLI's dual argv grammar onto Pattern A (command-first)

* chore: clean up imports

* fix: four `-s` consistency fixes applied

* chore: minor changes to memory rows

* fix: changing package name to honcho-cli

* fix: removing pixel face

---------

Co-authored-by: Erosika <eri@plasticlabs.ai>
…lastic-labs#575)

The MCP Worker hardcoded https://api.honcho.dev for every request, forcing
anyone running a self-hosted Honcho instance to patch the source before
deploying their own Worker alongside it.

Route the baseUrl through the Worker env so operators can set
HONCHO_API_URL (via .dev.vars for local development or wrangler secret for
deployed Workers) and point the Worker at their instance. The variable is
intentionally not exposed as a request header: that would let public
clients steer traffic to internal URLs, which is a latency and security
regression.

When HONCHO_API_URL is unset, the Worker falls back to
https://api.honcho.dev, so existing deployments are unaffected.

Closes plastic-labs#508
…patible providers (plastic-labs#586)

* fix: wrap single embed() input in array for OpenAI-compatible provider compatibility

* Fix input format in embedding test assertion
* fix: catch InternalServerError from turbopuffer

* fix: remove unused VectorUpsertResult

* fix: downgrade vector store sync errors to warnings

* fix: remove upsert_with_retry

* fix: (vector) add silent path and explicit path for vector db server errors

---------

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
* docs: adding cli doc

* docs: adding generated script and content and github workflow

* chore: removing workflow

* fix: (docs) re-format and add details to cli-reference docs

---------

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
* fix: moving cli skills to root

* chore: updating cli readme

* chore: updating language

* chore: updating docs
Applies eight review findings from the DEV-1482 integration review. All
scoped to docs/v3/guides/integrations/sillytavern.mdx; no code changes.

- DOC-3: curl -fsSL in install command (fails loud on 4xx/5xx)
- DOC-4: Note now reflects installer auto-config + manual-fallback
- DOC-6: LLM-backend prerequisite callout at top of Quick Start
- DOC-14: restart step warns about live-session clobbering
- DOC-5: Global Config intro names resolution order + precedence;
  disambiguates "sillytavern" workspace vs hosts.sillytavern key
- DOC-7: new Peer Observability subsection (asymmetric default)
- DOC-2: route count in Architecture diagram 7 → 9
- DOC-8: troubleshooting row for "plugin on disk, drawer absent"

Findings index + rationale: plastic-labs/sillytavern-honcho#3
- Clarify installer step 4 — the plugin seeds config.json if absent
- 'Puzzle piece' -> 'three-cubes' for the Extensions icon (current ST UI)
- API key step notes the UI-overrides-config precedence explicitly
- 'Honcho workspace ID' -> 'default Honcho workspace ID (configurable)'
- Add Note after Context-modes table — Context only is session-scoped
  and returns empty until enough messages accumulate; Reasoning is the
  better default for fresh peers
- Next Steps gains two cards: Install SillyTavern (upstream docs) and
  the Claude Code setup skill (skills/setup/SKILL.md)

Follow-ups tracked separately — tool rename (observation -> conclusion,
matching the /conclusion endpoint), architecture Excalidraw.
…ig (plastic-labs#587)

* Update deriver.py

* Simplify model configuration in deriver.py

Removed stop_sequences from model configuration.
- Add Prerequisites section with SillyTavern install link + Node >= 18
  requirement (was buried in Next Steps; users hit install step with no
  awareness ST needed to exist first).

- Expand restart step into a callout: restart required for server-plugin
  reload, not for client-side edits.

- Configure step now documents the three editable inputs (API key,
  Workspace ID, Your peer name) and where each saves.

- Fix 'three-cubes icon' -> 'puzzle piece icon'.

- Installer step list fleshed out: 6 steps (was 4), including config.yaml
  bootstrap and enableServerPlugins flip. Dropped the false claim that
  the plugin seeds a minimal ~/.honcho/config.json on first run.

- Global Config section rewritten: resolution order now generalized to
  apiKey / workspace / peerName (was apiKey-only); documents panel
  write-back to hosts.sillytavern.*; dropped aiPeer references (it's a
  telemetry-only field, not user-facing).

- Add a Disable / Enable global config subsection covering the opt-out
  toggle and the Inherit / Push local / Cancel diff dialog.

- Troubleshooting: two new rows (stale peer name on new chat, cancelled
  diff dialog).
The plugin also writes to a root-level `sessions` map (ST dir → last
Honcho session ID), not only to `hosts.sillytavern.*`. The earlier
phrasing overstated the isolation claim.
honcho_save_observation is not registered in the extension — only
honcho_query_memory and honcho_search_history exist in code.
* docs: adding opencode

* docs: align opencode guide with latest plugin changes

* chore: updating language

---------

Co-authored-by: adavyas <adavyasharma@gmail.com>
- New Group Chats subsection: documents per-character peer routing
  (each group member gets their own peer, not a collapsed group-<id>
  peer) and lazy peer registration for characters joining mid-chat.
- Session Naming: documents the freeze-on-first-assign invariant
  (changing the naming mode doesn't reroute existing chats) and
  the Reset button for explicit session rollover.
- Tool table: add honcho_save_conclusion — prior fix undercounted
  (2 -> 3 tools). The extension registers all three.
…stic-labs#581)

The Surprisal module passes `{"level": levels}` directly to
`get_all_documents()`, but `apply_filter()` expects operator syntax:
`{"level": {"in": levels}}`.

Without the `in` operator, the filter is silently ignored, causing
`_fetch_level_observations()` to return 0 results. This makes the
entire Surprisal phase of the Dream cycle a no-op.

Fixes plastic-labs#559
erosika and others added 19 commits April 23, 2026 15:56
* docs: adding opencode

* docs: align opencode guide with latest plugin changes

* chore: updating language

* docs: remove interview command from opencode guide

---------

Co-authored-by: ajspig <dragon@monstercode.com>
* docs: update opencode install command

* docs: use native opencode plugin install
…s#615)

* fix(deriver): ignore blank observations before embedding

* Address PR review on observation normalization

* Harden mock await arg access in tests

* Unify blank observation filtering across tool paths

* Move soft-delete query test back to fixture class
* fix(dreamer): threshold and time-guard semantics

Finding 2: filter count_stmt on documents.level == 'explicit' in
check_and_schedule_dream. Dreamer-created levels (deductive, inductive,
contradiction) are consolidation output, not input, and would otherwise
inflate the threshold count and create a feedback loop.

Finding 3 (code-level): relocate last_dream_at write from enqueue_dream
(enqueue.py) to process_dream (orchestrator.py), inside the
'if result is not None' block. Duplicate enqueues can no longer reset
the 8-hour time guard clock. Failed/never-run dreams don't advance it.

Success criteria: lenient (any non-null DreamResult counts). Pending
Vineeth confirmation — will adjust to strict/middle if requested.

Tests pending in follow-up commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(dreamer): threshold filter + last_dream_at relocation regression tests

Tests for Finding 2 and Finding 3 (code-level):

- TestThresholdFilter (tests/dreamer/test_dream_scheduler.py):
  * Mixed levels below explicit threshold: 30 explicit + 40 deductive
    + 10 inductive → no trigger (core regression, buggy count would trigger)
  * Explicit-only at threshold: 60 explicit → triggers
  * Contradiction excluded: 100 contradiction + 10 explicit → no trigger
    (confirms positive == "explicit" filter excludes all dreamer output)

- TestEnqueueDreamMetadataShape (tests/deriver/test_enqueue_dream.py):
  * AsyncMock-patched update_collection_internal_metadata verifies
    enqueue writes last_dream_document_count but NOT last_dream_at

- TestLastDreamAtCompletionWrite (tests/dreamer/test_dreamer_integration.py):
  * Happy path: run_dream returns DreamResult → last_dream_at written
  * Failure path: run_dream returns None → last_dream_at absent
  * Exception path: run_dream raises → last_dream_at absent,
    process_dream swallows exception (queue-processed semantics preserved)

Docstring on check_and_schedule_dream tightened: "document threshold"
-> "explicit-observation threshold" to reflect filter semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dreamer): preserve last_dream_document_count in completion write

CodeRabbit caught this: update_collection_internal_metadata uses a
top-level JSONB `||` merge, so passing {"dream": {"last_dream_at": ...}}
replaces the entire "dream" subkey and drops last_dream_document_count
that was written by enqueue_dream.

Symptom: after every completed dream, the baseline drops to 0. Next
check_and_schedule_dream reads documents_since_last_dream as
current_count - 0 = current_count, so any collection with >= 50
explicit observations can re-trigger immediately once the 8h guard
expires, even with no new raw material.

Fix: read-modify-write. Fetch current collection, merge last_dream_at
into the existing "dream" dict, write the merged dict back. Preserves
sibling keys (current: last_dream_document_count; future-proof for
telemetry fields that might land in PR 4).

Regression test added to tests/dreamer/test_dreamer_integration.py:
pre-seeds {"dream": {"last_dream_document_count": 42}}, runs
process_dream, asserts both last_dream_at is written AND
last_dream_document_count == 42 is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dreamer): address CodeRabbit feedback on b89997c

- enqueue.py: read-modify-write preserves last_dream_at when writing baseline
- dream_scheduler.py: explicit-level filter on execute_dream count query
- test fixture: pin DOCUMENT_THRESHOLD and ENABLED_TYPES for stability
- integration test: timezone-aware assertion on last_dream_at

Regression test added for enqueue sibling-drop (symmetric to c8fe40a).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dreamer): session lookup symmetry + row lock on dream metadata RMW

- dream_scheduler.py: explicit-level filter on execute_dream session lookup
  (baseline and session pick must agree on the same document set)
- crud.collection.get_collection: optional with_for_update flag for callers
  that need serialized read-modify-write on internal_metadata
- enqueue.py + orchestrator.py: pass with_for_update=True on the RMW reads
  to close the TOCTOU between concurrent enqueue and completion writes

Follow-up filed for jsonb_set-based nested updates (docs/factory/backlog/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
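
A rough sketch of the read-modify-write described in the two commits above; the crud helper names come from this PR's commit text, but their exact signatures and argument order are assumptions:

```python
# Hypothetical sketch; the real logic lives in process_dream (orchestrator.py)
# and enqueue.py, using the repo's crud helpers whose signatures are assumed here.
from datetime import datetime, timezone

from src import crud  # assumed import path


async def write_completion_timestamp(db, collection_id: str) -> None:
    # Row-locked read so concurrent enqueue/completion writes serialize.
    collection = await crud.collection.get_collection(db, collection_id, with_for_update=True)
    dream_meta = dict((collection.internal_metadata or {}).get("dream", {}))
    dream_meta["last_dream_at"] = datetime.now(timezone.utc).isoformat()
    # The top-level JSONB || merge replaces the whole "dream" subkey, so write the
    # merged dict back to keep siblings like last_dream_document_count intact.
    await crud.collection.update_collection_internal_metadata(
        db, collection_id, {"dream": dream_meta}
    )
```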

* fix(dreamer): explicit-only count on manual schedule_dream route

The third caller of enqueue_dream — POST /workspaces/{id}/schedule_dream —
was passing an all-levels document count as the baseline, breaking symmetry
with check_and_schedule_dream and execute_dream after Loop 2's filter fixes.
Filter the manual route's count to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(dreamer): document explicit-only invariant on enqueue_dream.document_count

Loop 3 follow-up on d76627a. The parameter's semantic tightened across Loop
2 (check_and_schedule_dream, execute_dream) and Loop 3 (schedule_dream route)
to "explicit-level count, used as the baseline," but the signature still read
"Current document count for metadata update." The next caller would have no
way to know from the function contract.

Docstring now spells out: (1) the value is explicit-only, (2) it's written
as last_dream_document_count, (3) it's the baseline that
check_and_schedule_dream subtracts from to compute
documents_since_last_dream, (4) passing a count that includes non-explicit
levels (deductive, inductive, contradiction) inflates the baseline and
suppresses the next scheduled dream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(dreamer): rename current_document_count → current_explicit_count

Loop 3 follow-up on d4e10e3. After Loop 2's filter landed, the local in
check_and_schedule_dream held an explicit-only count but was still named
current_document_count — asymmetric with execute_dream's current_explicit_count
(line 201) and contradicting the filter on line 269 that produces the value.

Pure rename: three occurrences (definition at 271, subtraction at 274, log
extra key at 282). No test references. Naming-as-invariant alignment with
d76627a (query filters), d4e10e3 (parameter docstring), and Loop 1's local
rename in execute_dream.

The persisted JSONB key last_dream_document_count is the one remaining
drift-layer; filed as plastic-claudebook backlog item for a separate PR
with an intentional migration path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dreamer): atomic guard-pair write + in-flight stampede defense

Loop 4 response to Vineeth's CHANGES_REQUESTED on PR plastic-labs#573.

The pre-Loop-4 enqueue-time write of last_dream_document_count was serving
double duty: rate limiter AND stampede latch. By arming the 8h guard the
moment a dream entered the pipeline, it implicitly blocked a second dream
from being scheduled during the in-flight window. Loop 3 relocated the
last_dream_at write to completion without moving its sibling baseline,
splitting the semantic pair and exposing the latch role that had lived
only in Vineeth's head.

Invariant (now pinned to check_and_schedule_dream's docstring): from the
moment a dream is scheduled until it completes or fails, no second dream
may be enqueued for the same (workspace, observer, observed) — and the
baseline count advances only when consolidation actually happened.

Changes:
- enqueue_dream: remove the last_dream_document_count write entirely and
  drop the document_count parameter. enqueue no longer touches dream
  metadata; the implicit stampede latch is replaced by an explicit
  queue-backed defense.
- process_dream: extend the existing row-locked RMW to write both guard
  fields atomically. Current explicit-doc count is recomputed inside the
  locked block (not carried on DreamPayload) so the pair reflects the
  actual consolidation moment.
- check_and_schedule_dream: query QueueItem for pending dreams on this
  collection's work_unit_keys (mirrors uq_queue_dream_pending_work_unit_key)
  before arming a timer. Uses queue state as source of truth rather than
  reflecting it into metadata.
- Tests: two new coherence tests under TestGuardPairCoherence —
  test_pending_queue_item_blocks_second_schedule walks the stampede timeline,
  test_silent_failure_allows_retry_on_same_corpus verifies failed dreams
  don't consume the baseline. Existing tests updated to the new contract.

* chore(dreamer): trim comment slop from loop-4 atomic pair work

Compress three verbose comments added in d24958d — the invariant itself
is captured in check_and_schedule_dream's docstring, so the inline
narrative restates what the code already says.

- dream_scheduler.py defense C block: 5 lines → 2
- orchestrator.py atomic pair write: 4 lines → 1
- enqueue.py docstring paragraph: 5 lines → 2

Net: +5/-14. Follows Eri's eef27be precedent on sillytavern-honcho PR plastic-labs#7.

---------

Co-authored-by: lilyplasticlabs <lily@plasticlabs.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adopts upstream's full LLM client refactor (PR plastic-labs#459: src/utils/clients.py
deleted in favor of the new src/llm/ package with per-backend handlers,
ConfiguredModelSettings, and ModelTransport). Conflict resolutions were
taken upstream-side via -X theirs and our customizations are re-applied
adjacent to the new structure rather than as parallel forks.

Notable upstream changes pulled in:
- LLM client refactor: src/llm/{api,backend,executor,runtime,tool_loop,...}
  with src/llm/backends/{anthropic,gemini,openai}.py
- ConfiguredModelSettings + ModelConfig replace per-component model fields
- New honcho-cli package, Zo Computer / Paperclip / SillyTavern / opencode
  integration docs
- Surprisal filter format fix (plastic-labs#581) — converged with our 4e7f136
- Many smaller fixes: dreamer thresholds, deriver blank-observation guard,
  vector sync retry budget, embed() string-input fix, etc.

Adjacent re-applications (deployment-critical):
- src/config.py: re-add LLM.CF_GATEWAY_AUTH_TOKEN
- src/llm/registry.py: inject cf-aig-authorization header in
  get_*_override_client factories when base_url targets a CF gateway
- src/embedding_client.py: same header injection on openai/gemini branches

Dropped (now redundant or replaceable):
- Per-specialist DEDUCTION_PROVIDER / INDUCTION_PROVIDER /
  *_THINKING_BUDGET_TOKENS overrides — covered by upstream's
  DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT etc. env vars
- get_provider() / get_thinking_budget() methods on BaseSpecialist —
  superseded by ConfiguredModelSettings on each specialist's MODEL_CONFIG
- src/utils/types.SupportedProviders — replaced by ModelTransport
- Custom Traefik service block in docker-compose.yml.example — Traefik
  configs remain in docker/traefik/ for users who want to wire it up
- Our 4e7f136 surprisal fix — identical to upstream's plastic-labs#581

Deployment notes for re-keying .env:
- LLM_CF_GATEWAY_API_KEY / LLM_CF_GATEWAY_BASE_URL / LLM_OPENAI_BASE_URL /
  LLM_OPENAI_COMPATIBLE_* / LLM_VLLM_* are no-ops now (extra='ignore'). Use
  per-component MODEL_CONFIG__BASE_URL / MODEL_CONFIG__API_KEY env vars
  (e.g. DREAM_DEDUCTION_MODEL_CONFIG__BASE_URL=...).
- LLM_CF_GATEWAY_AUTH_TOKEN remains as the single global needed for the
  cf-aig-authorization header.

Verification: ruff check src/ ✓, basedpyright src/ ✓ (0 errors).
…am merge

These edits should have been folded into the merge commit (a901f34) but were
left uncommitted — pushing now to actually deliver CF Gateway support and
clean up leftovers from the -X theirs auto-resolution.

src/config.py
  - Add LLMSettings.CF_GATEWAY_AUTH_TOKEN (single global needed for the
    cf-aig-authorization header on any provider override client whose
    base_url targets a CF gateway URL).

src/llm/registry.py
  - Inject cf-aig-authorization header in get_openai_override_client,
    get_anthropic_override_client, and get_gemini_override_client when
    base_url contains 'gateway.ai.cloudflare.com' AND
    LLM.CF_GATEWAY_AUTH_TOKEN is set. Rides on the existing openai/
    anthropic/gemini transports — no parallel CF backend.

src/embedding_client.py
  - Mirror the same header injection on the openai/gemini branches so
    embeddings through CF Gateway authenticate correctly. Helper is
    duplicated locally so the embedding client doesn't depend on the
    LLM runtime registry module.

src/dreamer/specialists.py
  - Drop get_provider() / get_thinking_budget() override methods on
    BaseSpecialist + the per-specialist references to settings.DREAM.
    DEDUCTION_PROVIDER / INDUCTION_PROVIDER / *_THINKING_BUDGET_TOKENS.
    Those settings fields no longer exist upstream — same functionality
    is reachable via DREAM_DEDUCTION_MODEL_CONFIG__TRANSPORT etc.
  - Drop the orphan thinking_budget_tokens=llm_settings.THINKING_BUDGET_TOKENS
    arg on the honcho_llm_call site that survived the auto-merge — the
    value now lives on model_config which is already passed.

src/main.py
  - ruff isort fix (autofixed) — uuid/time import order.

Verification: ruff check src/ ✓, basedpyright src/ ✓ (0 errors).
Telemetry-only signal: True when the loop exited via the max-iterations
synthesis path rather than the model deciding to stop. Distinguishes
"model didn't converge" from natural termination so downstream
observability can label the two cases differently.

No emitter changes — flag is set but no consumer reads it yet.
Adds six new metrics + recorder methods on the existing PrometheusMetrics
singleton; no callers yet, so this commit is purely declarative.

Series:
- llm_calls / llm_call_duration_seconds — counter + histogram per call,
  labeled by feature × provider × model × outcome.
- llm_tokens — input/output/cache_read/cache_creation per
  feature × provider × model.
- llm_tool_calls — per-tool invocation outcome inside the tool loop.
- llm_iterations — histogram of iterations consumed per call/outcome.
- llm_backup_used — counts failovers from primary to backup provider.

Cardinality-bounded: feature × provider × model × outcome ≈ 1.7k series
cap. Deliberately no workspace_name label here — these answer "is this
model effective for this feature", not "is workspace X slow".

LLMCallOutcome enum exported from src.telemetry.prometheus so callers can
reference the canonical values without importing from the metrics module
directly.
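As a rough shape of two of the six series (metric and label names are taken from the text above; the prometheus_client wiring is an assumption, since the repo registers these on its own PrometheusMetrics singleton):

```python
# Illustrative sketch, not the repo's PrometheusMetrics implementation.
from prometheus_client import Counter, Histogram

LLM_CALL_LABELS = ["feature", "provider", "model", "outcome"]

llm_calls = Counter(
    "llm_calls", "LLM calls per feature/provider/model/outcome", LLM_CALL_LABELS
)
llm_call_duration_seconds = Histogram(
    "llm_call_duration_seconds", "Wall-clock duration of one LLM call", LLM_CALL_LABELS
)


def record_llm_call(feature: str, provider: str, model: str, outcome: str, seconds: float) -> None:
    """Roughly what a recorder method on the singleton would do once per call."""
    llm_calls.labels(feature, provider, model, outcome).inc()
    llm_call_duration_seconds.labels(feature, provider, model, outcome).observe(seconds)
```
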
Introduces src/telemetry/llm_call_metrics.py — a context-manager-based
wrapper that turns one LLM call into one set of Prometheus samples and
one logfmt log line.

Surface:
- observe_llm_call(...) — context manager yielding a mutable _CallState
  the caller populates over the call's lifetime.
- finalize_success(...) — populate state from a successful response and
  pick the outcome bucket (success / success_after_retry / success_via_backup).
- mark_max_iterations(...) — flip the state to error_max_iterations when
  the tool loop exited via the synthesis path.
- normalize_feature_label(...) — maps caller's track_name/trace_name to
  a low-cardinality Prom label (e.g. "Dreamer/deduction" -> dream_deduction).

No callers wired in yet — this commit is the helper module on its own
so the diff stays reviewable. Wiring into honcho_llm_call and the tool
loop lands in subsequent commits.

Errors raised inside the wrapped call are classified into outcome
buckets (timeout / validation / other) and re-raised; the wrapper never
swallows or transforms exceptions.
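
A hypothetical usage sketch of that surface; the keyword arguments, import paths, and whether finalize_success is a free function or a method on the yielded state are assumptions rather than upstream's actual API:

```python
# Hypothetical sketch only, based on the commit text above. In the repo the
# wrapper is applied inside honcho_llm_call itself (see the later wiring
# commit); this shows the context-manager surface in isolation.
from src.telemetry.llm_call_metrics import (
    finalize_success,
    normalize_feature_label,
    observe_llm_call,
)


async def observed_call(run_llm):
    feature = normalize_feature_label("Dreamer/deduction")  # -> "dream_deduction"
    with observe_llm_call(feature=feature) as state:
        # Exceptions raised here are classified (timeout / validation / other)
        # into outcome buckets and re-raised; the wrapper never swallows them.
        response = await run_llm()
        finalize_success(state, response)  # success / success_after_retry / success_via_backup
    return response
```
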
Adds prometheus_metrics.record_llm_tool_call() calls in both the
success and error branches of execute_tool_loop's per-tool dispatch.
Threads track_name / trace_name through the function signature so the
emitted metric carries the same feature label that the call-level
metrics will use.

Both new params default to None (current callers don't pass them yet),
so feature label resolves to "unknown" until honcho_llm_call is wired
in the next commit. Metric emission is wrapped in PrometheusMetrics'
sentry-captured error handler — a metric bug can never break a real
tool call.
Wraps the body of honcho_llm_call (both tool-less and tool-loop paths)
in observe_llm_call(...) so every invocation produces one set of
Prometheus samples and one logfmt log line.

Captures the AttemptPlan that produced the most-recent (and on success,
the winning) call via a `last_plan` cell updated inside _get_attempt_plan,
so the recorded provider/model is the one that actually answered —
primary on early attempts, backup on the final retry. This makes
backup-on-final-attempt observable directly from llm_calls / llm_tokens
without parsing logs.

Passes track_name and trace_name through to execute_tool_loop so its
per-tool counter (added in the previous commit) carries the same
feature label as the call-level metrics.

When the tool loop returns response.hit_max_iterations=True, the call's
outcome is overridden to error_max_iterations via mark_max_iterations
so dashboards can split "model didn't converge" from clean success
without the tool-loop having to know about outcome semantics.

Streaming responses don't carry token counts at the entry point —
the recorded call still emits but token counters skip those rows
(record_llm_tokens silently no-ops on count<=0). Acceptable partial
signal until streaming refactor surfaces tokens earlier.

ruff + basedpyright clean. End-to-end smoke verified all six series
fire correctly across success, success_via_backup, error_max_iterations,
error_timeout, and tool-call paths.
…ity-metrics

feat(telemetry): per-LLM-call metrics, structured logs, and tool tracking
offendingcommit merged commit fe6fb48 into main May 4, 2026
3 of 4 checks passed
offendingcommit deleted the sync/upstream-2026-05-03 branch May 4, 2026 02:44
offendingcommit added a commit that referenced this pull request May 4, 2026
…architecture

The April-16 gotchas section was stale after the upstream sync (PR #4):

- Legacy 'cf' / 'custom' provider tags removed — replaced by ModelTransport
  literal (anthropic/openai/gemini) and per-component MODEL_CONFIG__* env vars
- 'deriver/summary must stay on cf' rule no longer applies — native gemini
  backend (src/llm/backends/gemini.py) honors response_format=json_schema
- thoughtSignature multi-iteration workaround obsolete — preserved natively
  in src/llm/history_adapters.py + src/llm/executor.py
- LM Studio section: env var names switched from LLM_OPENAI_COMPATIBLE_*
  globals to MODEL_CONFIG__OVERRIDES__BASE_URL / __API_KEY per component

Adds a note that CF Gateway integration is now app-level
(cf-aig-authorization auto-injected by src/llm/registry.py and
src/embedding_client.py based on base_url pattern matching) rather than
deployment-level URL routing.

Ollama Cloud structured-output limitation kept — that's still a real
upstream constraint, just rephrased for the new transport model.