feat(model): enable Anthropic prompt caching by default #404

Merged

OisinKyne merged 2 commits into main from claude/competitive-analysis-franklin-NV0xw on May 1, 2026

Conversation

@bussyjd (Collaborator) commented May 1, 2026

Summary

  • Attach cache_control_injection_points: [{location: message, role: system}] to every Anthropic model_list entry we emit for LiteLLM, covering both the anthropic/* wildcard and explicit per-model entries (see the sketch after this list).
  • LiteLLM auto-injects cache_control: {type: ephemeral} on the system message of every request to those entries, the canonical "prompt caching by default" pattern.
  • Non-anthropic providers (openai, ollama, custom) are intentionally untouched — cache_control is Anthropic-specific and the paid/* route proxies to arbitrary upstreams via the buyer sidecar.
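
A minimal sketch of what the generator change amounts to, assuming the config types look roughly like this. Only the `CacheControlInjectionPoints` field and the `{Location: "message", Role: "system"}` shape are taken from this PR; the type and function names (`InjectionPoint`, `ModelEntry`, `buildModelEntry`) are illustrative, not the repo's actual API:

```go
package model

import "strings"

// InjectionPoint mirrors one entry of LiteLLM's cache_control_injection_points.
type InjectionPoint struct {
	Location string `yaml:"location"` // "message": inject on a chat message
	Role     string `yaml:"role"`     // which role's message gets cache_control
}

// LiteLLMParams is the litellm_params block of one model_list entry.
type LiteLLMParams struct {
	Model                       string           `yaml:"model"`
	CacheControlInjectionPoints []InjectionPoint `yaml:"cache_control_injection_points,omitempty"`
}

// ModelEntry is one model_list entry in the emitted LiteLLM config.
type ModelEntry struct {
	ModelName     string        `yaml:"model_name"`
	LiteLLMParams LiteLLMParams `yaml:"litellm_params"`
}

// buildModelEntry attaches the system-message cache breakpoint to every
// Anthropic entry (wildcard anthropic/* or explicit per-model) and leaves
// all other providers untouched.
func buildModelEntry(name, upstream string) ModelEntry {
	e := ModelEntry{ModelName: name, LiteLLMParams: LiteLLMParams{Model: upstream}}
	if strings.HasPrefix(upstream, "anthropic/") {
		e.LiteLLMParams.CacheControlInjectionPoints = []InjectionPoint{
			{Location: "message", Role: "system"},
		}
	}
	return e
}
```

Marshalled to YAML, an Anthropic entry then carries cache_control_injection_points: [{location: message, role: system}], which is exactly what the manual cluster check in the test plan looks for.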

Why

Token cost on Anthropic models is dominated by repeated system / tool / RAG prefixes. Enabling prompt caching by default cuts that cost without any per-call changes on the agent or skill side. Closes a parity gap with Franklin (which advertises Anthropic prompt caching enabled by default).

Test plan

  • go build ./...
  • go test ./internal/model/
  • New subtests in TestBuildModelEntries (sketched after this list):
    • anthropic_entries_inject_system-message_cache_breakpoint — both wildcard and explicit anthropic entries carry exactly one {Location: "message", Role: "system"} injection point.
    • non-anthropic_entries_do_not_inject_cache_control — openai and ollama entries have empty CacheControlInjectionPoints.
  • Manual cluster check: run obol model setup anthropic, then kubectl get configmap litellm-config -n llm -o jsonpath='{.data.config\.yaml}'; entries for anthropic/* and any explicit claude models should show cache_control_injection_points.
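
A hedged sketch of what those subtests assert, reusing the illustrative types and buildModelEntry helper from the sketch above; only the test and subtest names come from this PR, and the model names passed in are placeholders:

```go
package model

import "testing"

func TestBuildModelEntries(t *testing.T) {
	t.Run("anthropic_entries_inject_system-message_cache_breakpoint", func(t *testing.T) {
		for _, e := range []ModelEntry{
			buildModelEntry("anthropic/*", "anthropic/*"),               // wildcard entry
			buildModelEntry("claude-sonnet", "anthropic/claude-sonnet"), // illustrative explicit entry
		} {
			pts := e.LiteLLMParams.CacheControlInjectionPoints
			if len(pts) != 1 || pts[0] != (InjectionPoint{Location: "message", Role: "system"}) {
				t.Fatalf("%s: want exactly one system-message injection point, got %+v", e.ModelName, pts)
			}
		}
	})

	t.Run("non-anthropic_entries_do_not_inject_cache_control", func(t *testing.T) {
		for _, e := range []ModelEntry{
			buildModelEntry("gpt-4o", "openai/gpt-4o"),
			buildModelEntry("llama3", "ollama/llama3"),
		} {
			if n := len(e.LiteLLMParams.CacheControlInjectionPoints); n != 0 {
				t.Fatalf("%s: want no injection points, got %d", e.ModelName, n)
			}
		}
	})
}
```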

https://claude.ai/code/session_012FTDF8ofWWCLwU8GEN2SFh


Generated by Claude Code

Add cache_control_injection_points to every Anthropic model_list entry
LiteLLM emits (both the anthropic/* wildcard and explicit per-model
entries). Pinning the system message as the cache breakpoint makes
LiteLLM auto-attach cache_control: {type: ephemeral} on every request
to an Anthropic upstream, the canonical "prompt caching by default"
pattern. Non-anthropic providers (openai, ollama, custom) are
unaffected.

https://claude.ai/code/session_012FTDF8ofWWCLwU8GEN2SFh
@OisinKyne OisinKyne enabled auto-merge (rebase) May 1, 2026 11:33
@OisinKyne OisinKyne merged commit 2bcc1d1 into main May 1, 2026
5 checks passed
@OisinKyne OisinKyne deleted the claude/competitive-analysis-franklin-NV0xw branch May 1, 2026 16:12