feat(agents): add prompt-compaction middleware for McpClient #2055
Mgczacki wants to merge 2 commits into
Conversation
Caps the prompt the agent sends to its LLM so the conversation history
never grows unbounded. Runs as a langchain AgentMiddleware via
create_agent(middleware=...), so the size bound becomes an invariant of
the agent loop — `before_model` fires before every model call, including
intra-turn re-invocations (model -> tool -> tool result -> model).
Two-stage compaction:
1. Strip image content blocks from older messages (replace with a small
text placeholder).
2. If still over target, summarize older messages into a single
SystemMessage and keep the most recent turns verbatim.
The current turn (latest dimos_turn group + any trailing untagged
messages, i.e. in-flight tool calls) is preserved untouched — never
compacted, never image-stripped.
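The two-stage flow above can be sketched as follows. This is illustrative only — `Msg`, `count_tokens`, and the helper logic are simplified stand-ins for the real langchain message types and middleware internals, not the PR's implementation:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    content: str
    images: int = 0   # number of image blocks in this message
    turn: int = 0     # stand-in for additional_kwargs["dimos_turn"]

def count_tokens(msgs):
    # pessimistic placeholder: ~3 chars/token plus a flat 1000 per image
    return sum(len(m.content) // 3 + 1000 * m.images for m in msgs)

def before_model(msgs, threshold, target):
    if count_tokens(msgs) <= threshold:
        return None  # below threshold: no-op
    # everything in the latest turn is sacred and never touched
    cur = [m for m in msgs if m.turn == msgs[-1].turn]
    older = [m for m in msgs if m.turn != msgs[-1].turn]
    # stage 1: replace image blocks in older messages with a text placeholder
    older = [Msg(m.content + (" [image removed]" if m.images else ""), 0, m.turn)
             for m in older]
    if count_tokens(older + cur) <= target:
        return older + cur
    # stage 2: fold older turns into a single summary message
    summary = Msg("summary: " + " | ".join(m.content[:20] for m in older))
    return [summary] + cur
```

In the real middleware the return value is applied through langgraph's `add_messages` reducer (via a `RemoveMessage` sentinel) rather than returned as a plain list.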
Configuration via McpClientConfig fields, env-driven by default:
- AGENT_COMPACTION_THRESHOLD: trigger size (default 40000)
- AGENT_COMPACTION_TARGET: size after compaction (default 3000)
- AGENT_COMPACTION_SUMMARY_SIZE: generated summary size (default 1000)
- AGENT_COMPACTION_MODEL: optional separate summarizer model
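A minimal sketch of how such env-driven defaults can be wired. The real code loads these through pydantic `Field(default_factory=...)` on `McpClientConfig`; the `CompactionConfig` class and attribute names below are hypothetical:

```python
import os

def _env_int(name: str, default: int) -> int:
    # falls back to the default when the variable is unset or empty
    v = os.environ.get(name)
    return int(v) if v else default

class CompactionConfig:
    """Illustrative stand-in for the McpClientConfig compaction fields."""
    def __init__(self):
        self.threshold = _env_int("AGENT_COMPACTION_THRESHOLD", 40000)
        self.target = _env_int("AGENT_COMPACTION_TARGET", 3000)
        self.summary_size = _env_int("AGENT_COMPACTION_SUMMARY_SIZE", 1000)
        # None means: reuse the agent's own model as summarizer
        self.model = os.environ.get("AGENT_COMPACTION_MODEL")
```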
Also includes:
- Per-message turn tagging via additional_kwargs["dimos_turn"], stamped
in McpClient._process_message so compaction can group/score by turn.
- McpClient._history mirror updated to honor langgraph's add_messages
reducer semantics (RemoveMessage(id=REMOVE_ALL_MESSAGES) sentinel) so
the local history doesn't accrete pre-compaction state.
- Token counter is a pessimistic placeholder (3 chars/token,
1000/image), memoized on each message for O(new-only) recompute cost.
Designed to be swapped for a real tokenizer later without touching
callers.
- 15 pytest cases (hermetic, no API key needed), including two
integration tests that drive a real create_agent loop and prove
compaction can fire mid-turn between a tool result and the next
model call.
Defaults are intentionally conservative so the feature is on by default
without changing behavior for short sessions.
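The memoized placeholder counter mentioned above can be sketched like this, assuming dict-based messages with a `kwargs` stand-in for `additional_kwargs` (not the actual implementation):

```python
def count_message_tokens(msg: dict) -> int:
    # memoized: a message's count is computed once, so recounting the whole
    # history only pays for messages that haven't been seen yet
    cached = msg.setdefault("kwargs", {}).get("dimos_tokens")
    if cached is not None:
        return cached
    tokens = 0
    for block in msg["content"]:
        if block.get("type") == "image":
            tokens += 1000  # flat pessimistic cost per image block
        else:
            tokens += len(block.get("text", "")) // 3  # ~3 chars per token
    msg["kwargs"]["dimos_tokens"] = tokens
    return tokens
```

Because the estimate only needs to be pessimistic (never undercount badly enough to overflow), swapping in a real tokenizer later changes nothing for callers.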
Greptile Summary
Confidence Score: 5/5 — safe to merge; the compaction logic is well-reasoned, hermetically tested, and defaults are conservative enough to opt out if something unexpected arises in production. The core algorithm is correct and well-tested across 15 unit and integration cases. The _apply_messages_update history-sync logic handles the REMOVE_ALL_MESSAGES sentinel correctly and suppresses duplicate publishes. The only findings are an edge-case gap in untagged-message boundary alignment (normal agent flow is unaffected since all messages are tagged), an unused helper method, and a silent or-fallback for zero-value env vars. None of these affect the happy path. No files require special attention; all findings are confined to edge cases and dead code.
Sequence Diagram
sequenceDiagram
participant U as User
participant MC as McpClient
participant G as LangGraph agent
participant MW as DimosCompactionMiddleware
participant LLM as Chat model
participant H as _history
U->>MC: HumanMessage
MC->>MC: increment turn, tag message
MC->>H: append and publish
MC->>G: stream(history)
loop each model call in the agent loop
G->>MW: before_model(state)
alt total tokens <= threshold
MW-->>G: None (no-op)
else stage 1 image strip suffices
MW-->>G: RemoveMessage + stripped + current_turn
else stage 2 summarize
MW->>LLM: invoke transcript
LLM-->>MW: summary text
MW-->>G: RemoveMessage + protected + SummaryMsg + keep + current_turn
end
G->>LLM: invoke compacted messages
LLM-->>G: AIMessage
G-->>MC: stream update
MC->>MC: _apply_messages_update
MC->>H: rebuild history, publish new messages only
end
Reviews (2): last reviewed commit "fix(compaction): address Greptile review..."
```python
def _env_int(name: str) -> int | None:
    v = os.environ.get(name)
    return int(v) if v else None
```
_env_int calls int(v) without a try/except, so a non-numeric value like AGENT_COMPACTION_THRESHOLD=abc raises a bare ValueError deep inside pydantic's default_factory during config construction, producing an unhelpful traceback with no mention of which env var is at fault.
Suggested change:

```python
def _env_int(name: str) -> int | None:
    v = os.environ.get(name)
    if not v:
        return None
    try:
        return int(v)
    except ValueError:
        raise ValueError(f"Environment variable {name!r} must be an integer, got {v!r}") from None
```
- McpClient._apply_messages_update: dedupe publish on compaction replay. When the middleware emits [RemoveMessage, protected..., summary, keep..., current_turn...], the protected/keep/current messages are the same Python objects that were already published when they first arrived. Skip publish+print for any iter_msg whose id() was in the pre-wipe history; only the genuinely-new summary (and later AIMessages from the agent node in subsequent stream updates) get republished. Identified by Greptile P1.
- McpClient._env_int: re-raise a labeled ValueError when the env var value isn't a valid integer, so misconfiguration surfaces with the offending name instead of a bare pydantic traceback. Identified by Greptile P2.
- DimosCompactionMiddleware._static_tokens: drop the per-call hash computation. Inputs (system_prompt, tool_schemas) are bound at __init__ and never mutate, so a simple None-check on the cache is sufficient. Identified by Greptile P2.
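The reducer-mirroring behavior behind `_apply_messages_update` can be sketched like this. `REMOVE_ALL` and the dict message shape are stand-ins for langgraph's `RemoveMessage(id=REMOVE_ALL_MESSAGES)` sentinel and langchain message objects:

```python
REMOVE_ALL = "__remove_all__"  # stand-in for the REMOVE_ALL_MESSAGES sentinel

def apply_messages_update(history: list[dict], update: list[dict]) -> list[dict]:
    for msg in update:
        if msg.get("remove") == REMOVE_ALL:
            history = []  # wipe: everything that follows is the new history
        elif msg.get("remove"):
            # specific-id removal
            history = [m for m in history if m["id"] != msg["remove"]]
        else:
            existing = {m["id"]: i for i, m in enumerate(history)}
            if msg["id"] in existing:
                history[existing[msg["id"]]] = msg  # same id: replace in place
            else:
                history = history + [msg]           # new id: append
    return history
```

The real method additionally skips re-publishing any message whose identity was already in the pre-wipe history, so only genuinely new messages (like the summary) reach subscribers.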
Summary
Closes #1899
Caps the prompt the dimos agent sends to its LLM so the conversation history
never grows unbounded. Implemented as a langchain AgentMiddleware plugged into
create_agent(middleware=...). Because the hook (before_model) fires before
every model invocation, the input-size bound becomes an invariant of the agent
loop — including intra-turn re-invocations (model → tool → tool result → model).
On long sessions the middleware quietly summarizes older turns once it detects
an oversized prompt. Behavior is unchanged for short sessions.
Concepts
dimos_turn
A new integer tag attached to each message's additional_kwargs dict.
Incremented once per McpClient._process_message call — that is, once per
user-facing turn (a human input from agent-send, or a tool-stream
notification that wakes the agent). Every message that flows through during
that turn — the input HumanMessage, intermediate AIMessages with tool_calls,
the resulting ToolMessages, the final AIMessage — all get stamped with the
same turn number.
This is what lets compaction:
- keep whole turns together (compaction selects entire turns, never partial ones — no orphan tool_call_id references).
- find the current turn (the latest dimos_turn group plus any trailing untagged in-flight messages from the agent loop) and preserve it untouched regardless of threshold.
- keep the most recent turns verbatim (rather than message-count keep-N-most-recent strategies).
dimos_turn is metadata only — it lives in additional_kwargs, which
providers ignore but langchain serialization preserves. The compaction
summary itself is tagged with the max turn it covers (plus
dimos_compacted: True), so re-compaction folds the prior summary into the
next one cleanly.
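A rough sketch of the tagging and the boundary walk, with dict messages and simplified names standing in for the real `_tag_turn` / `_current_turn_start` helpers:

```python
def tag_turn(msg: dict, turn: int) -> dict:
    # stamp the message with the current turn number
    msg.setdefault("additional_kwargs", {})["dimos_turn"] = turn
    return msg

def current_turn_start(msgs: list[dict]) -> int:
    """Index where the current (sacred) turn begins."""
    i = len(msgs)
    # trailing untagged messages (in-flight tool calls) belong to the current turn
    while i > 0 and msgs[i - 1].get("additional_kwargs", {}).get("dimos_turn") is None:
        i -= 1
    if i == 0:
        return 0
    latest = msgs[i - 1]["additional_kwargs"]["dimos_turn"]
    # then walk back through the contiguous latest-turn group
    while i > 0 and msgs[i - 1].get("additional_kwargs", {}).get("dimos_turn") == latest:
        i -= 1
    return i
```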
Current turn is sacred
_current_turn_start walks from the end of the message list to find the
boundary of the latest turn. Everything from that boundary forward is never
compacted — no image strip, no summary touch. This protects:
- in-flight tool calls still awaiting their ToolMessage responses
How it works
Two-stage compaction inside before_model:
1. Strip images in messages older than the current turn. Image content
blocks are replaced with a small text placeholder. If this alone gets us
below target_tokens, we stop here.
As to why I decided to strip images: LLMs' visual reasoning capabilities are
currently noticeably worse than their text reasoning. Additionally, the way the
agent loop is set up right now means the model sees the image at the
beginning of a new turn, and it tends to give a description of what's in the image.
This description is detailed enough for reasoning about the content of the image,
but it also causes a secondary effect: the model, when considering the image, tends
to anchor its perception (even if the image is available in chat history) to the
comment it gave at that moment. Keeping images that were already observed therefore
seems like a waste of tokens that we can save, since we are already going to cause a
cache burst with our compaction process.
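Stage 1 might look roughly like this (hypothetical names; real image blocks follow langchain's content-block schema, which is richer than the dicts below):

```python
def strip_images(content: list[dict]) -> tuple[list[dict], bool]:
    """Replace image blocks with a small text placeholder.

    Returns the new block list and whether anything was stripped.
    """
    out, stripped = [], False
    for block in content:
        if block.get("type") in ("image", "image_url"):
            out.append({"type": "text", "text": "[image removed during compaction]"})
            stripped = True
        else:
            out.append(block)
    return out, stripped
```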
2. Summarize older messages into a single SystemMessage while keeping
the most recent turns verbatim. The summarizer LLM is configurable;
defaults to reusing the agent's own model. Output is hard-capped via
summarizer.bind(max_tokens=summary_size_tokens).
See it in action
A public Langfuse trace captured with deliberately small defaults so
compaction fires after a handful of turns:
https://us.cloud.langfuse.com/project/cmp23t80n09ooad08jnw1lksy/traces/887630cfbf49bb97f1c5b4d2cc980ad1?observation=b73fcf77cb4f2dc5&timestamp=2026-05-12T07:54:34.311Z
Use the trace timeline to see the prompt that hits the LLM at each
agent-turn-N span — older turns get folded into a single summary
SystemMessage and the agent continues with a shrunk prompt.
Configuration
All on by default via McpClientConfig, env-driven:

| Env var | Config field | Default |
| --- | --- | --- |
| AGENT_COMPACTION_THRESHOLD | agent_compaction_threshold | 40000 |
| AGENT_COMPACTION_TARGET | agent_compaction_target | 3000 |
| AGENT_COMPACTION_SUMMARY_SIZE | agent_compaction_summary_size | 1000 |
| AGENT_COMPACTION_MODEL | agent_compaction_model | None (reuses agent's model) |

Why a middleware
Two reasons, both documented in compaction_middleware.py's module docstring:
1. Compacting McpClient._history directly would only fire once per user turn,
leaving every intra-turn re-invocation unprotected. Middleware fires before
each model call.
2. before_model vs after_model/wrap_model_call: before_model is the
minimal-intervention hook. after_model is too late (the model already
errored on overflow); wrap_model_call conflates compaction with the
model-call concerns (retries, error shaping, tool dispatch).
Changes
New files
- dimos/agents/compaction_middleware.py — DimosCompactionMiddleware class
(subclass of langchain.agents.middleware.AgentMiddleware), a placeholder
token counter (3 chars/token, 1000 tokens/image; memoized in
additional_kwargs["dimos_tokens"] for O(new-only) recompute), a static
token cache for system_prompt + tool schemas, and the algorithm helpers
(_strip_images, _split, _current_turn_start, _summarize).
- dimos/agents/test_compaction_middleware.py — 15 pytest cases, hermetic
(no API key needed). Coverage includes:
  - before_model no-op below threshold
  - integration tests that drive a real create_agent loop with a
RecordingFakeAgent and assert: (a) the agent node receives a compacted
prompt (proves langgraph's add_messages reducer interprets the
RemoveMessage(REMOVE_ALL_MESSAGES) sentinel correctly), and (b) compaction
can fire mid-turn between a tool result and the next model call.
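The hermetic-test pattern can be sketched without langchain at all. `RecordingFakeModel` and the loop below are simplified stand-ins for the suite's `RecordingFakeAgent` and the real create_agent wiring — the point is recording each prompt so a test can assert compaction fired before the model ever saw the history:

```python
class RecordingFakeModel:
    """Records every prompt it receives so tests can assert on compaction."""
    def __init__(self):
        self.prompts: list[list] = []

    def invoke(self, messages):
        self.prompts.append(list(messages))
        return {"role": "ai", "content": "ok"}

def run_agent_loop(model, before_model, history, steps=2):
    # toy agent loop: compact, call model, append a tool result, repeat
    for _ in range(steps):
        compacted = before_model(history)
        if compacted is not None:
            history = compacted  # reducer applies the replacement
        model.invoke(history)
        history = history + [{"role": "tool", "content": "result"}]
    return history
```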
Modified: dimos/agents/mcp/mcp_client.py
- McpClientConfig reads the env vars in the table above, via _env_int/_env_str
helpers loaded through pydantic Field(default_factory=...).
- New _turn: int counter on McpClient (incremented at the top of
_process_message), and a new module-level _tag_turn(message, turn) helper
that stamps additional_kwargs["dimos_turn"]. Every message flowing through
a turn gets stamped — the incoming HumanMessage first, then every message
emitted by the state graph.
- New _apply_messages_update method that mirrors langgraph's add_messages
reducer semantics locally — honors RemoveMessage(id=REMOVE_ALL_MESSAGES) as
"wipe history, use what follows" and specific-id RemoveMessage as targeted
removal. This keeps McpClient._history in sync with the graph's internal
state even when the middleware replaces the entire message list.
- In on_system_modules, construct the summarizer (either via
init_chat_model(agent_compaction_model), or init_chat_model(model) if the
agent's model is a string, or reuse the agent's model object), build the
middleware with the system prompt and tool JSON schemas
(t.args_schema.model_json_schema()), and pass it as
create_agent(..., middleware=middleware).
- Handle middleware no-op updates that yield {node: None} instead of
{node: {"messages": [...]}}, which would previously crash with
'NoneType' object has no attribute 'get'.
Modified:
.gitignore
Adds MUJOCO_LOG.TXT (MuJoCo runtime artifact written to the repo root on
every sim run; should never be committed).
Test plan
- uv run pytest dimos/agents/test_compaction_middleware.py -v — 15/15 pass.
- uv run mypy dimos/agents/compaction_middleware.py dimos/agents/test_compaction_middleware.py dimos/agents/mcp/mcp_client.py — clean.
- dimos --simulation run unitree-go2-agentic with
AGENT_COMPACTION_THRESHOLD=2000: drive the agent until the threshold
is crossed, confirm a "Compaction fired (summarize)" log line appears,
and the next prompt sent to the LLM contains the summary SystemMessage
instead of the older turns.
Known limitations
Documented in the module docstring as "Known limitations":
- Progressive disclosure with a content store is the right long-term answer.
- A very long session could exceed the summarizer model's own context window.
Mitigation deferred to a follow-up (chunked summarization).
- @retry(on_exception=Exception) is intentionally broad because the
summarizer is duck-typed; permanent errors cost up to 3 attempts + 1s of
sleeps before propagating.
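One possible shape for the deferred chunked-summarization follow-up (purely hypothetical, not part of this PR): summarize the transcript in windows that each fit the summarizer's context, then summarize the partial summaries.

```python
def chunked_summarize(transcript: list[str], summarize, max_chars: int) -> str:
    """Map-reduce summarization: chunk, summarize each chunk, then combine.

    `summarize` is any callable str -> str (e.g. a bound summarizer model).
    """
    chunks, cur, size = [], [], 0
    for msg in transcript:
        if size + len(msg) > max_chars and cur:
            chunks.append(summarize("\n".join(cur)))  # flush current window
            cur, size = [], 0
        cur.append(msg)
        size += len(msg)
    if cur:
        chunks.append(summarize("\n".join(cur)))
    # reduce step: fold partial summaries into one
    return chunks[0] if len(chunks) == 1 else summarize("\n".join(chunks))
```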