docs(designs): background tasks #780
gautamsirdeshmukh wants to merge 1 commit into strands-agents:main
Conversation
Documentation Preview Ready. Your documentation preview has been successfully deployed! Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-780/docs/user-guide/quickstart/overview/ Updated at: 2026-04-24T17:04:35.198Z

The SDK now provides a built-in mechanism to "fork" an agent (create an independent copy) and run it alongside the original. No manual cloning, no lock conflicts, no coordination overhead.

**Zero overhead when not used.** Agents that don't configure `backgroundTools` pay no cost. No system prompt augmentation is injected, no management tools are registered, no token or context overhead is added. The decision points check `taskManager.size` and `_backgroundToolNames` and short-circuit immediately — no forks, no queues, no settlement checks. The agent loop behaves identically to today.
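A minimal sketch of that short-circuit. Only `taskManager.size` and `_backgroundToolNames` are names taken from the doc; the surrounding class and method names are illustrative, not the SDK's actual code:

```typescript
// Hypothetical sketch -- only taskManager.size and _backgroundToolNames
// come from the design doc; everything else is illustrative.
interface TaskManagerLike {
  size: number
}

class LoopChecksSketch {
  constructor(
    private taskManager: TaskManagerLike,
    private _backgroundToolNames: Set<string>,
  ) {}

  // Decision Point A: nothing to pop when no tasks were ever created.
  hasSettledWork(): boolean {
    if (this.taskManager.size === 0) return false // short-circuit
    return true // real code would inspect settlement states here
  }

  // Decision Point B: every tool runs inline when none are designated background.
  isBackgroundTool(name: string): boolean {
    if (this._backgroundToolNames.size === 0) return false // short-circuit
    return this._backgroundToolNames.has(name)
  }
}
```

With an empty config both checks return immediately, so an unconfigured agent never touches fork or queue logic.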
Does that mean you define it as a tool, not an async subagent task?
### How the Model Sees Background Tasks

Background tools appear in the model's tool definitions identically to foreground tools — same name, same description, same input schema. There is no schema-level async marker. The model learns which tools run asynchronously and how to interact with them solely from the system prompt augmentation described below.
Why the system prompt? Why not just add it to the tool descriptions?

See Appendix G -- tested out updating the tool spec description as well as a few other alternatives before settling on system prompt augmentation.

We are augmenting tool results for strands-agents/sdk-python#2162, so I think this does have precedent in the SDK.
### Modified Agent Loop
Can this background tool make recursive tool calls? And what happens when a single tool is a background tool while the others are not?
#### Result Notification

When the background task completes, its result is injected into the conversation as a user text message — not a `tool_result`, since there is no `tool_use` to pair it with. The `toolUseId` from the original dispatch is echoed for correlation:
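A sketch of how such an injection could be assembled. The `[Background Task Result]` prefix, the echoed `toolUseId`, and the status values follow the doc; the helper function and message shape are assumptions:

```typescript
// Hypothetical helper -- the prefix, echoed toolUseId, and status values
// follow the design doc; the message shape is an assumption.
type SettledTask =
  | { status: 'success'; result: unknown }
  | { status: 'error'; error: string }
  | { status: 'cancelled'; reason?: string }

function toInjectedUserMessage(toolName: string, toolUseId: string, task: SettledTask) {
  const body =
    task.status === 'success' ? JSON.stringify(task.result)
    : task.status === 'error' ? task.error
    : (task.reason ?? 'cancelled')
  return {
    role: 'user' as const, // plain text message, deliberately not a tool_result
    content: [{
      text: [
        '[Background Task Result]',
        `tool: ${toolName}`,
        `toolUseId: ${toolUseId}`,
        `status: ${task.status}`,
        body,
      ].join('\n'),
    }],
  }
}
```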
If I am having a conversation with the agent, though, is this a behavior I want? Or what if the agent is in the middle of an event loop, writing code for example?

Alternatively, can we do a strategy where we update the tool description to say "There are 2 async task results available for ...", and let it be a bit more model-driven instead of a default notification? How do the possibly different contexts interact with each other?
| Point | Location | Blocking | Purpose |
|-------|----------|----------|---------|
| **A** | Start of each loop cycle | No | Pop any background tasks that finished (success, error, or cancelled) since the last cycle and inject their results into the conversation as user messages. Proceeds immediately if none have settled. |
| **B** | Per tool, during dispatch | No | For each tool the model calls, check if it's designated as a background tool. If yes: fork the agent (or queue if `maxConcurrentBackgroundTasks` is reached), dispatch the tool on the fork, and return an immediate ACK to the model. If no: execute the tool inline as normal. The agent continues without waiting for background results. |
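Decision Point B might look roughly like this. The ACK wording and the fork-or-queue behavior come from the table; the function signature and types are illustrative:

```typescript
// Illustrative sketch of Decision Point B; only the ACK wording and the
// fork-or-queue behavior come from the design doc.
type ToolUse = { name: string; toolUseId: string; input: unknown }
type ToolResult = { toolUseId: string; content: string }

function dispatchTool(
  tool: ToolUse,
  backgroundToolNames: Set<string>,
  runInline: (t: ToolUse) => ToolResult,
  enqueueOnFork: (t: ToolUse) => 'started' | 'queued', // fork + dispatch, or queue when at capacity
): ToolResult {
  if (!backgroundToolNames.has(tool.name)) {
    return runInline(tool) // foreground path, unchanged from today
  }
  const state = enqueueOnFork(tool) // never blocks on the tool's real result
  return {
    toolUseId: tool.toolUseId,
    content: state === 'started' ? 'Background task dispatched' : 'Background task queued',
  }
}
```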
You mention forking the agent but dispatching the tool on the fork. Do we need to fork the entire agent? Also, do we need to use forks at all? What if tools were responsible for returning an ack if they dispatch a background process. I think in most scenarios it is going to be the case that a tool calls an api that is running a process on a separate server and thus the fork wouldn't be necessary.
The model can dispatch multiple tools simultaneously, and they all begin executing immediately. As each result arrives, the model can react and adjust its strategy in real time, triggering follow-up work as needed or cancelling tasks that are no longer required. No predefined topology is required – the model's dispatch strategy emerges from its own reasoning.

### A single agent instance can now run concurrent work
Does it work for agents-as-tools too?
The SDK now provides a built-in mechanism to "fork" an agent (create an independent copy) and run it alongside the original. No manual cloning, no lock conflicts, no coordination overhead.

**Zero overhead when not used.** Agents that don't configure `backgroundTools` pay no cost. No system prompt augmentation is injected, no management tools are registered, no token or context overhead is added. The decision points check `taskManager.size` and `_backgroundToolNames` and short-circuit immediately — no forks, no queues, no settlement checks. The agent loop behaves identically to today.
Can you add backgroundTools to the definitions?
```
[Background Task Result]
tool: <tool_name>
toolUseId: <tool_use_id>
status: success|error|cancelled
```
This can't be interrupted, right?
Background tools appear in the model's tool definitions identically to foreground tools — same name, same description, same input schema. There is no schema-level async marker. The model learns which tools run asynchronously and how to interact with them solely from the system prompt augmentation described below.

When any background tools are passed to the agent, the SDK auto-generates and appends the following block to the system prompt:
So background tools need to be configured by the user? Why can't the agent make the decision to run a tool in the background or not?

Plus one -- autonomy feels like the main/only gain of background tasks to me, especially given that it was significantly slower than graph in the examples below.
Two alternative approaches to giving the model async dispatch capability:

**1. Meta-tool: `run_in_background(tool_name, args)`**
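A sketch of what that meta-tool's spec might look like. The name and parameter names come from the heading above; the schema shape and description text are assumptions:

```typescript
// Hypothetical tool spec for the meta-tool alternative; only the name
// and parameter names come from the design doc.
const runInBackgroundSpec = {
  name: 'run_in_background',
  description:
    'Dispatch another registered tool asynchronously. Returns an immediate ' +
    'acknowledgement; the real result is delivered later.',
  inputSchema: {
    type: 'object',
    properties: {
      tool_name: { type: 'string', description: 'Name of the tool to dispatch' },
      args: { type: 'object', description: 'Arguments forwarded to that tool' },
    },
    required: ['tool_name', 'args'],
  },
} as const
```

The trade-off relative to static `backgroundTools`: the model gains per-call control, but every background dispatch pays an extra level of indirection in the tool call.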
I'm more for this tool wrapper -- it enables the model-driven approach that is the backbone of Strands.
```typescript
type TaskStatus = 'queued' | 'inProgress' | 'success' | 'error' | 'cancelled'

class BackgroundTask implements PromiseLike<unknown> {
  // ... (excerpt truncated)
}
```
How does this correlate to MCP's task definition?
**Prior art: Mastra's dynamic dispatch.** Mastra solves the "same tool, both modes" problem by allowing the model to include a `_background` field in tool call args to override background/foreground per-call. This is a valid approach that adds flexibility, but it adds a hidden parameter to every tool's input schema, requires the model to learn when to use it, and means the developer can't guarantee a tool will always run in a specific mode. We chose static assignment for v1 because it's simpler, predictable, and lets the developer reason about fork safety at construction time. Dynamic dispatch via an opt-in allowlist (developer pre-approves which tools can be dynamically backgrounded, model decides per-call) is the natural extension path if static assignment proves too restrictive.
**3. Task management tool: `manage_tasks({ action: "create" | "status" | "stop" | "get_result", ... })`**
This seems vaguely coupled to strands-agents/tools#389, which is the context-management side of self-managed tools (whereas this is the lifecycle side). I would say they are very intertwined, though, and worth scoping as we look more into meta-tools for context.
Task state is tracked via an internal discriminated union rather than promise state, because the agent loop must check settlement without blocking — a raw Promise offers no synchronous status inspection. The union carries the associated data for each state (result value, error, cancellation reason), eliminating the need for separate flags. When `cancel()` is called, status transitions to `'cancelled'` immediately regardless of current state (queued or inProgress). No-op if the task has already settled. See [Cancellation](#cancellation) for the full API.
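One possible shape for that union. The five statuses come from the doc's `TaskStatus` type; the exact carried fields and the type name are assumptions:

```typescript
// Hypothetical state union -- the five statuses come from the doc's
// TaskStatus type; the carried data per state is as described above.
type TaskState =
  | { status: 'queued' }
  | { status: 'inProgress' }
  | { status: 'success'; result: unknown }
  | { status: 'error'; error: Error }
  | { status: 'cancelled'; reason?: string }

// Settlement can be checked synchronously, which a raw Promise cannot offer.
function isSettled(state: TaskState): boolean {
  return state.status === 'success' || state.status === 'error' || state.status === 'cancelled'
}
```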
#### TaskManager
Nomenclature nit: should this be `BackgroundTaskManager` if it manages `BackgroundTask`s?
**Zero overhead when not used.** Agents that don't configure `backgroundTools` pay no cost. No system prompt augmentation is injected, no management tools are registered, no token or context overhead is added. The decision points check `taskManager.size` and `_backgroundToolNames` and short-circuit immediately — no forks, no queues, no settlement checks. The agent loop behaves identically to today.

## How It Works
For this to work, do we need to keep the process/runtime open?
#### Result Notification

When the background task completes, its result is injected into the conversation as a user text message — not a `tool_result`, since there is no `tool_use` to pair it with. The `toolUseId` from the original dispatch is echoed for correlation:
Will this bring side effects across different models?
| Sonnet 4.6 | 5 | 86.5s | 31.4s | **2.80x** | ±0.40 | 4.4% | 110/110 |
| Haiku 4.5 | 5 | 79.6s | 28.2s | **2.89x** | ±0.40 | 18.2% | 110/110 |
Context growth (avg messages): 6-7 standard vs 8-10 background. The additional messages are injected background task results — expected behavior, not overhead. Input tokens increase 11-40% due to the model seeing injected results across multiple turns rather than in a single batch. See [Context Management](#context-management).
A 40% increase in tokens is highly noteworthy; that is a huge cost.

Ah, good callout here - this particular figure is from earlier benchmarks; more recent testing after tightening up various aspects of the mechanism showed far less bloat. Will update this.
```typescript
tools: [calculateMetrics, formatReport],
backgroundTools: [searchWeb, analyzeData, researcher],
```
Is there a different way we can define this interface that makes it clear that `tools` is the superset of `tools` and `backgroundTools`?

Plus one: if the same tool is included in `tools` and `backgroundTools`, what does that mean?
This is the structural asymmetry at the heart of background tasks: the ACK uses native `tool_result` pairing, but the real result arrives as a plain text message. This is why the system prompt augmentation is necessary — see [Appendix G](#appendix-g-system-prompt-augmentation-rationale).

#### Model-Driven Task Management
I worry this could lead to more token usage. Does the model need to be the one to poll background tasks? Is that something we can do in the agent loop? Or is it something users could do?

I'm thinking of the following scenario:

- Model emits a tool use.
- Tool runs in background.
- No other tool is executing, so no other work is being done in the loop.
- We exit out of the agent loop with a handler so the user can decide when to reinvoke.
- One advantage here is that if no work is being done locally, the user can save on compute and shut down their entire process.
- Once ready, the user reinvokes the agent.
- The agent returns to the tool call to retrieve the result.
- The result is sent to the model.
- The model isn't aware that the tool was executed in the background.

Your approach, however, allows the model to continue processing concurrent results and follow up on background results. I can see then why passing those as a plain user message rather than a tool result could be helpful. Still curious, though, whether there is a way to enable passing back as a tool result.
- **`enqueue()`** — the entry point for background dispatch. Creates a `BackgroundTask`, either starts it immediately or queues it based on concurrency, and returns the task handle with the appropriate ACK.
- **`popCompleted()`** — returns and removes all settled tasks from the registry. Triggers queue drain if slots opened. This is the only way tasks leave the registry, ensuring no task is accidentally lost or processed twice.
- **`cancel(id)`** — cancel a specific task by its internal ID. Works on both queued and running tasks. Cancelling a queued task removes it from the queue without ever forking.
- **`cancelByToolUseId(toolUseId)`** — cancel a specific task by its `toolUseId`. This is the primary path for model-driven cancellation, since the model knows `toolUseId` from its own `tool_use` blocks in conversation history.
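A toy registry demonstrating the `popCompleted()` contract from the list above: settled tasks leave the registry exactly once. Only the method names come from the doc; the implementation is illustrative:

```typescript
// Toy registry -- method names follow the design doc; the bookkeeping
// is illustrative, not the SDK's actual implementation.
class ToyTaskRegistry {
  private tasks = new Map<string, { toolUseId: string; settled: boolean }>()
  private nextId = 0

  enqueue(toolUseId: string): string {
    const id = `task_${this.nextId++}`
    this.tasks.set(id, { toolUseId, settled: false })
    return id
  }

  settle(id: string): void {
    const task = this.tasks.get(id)
    if (task) task.settled = true
  }

  // Returns and removes all settled tasks -- the only way tasks leave,
  // so none is lost or processed twice.
  popCompleted(): string[] {
    const done: string[] = []
    for (const [id, task] of this.tasks) {
      if (task.settled) {
        done.push(id)
        this.tasks.delete(id)
      }
    }
    return done
  }

  get size(): number {
    return this.tasks.size
  }
}
```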
Should this be a task-manager method, though? I think it should be part of the tool, and it should just translate to `cancel(id)`.
| **Execution** (invocation lock, cancellation, metrics) | Fresh. Each fork can be invoked and cancelled independently. |
| **Task management** | Fresh `TaskManager`, same config. Fork manages its own background tasks. |

Messages are deep-copied by default. For background tool forks specifically, the SDK automatically passes `messages: []` at Decision Point B — see [Context Management](#context-management) for why and how to opt out via `inheritMessages`.
This confuses me. They are copied by default, but not by default for background tool forks? Can you clarify what the different scenarios are?
#### backgroundTools Config

`backgroundTools` accepts the same types as `tools` — `Tool`, `McpClient`, `Agent`, `Graph`, `Swarm`, or nested arrays. Anything that can be a tool can be a background tool.
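For instance, mixing a plain tool, an agent-as-tool, and a nested array. The stub classes below stand in for the SDK's real `Tool` and `Agent` types; only the accepted-union claim comes from the doc:

```typescript
// Illustrative only -- StubTool/StubAgent stand in for the SDK's real
// Tool and Agent types to show backgroundTools taking the same union.
class StubTool { constructor(readonly name: string) {} }
class StubAgent { constructor(readonly name: string) {} }

type ToolEntry = StubTool | StubAgent | ToolEntry[] // nested arrays allowed, per the doc

const searchWeb = new StubTool('searchWeb')
const researcher = new StubAgent('researcher') // an agent used as a background tool

const config = {
  tools: [new StubTool('formatReport')] as ToolEntry[],
  backgroundTools: [searchWeb, [researcher]] as ToolEntry[], // same accepted types as tools
}
```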
Isn't this more of a tool concern, though? I'd expect something more like `@tool(background=True)` 🤔

I like this a lot better as well.

Ideally we'd default background to false and allow agents to run tasks in the background unless explicitly configured otherwise, I'd imagine?
| **Fork** | An independent copy of the agent created via `fork()`. Has its own conversation, execution lock, and task manager, but shares the parent's model client and tool registry. The isolation primitive that makes concurrent execution safe. |
| **Decision Point** | One of three locations in the modified agent loop where background task logic is injected. **A** (top of cycle): pop settled results. **B** (per tool): fork and dispatch or execute inline. **C** (end of turn): wait if tasks are pending. |
| **ACK** | The immediate `tool_result` returned to the model when a background tool is dispatched. Contains "Background task dispatched" or "Background task queued." Not a real result — the actual output arrives later via injection. |
| **Injection** | The mechanism by which background task results enter the parent's conversation. Results are appended as user text messages with a `[Background Task Result]` prefix and `toolUseId` for correlation. |
Will the model hallucinate the "late" tool results?
#### fork()

`fork()` creates an independent copy of the agent that can be invoked concurrently with the original. It is both the isolation primitive that makes background tasks possible (each dispatch at Decision Point B creates a fork) and a standalone capability for developers who want to parallelize work using the same agent configuration.
Might need more justification here?
##### Fork depth guard

A configurable depth limit (default: 20, set via `maxForkDepth` on `AgentConfig`) prevents infinite recursive forking — for example, a background tool that itself dispatches background tools. Throws if exceeded.
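The guard itself is a small counter check. A sketch, assuming the depth is carried on each fork (which the doc implies but does not show); `maxForkDepth` and its default of 20 come from the doc, the class and error message are assumptions:

```typescript
// Illustrative depth guard -- maxForkDepth and the default of 20 come
// from the design doc; the class and error message are assumptions.
class ForkDepthSketch {
  constructor(
    readonly depth: number = 0,
    readonly maxForkDepth: number = 20,
  ) {}

  fork(): ForkDepthSketch {
    if (this.depth + 1 > this.maxForkDepth) {
      throw new Error(`Fork depth limit exceeded (maxForkDepth=${this.maxForkDepth})`)
    }
    // Child inherits the limit and carries an incremented depth.
    return new ForkDepthSketch(this.depth + 1, this.maxForkDepth)
  }
}
```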
Is 20 not a bit excessive here? With 10 forks, each 20 levels deep, that's 200 agents, if each fork is only allowed to spin up 1 fork (can't find the default here). How do we keep cost managed?

How do we ensure multiple forks aren't converging to all doing the same task accidentally?
#### TaskManager

`TaskManager` is the lifecycle manager for `BackgroundTask` instances created during the `backgroundTools` dispatch path ([Decision Point B](#three-decision-points)). It owns settlement detection, cancellation, and cleanup. Each agent instance holds its own `TaskManager`; forks get fresh instances with the same config. This isolation ensures a fork's background tasks are the fork's responsibility — the parent only sees the fork itself as one task, never the sub-tasks the fork may spawn internally.
Do we have to manage our own task manager? This seems heavy and task management has already been solved.
#### Events

Three new hookable events:
Who hooks into these? What are the use cases?
The model must not assume results arrive in dispatch order. `toolUseId` correlates each result to its original `tool_use` block.
We will implement this logic, right? Instead of letting the model find and map by itself.
`fork()` creates an independent copy of the agent that can be invoked concurrently with the original. It is both the isolation primitive that makes background tasks possible (each dispatch at Decision Point B creates a fork) and a standalone capability for developers who want to parallelize work using the same agent configuration.

Why it's needed: `invoke()` on the same agent instance acquires an `_isInvoking` lock. A second concurrent call throws `ConcurrentInvocationError`. This is a deliberate safety rail — concurrent writes to the same `messages` array would corrupt conversation state. `fork()` gives each concurrent invocation its own messages, state, and lock.
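A toy illustration of why the fork is the fix. The `_isInvoking` lock, `ConcurrentInvocationError`, and `fork()` semantics follow the doc; the agent class itself is a stub:

```typescript
// Toy agent mimicking the documented lock: a second concurrent invoke()
// on the same instance throws, while a fork invokes independently.
class ToyAgent {
  private _isInvoking = false
  constructor(public messages: string[] = []) {}

  async invoke(prompt: string): Promise<string> {
    if (this._isInvoking) throw new Error('ConcurrentInvocationError')
    this._isInvoking = true
    try {
      await new Promise((resolve) => setTimeout(resolve, 5)) // pretend model call
      this.messages.push(prompt) // safe: this instance's messages only
      return `done: ${prompt}`
    } finally {
      this._isInvoking = false
    }
  }

  fork(): ToyAgent {
    return new ToyAgent([...this.messages]) // own messages, own lock
  }
}
```

Concurrent `invoke()` calls on the same instance reject with `ConcurrentInvocationError`; running the second invocation on a fork lets both succeed.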
Why are we invoking the same agent, though? Why not just use `use_agent` like a sub-agent?
|----------|-----------|-------------|---------------|
| Standard | 98.1s | baseline | 4,006 chars |
| Background | 66.8s | **1.47x faster** | 3,371 chars |
| Graph | 34.8s | **2.82x faster** | 3,866 chars |
Graph is a lot faster; I feel like Swarm would be as well and would address the same use case. Why could we not add async support to Swarm?
Cancellation follows the same pattern as `fetch` with an aborted `AbortSignal` — intentional cancellation is a rejection, not a silent resolve. The synchronous getters (`status`, `result`, `error`) enable inspection without awaiting.
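A toy of that contract. The rejection-on-cancel, the no-op after settlement, and the synchronous `status` check follow the doc; the implementation is a sketch:

```typescript
// Toy thenable mirroring the documented behavior: cancel() flips status
// to 'cancelled' and rejects awaiters; settlement checks stay synchronous.
class ToyTask implements PromiseLike<unknown> {
  status: 'inProgress' | 'success' | 'cancelled' = 'inProgress'
  private resolveFn!: (value: unknown) => void
  private rejectFn!: (reason: Error) => void
  private promise = new Promise<unknown>((resolve, reject) => {
    this.resolveFn = resolve
    this.rejectFn = reject
  })

  cancel(reason = 'cancelled'): void {
    if (this.status !== 'inProgress') return // no-op once settled
    this.status = 'cancelled'
    this.rejectFn(new Error(reason)) // intentional cancellation is a rejection
  }

  complete(result: unknown): void {
    if (this.status !== 'inProgress') return
    this.status = 'success'
    this.resolveFn(result)
  }

  then<T1 = unknown, T2 = never>(
    onfulfilled?: ((value: unknown) => T1 | PromiseLike<T1>) | null,
    onrejected?: ((reason: unknown) => T2 | PromiseLike<T2>) | null,
  ): PromiseLike<T1 | T2> {
    return this.promise.then(onfulfilled, onrejected)
  }
}
```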
This means we need to thread the signal through into the tool. If possible I'd avoid handling more than one abort signal in addition to the existing top-level agent signal.
```typescript
interface TaskManagerConfig {
  heartbeatMs?: number // How often to emit BackgroundTaskPendingEvent while waiting (default: 5000ms)
  // ... (excerpt truncated)
}
```
Maybe `timeBetweenPendingEventMs` or something else closer to the function.
### Modified Agent Loop
High level from the review meeting: I would be interested to see if we could first develop some constructs around async, non-blocking tools/work/agents without deeply integrating the feature into the core event loop.

Proving out this async lifecycle management (very likely re-using similar approaches presented in the doc) in a more additive/compatible way could help iron out the topic and prove out use cases.
In both cases, the agent blocks until the next background task settles, injects the result into the conversation, and re-enters the loop. See [Cancellation](#cancellation) for the safety bounds that prevent indefinite waits.

### How the Model Sees Background Tasks
One quick callout: if we return a tool result to the model later, it may not know what to do with it because context could be lost from other messages added while waiting. Or the conversation history could have been summarized while waiting.
Description
Related Issues
Type of Change
Checklist
npm run dev

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.