extension/llm/server: isolated multi-session serving (V2a) by mergennachin · Pull Request #20159 · pytorch/executorch

mergennachin · 2026-06-09T16:23:15Z

A worker now hosts one LLMEngine (weights loaded once) and serves multiple
isolated sessions keyed by session_id, each with its own KV/recurrent state, up
to the engine's serving capacity. As a result, one ~18GB model load backs many
independent conversations instead of one. Execution stays synchronous (one
in-flight request at a time; the control plane serializes requests): this is
isolation, not concurrent streaming.
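A minimal Python sketch of this isolation model, assuming a plain token list stands in for the per-session KV/recurrent state. `SharedWeights`, `Session`, `Worker`, and `step` are hypothetical names for illustration, not the worker_loop.h API. It shows one weights object referenced by every session, state that is private to each session_id, and a lock that keeps a single request in flight.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SharedWeights:
    # Hypothetical stand-in for the model weights: loaded once per worker.
    name: str


@dataclass
class Session:
    # Every session points at the same weights but owns its own state.
    # A token list stands in for the real KV/recurrent state (assumption).
    weights: SharedWeights
    state: list = field(default_factory=list)


class Worker:
    def __init__(self, weights: SharedWeights):
        self.weights = weights
        self.sessions: dict[str, Session] = {}
        # One in-flight request at a time: isolation, not concurrency.
        self._lock = threading.Lock()

    def step(self, session_id: str, tokens: list) -> int:
        """Append tokens to one session's state; return its new length."""
        with self._lock:
            session = self.sessions.get(session_id)
            if session is None:
                session = self.sessions[session_id] = Session(self.weights)
            session.state.extend(tokens)
            return len(session.state)


worker = Worker(SharedWeights("model"))
len_a = worker.step("a", [1, 2, 3])
len_b = worker.step("b", [4])
```

Session "a" and session "b" grow independently, while both hold a reference to the same single weights object.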

The shared worker loop (worker_loop.h) owns the sessions. Named sessions are
created on first use (or by an explicit open op) and capped at capacity-1; one
slot is reserved for a scratch session that serves anonymous, session-less
requests. Over-capacity requests and single-session backends return structured
errors (capacity_exhausted and unsupported_session, respectively). There is no
eviction or TTL in the MVP, so once named sessions fill the worker's capacity,
new ones are refused with capacity_exhausted. Both workers (text_llm_worker,
qwen3_5_moe_worker) now pass their LLMEngine to the loop.
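The capacity rule can be sketched as below, assuming capacity counts every session slot including scratch, and assuming a single-session backend is one with capacity <= 1. `SessionTable`, `SessionError`, `acquire`, and `delete` are hypothetical names; only the error codes come from the description above. Named sessions stop at capacity - 1 so the scratch slot is always available, and nothing is evicted to make room.

```python
from typing import Optional


class SessionError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code  # "capacity_exhausted" or "unsupported_session"


class SessionTable:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.named: dict[str, list] = {}
        # Reserved slot that serves anonymous, session-less requests.
        self.scratch: list = []

    def acquire(self, session_id: Optional[str]) -> list:
        """Return the state for a request, creating a named session on first use."""
        if session_id is None:
            return self.scratch
        if self.capacity <= 1:
            # Assumption: a single-session backend has no room for named sessions.
            raise SessionError("unsupported_session")
        if session_id in self.named:
            return self.named[session_id]
        # One slot stays reserved for scratch, so named sessions cap at capacity - 1.
        if len(self.named) >= self.capacity - 1:
            raise SessionError("capacity_exhausted")  # no eviction/TTL
        state: list = []
        self.named[session_id] = state
        return state

    def delete(self, session_id: str) -> bool:
        """Free a named session; return True if it existed."""
        return self.named.pop(session_id, None) is not None
```

With capacity=3, sessions "a" and "b" fit, a third named session is refused with capacity_exhausted, and deleting "a" frees a slot for it.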

The control plane (over the SessionRuntime boundary introduced earlier) routes a
request to a session via the session_id body field or via one of the header
aliases X-ExecuTorch-Session-ID, session_id, or x-session-affinity. The body
field takes precedence, followed by the headers in that order. The aliases let a
client that already emits a stable per-conversation affinity id (e.g. pi's
sendSessionAffinityHeaders) route requests with no client-specific server code.
The session is admitted up front, so a capacity refusal surfaces as HTTP 429/400
before any stream bytes are sent. DELETE /v1/sessions/{id} frees a session.
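The routing precedence can be sketched as a small resolver. The alias names and their order come from the description; the function name `resolve_session_id`, the treatment of empty values as absent, and the case-insensitive header match are assumptions for illustration, not the code in serving_chat.py or server.py.

```python
from typing import Mapping, Optional

# Header aliases in precedence order, as listed in the PR description.
SESSION_HEADERS = ("X-ExecuTorch-Session-ID", "session_id", "x-session-affinity")


def resolve_session_id(body: Mapping, headers: Mapping[str, str]) -> Optional[str]:
    """Return the session id for a request, or None for an anonymous request."""
    # The body field wins over every header alias.
    sid = body.get("session_id")
    if sid:
        return sid
    # HTTP header names are case-insensitive, so compare lowercased names.
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in SESSION_HEADERS:
        value = lowered.get(name.lower())
        if value:  # assumption: empty values count as absent
            return value
    return None
```

A request that resolves to None would go to the scratch session; any other id is admitted against capacity before the response stream starts.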

Review order: worker_loop.h (session ownership + protocol), then the two
workers; then the control plane (protocol.py, errors.py, serving_chat.py,
server.py, serve.py); then tests.

Part of #20001

[ghstack-poisoned]

mergennachin · 2026-06-09T16:23:16Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-06-09T16:23:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20159

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 4 Unrelated Failures, 2 Unclassified Failures

As of commit 8d71e8e with merge base eeb0646:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-passes-linux / linux-job (gh)
RuntimeError: Command docker exec -t 59453d1e841000ba721f6cf9c1b5043392880c8c48810a16088774ccae4b1e79 /exec failed with exit code 92
pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t b03debd52e02a709427089af1af534587fcea825fe47fed1293711c0762b4ccd /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 15016dc18fd45a629150fa74683ae78fd5d076c45dfd9cc5a412d9428cf4add9 /exec failed with exit code 1
pull / unittest-editable / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Windows Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Process completed with exit code 1.
Build Windows Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x64

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

MLX / test-mlx-voxtral-realtime / test-mlx-voxtral-realtime (gh) (detected as infra flaky with no log or failing log classifier)
pull / test-models-linux (mobilebert, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
pull / test-models-linux-basic (vit, xnnpack-quantization-delegation, buck2, linux.2xlarge, executorch-u... / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
pull / test-qnn-buck-build-linux / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.


[INITIAL] Update

3269df9

[ghstack-poisoned]

mergennachin requested review from kirklandsign and larryliu0820 as code owners June 9, 2026 16:23

meta-cla Bot added the CLA Signed label Jun 9, 2026

[UPDATE] Update

8d71e8e

[ghstack-poisoned]

mergennachin wants to merge 2 commits into gh/mergennachin/8/head from gh/mergennachin/9/head