examples/models/gemma4_31b: CUDA Engine/Session adapter + OpenAI serving by mergennachin · Pull Request #20207 · pytorch/executorch

mergennachin · 2026-06-10T21:38:31Z

Adds the Gemma 4 31B serving path, mirroring qwen3_5_moe: a CUDA
Engine/Session adapter (chunked prefill, per-session mutable rebinding,
in-graph sampling) behind the model-agnostic LLMEngine/LLMSession
contract, a JSONL worker, and a serve.py launcher. The generic worker
loop gains an optional prompt_prefix_ids (used for the Gemma BOS
prepend), and serving_chat gains a matching prompt_token_offset so the
context count stays honest. export.py emits get_mutable_buffer_metadata
and prefill-chunk bounds for multi-session use.
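
To illustrate how the prefix and offset described above could fit together with chunked prefill, here is a minimal sketch. It is not the code from this PR: the helper names (`with_prefix`, `context_tokens`, `prefill_chunks`), the BOS id, and the chunk size are all hypothetical, and reading `prompt_token_offset` as "the number of prepended prefix tokens" is an assumption based only on the description.

```python
from typing import Iterator, List, Sequence

BOS_ID = 2  # hypothetical BOS token id, chosen only for illustration


def with_prefix(prompt_ids: Sequence[int],
                prompt_prefix_ids: Sequence[int] = ()) -> List[int]:
    """Prepend optional prefix ids (e.g. a BOS token) to the tokenized prompt."""
    return list(prompt_prefix_ids) + list(prompt_ids)


def context_tokens(num_prompt_tokens: int, prompt_token_offset: int = 0) -> int:
    """Prompt length as the model sees it: tokenizer count plus the prefix length.

    Assumption: prompt_token_offset equals the number of prepended prefix ids.
    """
    return num_prompt_tokens + prompt_token_offset


def prefill_chunks(ids: Sequence[int], max_chunk: int) -> Iterator[List[int]]:
    """Split the prompt into consecutive chunks of at most max_chunk tokens."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    for start in range(0, len(ids), max_chunk):
        yield list(ids[start:start + max_chunk])


prompt = [10, 11, 12, 13, 14]          # ids produced by the tokenizer
full = with_prefix(prompt, [BOS_ID])   # what the worker would feed the model
chunks = list(prefill_chunks(full, max_chunk=4))

# The serving layer only knows len(prompt); the offset accounts for the prefix.
reported = context_tokens(len(prompt), prompt_token_offset=1)
```

In this sketch the serving layer's count matches the number of tokens actually prefilled only when the offset is applied, which is one plausible meaning of keeping the context count honest.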

[ghstack-poisoned]

mergennachin · 2026-06-10T21:38:32Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-06-10T21:38:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20207

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 1 Pending, 1 Unrelated Failure, 2 Unclassified Failures

As of commit 1a1b822 with merge base f0dff03:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-delegate-linux / linux-job (gh)
RuntimeError: Command docker exec -t c443ea19bec564683964072c650ac7e4155f0684281e997f50ca9240046fc2f0 /exec failed with exit code 92
pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh)
RuntimeError: Command docker exec -t 76f9d85218a21899cafad76bf3376e70115afa57adee30b4410955d44daba8c2 /exec failed with exit code 1
pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t e89cfb8650ba6c8cdd2afe59e51fd823e94e1be659261428ec8ebdfeeaa10705 /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 1d57842251f23a8eb8d597da80d2d93d61e419ccc88c04b9c76643823b58a1da /exec failed with exit code 1
pull / unittest-editable / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / unittest-release / linux / linux-job (gh)
RuntimeError: Command docker exec -t 75f23ad6919b6366d12158d661570c24226082e7858f9dc5f61577535f905eb6 /exec failed with exit code 1
trunk / unittest-release / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Windows Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Build Windows Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x64

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / android / build-android (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-10T21:39:02Z

The committers listed above are authorized under a signed CLA.

✅ login: mergennachin / name: Mergen Nachin (1a1b822)

[INITIAL] Update

1a1b822

[ghstack-poisoned]

mergennachin requested review from GregoryComer, JacobSzwejbka, SS-JIA, abhinaykukkadapu, digantdesai, kimishpatel, kirklandsign, larryliu0820, manuelcandales, psiddh, rascani, robert-kalmar and shoumikhin as code owners June 10, 2026 21:38

meta-cla Bot added the `CLA Signed` label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jun 10, 2026

mergennachin wants to merge 1 commit into gh/mergennachin/12/head from gh/mergennachin/13/head
