
[DO NOT MERGE] Add blog post: No Token Left Behind — TITO in Miles#335

Open
Shi-Dong wants to merge 4 commits into lm-sys:main from Shi-Dong:shi/tito-blog

Conversation

@Shi-Dong

Summary

  • Adds a new blog post under blog/2026-05-13-no-token-left-behind.md titled "No Token Left Behind: Demystifying Token-In-Token-Out in Miles", explaining the TITO design principle and its implementation in the Miles RL framework.
  • The post walks through three common ways TITO breaks (detokenize-retokenize mismatch, chat-template cut-thinking, lossy chat-template re-rendering) and the four-component Miles implementation: inference session server, append-only enforcement at three levels, a pluggable TITO tokenizer with per-model splice-point patches (Qwen3, GLM-4.7), and a token-sequence comparator with CI verification.
  • Adds 7 supporting diagrams under public/images/blog/tito/.
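The first failure mode listed above, detokenize-retokenize mismatch, can be illustrated with a toy example. This is not Miles code; it is a minimal greedy longest-match tokenizer (hypothetical vocabulary) whose decode-then-encode round trip does not reproduce the token ids the model originally sampled:

```python
# Toy illustration of the detokenize-retokenize failure mode: a greedy
# longest-match tokenizer whose decode -> encode round trip is lossy.
VOCAB = {"in": 0, "put": 1, "input": 2, " ": 3}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    """Greedy longest-match tokenization."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable input at position {i}")
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the model actually *sampled* ["in", "put"] (ids [0, 1]) ...
generated = [0, 1]
text = decode(generated)        # "input"
retokenized = encode(text)      # greedy match prefers "input" -> [2]
print(generated, retokenized)   # [0, 1] vs. [2]
```

The retokenized sequence `[2]` no longer matches the sampled sequence `[0, 1]`, so any logprobs attached to the original ids would be applied to the wrong tokens at training time. Passing token ids end to end, as TITO does, avoids the round trip entirely.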

Test plan

  • Run `npm run dev` locally and verify the new post renders correctly at /blog/no-token-left-behind.
  • Confirm the previewImg (/images/blog/tito/definition.png) shows on the blog index.
  • Inline math ($n$, $x_t$, $\pi(x_t|\mathbf{x})$) renders.
  • All seven embedded images load.
  • External links to the Miles repo and Hugging Face chat-template playground resolve.

Shi Dong added 3 commits May 13, 2026 07:16
Introduces the Token-In-Token-Out (TITO) design principle in the
Miles RL framework: three common failure modes (detokenize-retokenize
mismatch, chat-template cut-thinking, lossy re-rendering) and the
four-component implementation (inference session server, append-only
enforcement at three levels, pluggable TITO tokenizer with per-model
splice-point patches, and a token-sequence comparator with CI
verification).
@Shi-Dong Shi-Dong changed the title Add blog post: No Token Left Behind — TITO in Miles [DO NOT MERGE] Add blog post: No Token Left Behind — TITO in Miles May 13, 2026
Comment thread on blog/2026-05-13-no-token-left-behind.md

An *inference session* is a single trajectory's interaction with the inference engine — the sequence of turns belonging to the same task, sharing one growing token buffer. The [inference session server](https://github.com/radixark/miles/blob/3270915550fcd69dce788f382fa8c12548a63618/miles/rollout/session/session_server.py#L24) is a thin server layer that maintains per-trajectory state, keyed by session id. Under each id it holds a growing token buffer `P` that is appended to in place on every turn. The token buffer preserves each sample's exact token-level info (logprobs, routed experts), so it can be sent directly to training.

![Inference session server architecture](/images/blog/tito/session-server.png)
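The paragraph above can be sketched in a few lines. This is a hypothetical simplification, not the actual `session_server.py`; the class and method names (`SessionServer`, `append`, `finish`) are illustrative, and only logprobs are shown as per-token metadata:

```python
# Minimal sketch of a per-trajectory session store with an append-only
# token buffer. Names are hypothetical; real Miles code differs.
from dataclasses import dataclass, field

@dataclass
class Session:
    tokens: list = field(default_factory=list)    # the growing buffer `P`
    logprobs: list = field(default_factory=list)  # exact per-token info

class SessionServer:
    def __init__(self):
        self._sessions = {}  # keyed by session id

    def append(self, session_id, new_tokens, new_logprobs):
        """Append-only: each turn extends the buffer in place; nothing
        is ever detokenized, re-rendered, or re-tokenized."""
        s = self._sessions.setdefault(session_id, Session())
        s.tokens.extend(new_tokens)
        s.logprobs.extend(new_logprobs)
        return s

    def finish(self, session_id):
        """Pop the completed trajectory so its exact token ids and
        logprobs can be handed directly to training."""
        return self._sessions.pop(session_id)

server = SessionServer()
server.append("traj-0", [101, 7], [-0.2, -1.3])  # turn 1
server.append("traj-0", [9, 42], [-0.5, -0.1])   # turn 2
done = server.finish("traj-0")
print(done.tokens)  # [101, 7, 9, 42]
```

Because every turn only ever extends the same buffer, the ids sent to training are bit-identical to the ids the inference engine produced.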
Note for Shi: update this diagram.

@@ -0,0 +1,188 @@
---
title: "No Token Left Behind: Demystifying Token-In-Token-Out in Miles"
author: "Miles Team"

Author: Jiajun Li, Yanbin Jiang, Mao Cheng, Shi Dong, Yusheng Su, Yueming Yuan, Zhichen Zeng, Banghua Zhu

