
[DO NOT MERGE] Add blog post: No Token Left Behind — TITO in Miles#335

Open
Shi-Dong wants to merge 4 commits into lm-sys:main from Shi-Dong:shi/tito-blog

Conversation

@Shi-Dong

Summary

  • Adds a new blog post under blog/2026-05-13-no-token-left-behind.md titled "No Token Left Behind: Demystifying Token-In-Token-Out in Miles", explaining the TITO design principle and its implementation in the Miles RL framework.
  • The post walks through three common ways TITO breaks (detokenize-retokenize mismatch, chat-template cut-thinking, lossy chat-template re-rendering) and the four-component Miles implementation: inference session server, append-only enforcement at three levels, a pluggable TITO tokenizer with per-model splice-point patches (Qwen3, GLM-4.7), and a token-sequence comparator with CI verification.
  • Adds 7 supporting diagrams under public/images/blog/tito/.
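The first failure mode listed above, detokenize-retokenize mismatch, can be illustrated with a toy example. This is not Miles code; it is a minimal greedy longest-match tokenizer (hypothetical vocabulary) whose decode-then-encode round trip does not reproduce the token ids the model originally sampled:

```python
# Toy illustration of the detokenize-retokenize failure mode: a greedy
# longest-match tokenizer whose decode -> encode round trip is lossy.
VOCAB = {"in": 0, "put": 1, "input": 2, " ": 3}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    """Greedy longest-match tokenization."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable input at position {i}")
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the model actually *sampled* ["in", "put"] (ids [0, 1]) ...
generated = [0, 1]
text = decode(generated)        # "input"
retokenized = encode(text)      # greedy match prefers "input" -> [2]
print(generated, retokenized)   # [0, 1] vs. [2]
```

The retokenized sequence `[2]` no longer matches the sampled sequence `[0, 1]`, so any logprobs attached to the original ids would be applied to the wrong tokens at training time. Passing token ids end to end, as TITO does, avoids the round trip entirely.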

Test plan

  • Run `npm run dev` locally and verify the new post renders correctly at /blog/no-token-left-behind.
  • Confirm the previewImg (/images/blog/tito/definition.png) shows on the blog index.
  • Inline math ($n$, $x_t$, $\pi(x_t|\mathbf{x})$) renders.
  • All seven embedded images load.
  • External links to the Miles repo and Hugging Face chat-template playground resolve.

Shi Dong added 3 commits May 13, 2026 07:16
Introduces the Token-In-Token-Out (TITO) design principle in the
Miles RL framework: three common failure modes (detokenize-retokenize
mismatch, chat-template cut-thinking, lossy re-rendering) and the
four-component implementation (inference session server, append-only
enforcement at three levels, pluggable TITO tokenizer with per-model
splice-point patches, and a token-sequence comparator with CI
verification).
@Shi-Dong Shi-Dong changed the title Add blog post: No Token Left Behind — TITO in Miles [DO NOT MERGE] Add blog post: No Token Left Behind — TITO in Miles May 13, 2026
Comment thread on blog/2026-05-13-no-token-left-behind.md

An *inference session* is a single trajectory's interaction with the inference engine — the sequence of turns belonging to the same task, sharing one growing token buffer. The [inference session server](https://github.com/radixark/miles/blob/3270915550fcd69dce788f382fa8c12548a63618/miles/rollout/session/session_server.py#L24) is a thin server layer that maintains per-trajectory state, keyed by session id. Under each id it holds a growing token buffer `P` that is appended to in place on every turn. The token buffer preserves each sample's exact token-level info (logprobs, routed experts), so it can be sent directly to training.

![Inference session server architecture](/images/blog/tito/session-server.png)
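The paragraph above can be sketched in a few lines. This is a hypothetical simplification, not the actual `session_server.py`; the class and method names (`SessionServer`, `append`, `finish`) are illustrative, and only logprobs are shown as per-token metadata:

```python
# Minimal sketch of a per-trajectory session store with an append-only
# token buffer. Names are hypothetical; real Miles code differs.
from dataclasses import dataclass, field

@dataclass
class Session:
    tokens: list = field(default_factory=list)    # the growing buffer `P`
    logprobs: list = field(default_factory=list)  # exact per-token info

class SessionServer:
    def __init__(self):
        self._sessions = {}  # keyed by session id

    def append(self, session_id, new_tokens, new_logprobs):
        """Append-only: each turn extends the buffer in place; nothing
        is ever detokenized, re-rendered, or re-tokenized."""
        s = self._sessions.setdefault(session_id, Session())
        s.tokens.extend(new_tokens)
        s.logprobs.extend(new_logprobs)
        return s

    def finish(self, session_id):
        """Pop the completed trajectory so its exact token ids and
        logprobs can be handed directly to training."""
        return self._sessions.pop(session_id)

server = SessionServer()
server.append("traj-0", [101, 7], [-0.2, -1.3])  # turn 1
server.append("traj-0", [9, 42], [-0.5, -0.1])   # turn 2
done = server.finish("traj-0")
print(done.tokens)  # [101, 7, 9, 42]
```

Because every turn only ever extends the same buffer, the ids sent to training are bit-identical to the ids the inference engine produced.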
Note for Shi: update this diagram.

@@ -0,0 +1,188 @@
---
title: "No Token Left Behind: Demystifying Token-In-Token-Out in Miles"
author: "Miles Team"

Author: Jiajun Li, Yanbin Jiang, Mao Cheng, Shi Dong, Yusheng Su, Yueming Yuan, Zhichen Zeng, Banghua Zhu

