Summary
When using wrapAISDK with AI SDK v6 streaming (streamText / ToolLoopAgent.stream), time_to_first_token is wrong on both span levels that record it, for two different reasons:
- Parent wrapper span (
streamText / ToolLoopAgent.stream): the tap timestamps the first chunk of any type through result.baseStream. AI SDK v6 pre-buffers a synthetic {type: "start"} framing part into that stream at construction time — before the provider HTTP request fires — so the recorded value is "time until the consumer starts reading" (single-digit milliseconds) and contains zero model latency.
doStream child span (type llm): the clock starts after await originalDoStream(...) resolves — i.e. after the provider's HTTP response headers have arrived, when the real TTFT (request transit, queueing, prefill) has already elapsed — and the first chunk timestamped is the provider's {type: "stream-start"} framing part, available on the very first read. The two timestamps land microseconds apart, so the metric clamps to its 1e-6 floor.
Both values are content-free: they do not change whether the model takes 200ms or 20s to produce its first token.
Both bugs are present on main today (js/src/instrumentation/plugins/ai-sdk-plugin.ts), so the latest release (3.17.0) is affected as well as the version we run (3.11.0).
Environment
braintrust 3.11.0 (verified identical logic on main / 3.17.0)
ai 6.x (AI SDK v6, LanguageModelV3)
@ai-sdk/openai (responses API, streaming); also applies to other providers
- Pattern:
const {ToolLoopAgent} = wrapAISDK(ai) → agent.stream(...) → toUIMessageStream()
Bug 1 — parent span: framing chunk counted as first token
patchAISDKStreamingResult taps result.baseStream and timestamps the first chunk of any type (ai-sdk-plugin.ts:1383-1401):
if (isReadableStreamLike(resultRecord.baseStream)) {
let firstChunkTime: number | undefined;
const wrappedBaseStream = resultRecord.baseStream.pipeThrough(
new TransformStream({
transform(chunk, controller) {
if (firstChunkTime === undefined) {
firstChunkTime = getCurrentUnixTimestamp(); // ← any chunk type
}
controller.enqueue(chunk);
},
async flush() {
const metrics = extractTopLevelAISDKMetrics(result, endEvent);
if (metrics.time_to_first_token === undefined && firstChunkTime !== undefined) {
metrics.time_to_first_token = firstChunkTime - startTime;
}
But in AI SDK v6, the stream that feeds baseStream enqueues a synthetic start part in the ReadableStream's start() callback, which runs synchronously at stream construction — the moment streamText() / agent.stream() is invoked, before any network activity:
// ai (v6), streamText result construction
let stream = new ReadableStream({
async start(controller) {
controller.enqueue({ type: "start" }); // ← pre-buffered at construction
},
...
this.baseStream = stream.pipeThrough(createOutputTransformStream(...)).pipeThrough(eventProcessor);
All consumer surfaces (textStream, fullStream, toUIMessageStream) read via teeStream(), which reads the instance property this.baseStream — so the Object.defineProperty(resultRecord, "baseStream", ...) patch does interpose on real consumption, and the first thing the tap observes is the pre-buffered start part:
// ai (v6)
teeStream() {
const [stream1, stream2] = this.baseStream.tee(); // reads patched property
this.baseStream = stream2;
return stream1;
}
Result: time_to_first_token = (consumption start − agent.stream() call) ≈ a few ms, regardless of model latency.
The fallback branch (findAsyncIterableField over ["partialObjectStream", "textStream", "fullStream", "stream"]) has the same any-chunk-type issue for the fullStream/stream fields.
Bug 2 — doStream child span: clock starts after headers AND counts stream-start
The model patch has two compounding problems (ai-sdk-plugin.ts:1106-1120):
const result = await withCurrent(span, () =>
Reflect.apply(originalDoStream, resolvedModel, [options]), // ← HTTP request happens here;
); // resolves at response HEADERS
const streamStartTime = getCurrentUnixTimestamp(); // ← clock starts AFTER that
let firstChunkTime: number | undefined;
const transformStream = new TransformStream({
transform(chunk: AISDKModelStreamChunk, controller) {
if (firstChunkTime === undefined) {
firstChunkTime = getCurrentUnixTimestamp(); // ← any chunk type, incl. stream-start
}
...
case "finish":
...
metrics.time_to_first_token = Math.max(
firstChunkTime - streamStartTime,
1e-6 // ← the floor everyone actually sees
);
- Clock starts too late.
originalDoStream() resolves when the provider's HTTP response begins. Everything that constitutes real TTFT has already happened before streamStartTime is captured.
- First chunk is framing. Providers enqueue
{type: "stream-start", warnings} in their transformer's start() callback (e.g. @ai-sdk/openai), so it's available on the first read, microseconds after streamStartTime.
Net: firstChunkTime - streamStartTime ≈ 0, clamped to 1e-6. This exact value is visible (unremarked) in the repro output of #1868, which shows "time_to_first_token": 0.000001 on a doStream span.
Production evidence
Same service, same week, same traffic (OpenAI responses API, ~70–90k token prompts, reasoning enabled):
| Source |
p50 |
p95 |
min (7d) |
AI SDK's own OTel attribute ai.response.msToFirstChunk |
990 ms |
2,621 ms |
669 ms |
Braintrust time_to_first_token (Monitor preset) |
~ms |
< 100 ms |
1e-6 s |
The minimum real TTFT observed over 7 days is ~7,000× larger than the Braintrust p95. A sub-100ms p95 TTFT for a reasoning model on 70k-token prompts is not physically possible.
Why the AI SDK core gets it right
streamText computes ai.response.msToFirstChunk correctly — it explicitly skips the framing chunk before timestamping:
// ai (v6)
async transform(chunk, controller) {
...
if (chunk.type === "stream-start") {
warnings = chunk.warnings;
return; // ← skipped, no timestamp
}
if (stepFirstChunk) {
const msToFirstChunk = now() - startTimestampMs;
...
Suggested fix
Only set firstChunkTime on content-bearing part types, in both taps:
baseStream tap (Bug 1): ignore start, start-step, stream-start, response-metadata; timestamp on first text-delta, reasoning-delta, tool-input-start / tool-call, object, or raw.
doStream patch (Bug 2): additionally capture streamStartTime before await originalDoStream(...), so the request/queue/prefill phase is included (matching the AI SDK's own startTimestampMs semantics).
Related
Summary
When using
wrapAISDKwith AI SDK v6 streaming (streamText/ToolLoopAgent.stream),time_to_first_tokenis wrong on both span levels that record it, for two different reasons:streamText/ToolLoopAgent.stream): the tap timestamps the first chunk of any type throughresult.baseStream. AI SDK v6 pre-buffers a synthetic{type: "start"}framing part into that stream at construction time — before the provider HTTP request fires — so the recorded value is "time until the consumer starts reading" (single-digit milliseconds) and contains zero model latency.doStreamchild span (typellm): the clock starts afterawait originalDoStream(...)resolves — i.e. after the provider's HTTP response headers have arrived, when the real TTFT (request transit, queueing, prefill) has already elapsed — and the first chunk timestamped is the provider's{type: "stream-start"}framing part, available on the very first read. The two timestamps land microseconds apart, so the metric clamps to its1e-6floor.Both values are content-free: they do not change whether the model takes 200ms or 20s to produce its first token.
Both bugs are present on
maintoday (js/src/instrumentation/plugins/ai-sdk-plugin.ts), so the latest release (3.17.0) is affected as well as the version we run (3.11.0).Environment
braintrust3.11.0 (verified identical logic onmain/ 3.17.0)ai6.x (AI SDK v6,LanguageModelV3)@ai-sdk/openai(responses API, streaming); also applies to other providersconst {ToolLoopAgent} = wrapAISDK(ai)→agent.stream(...)→toUIMessageStream()Bug 1 — parent span: framing chunk counted as first token
patchAISDKStreamingResulttapsresult.baseStreamand timestamps the first chunk of any type (ai-sdk-plugin.ts:1383-1401):But in AI SDK v6, the stream that feeds
baseStreamenqueues a syntheticstartpart in the ReadableStream'sstart()callback, which runs synchronously at stream construction — the momentstreamText()/agent.stream()is invoked, before any network activity:All consumer surfaces (
textStream,fullStream,toUIMessageStream) read viateeStream(), which reads the instance propertythis.baseStream— so theObject.defineProperty(resultRecord, "baseStream", ...)patch does interpose on real consumption, and the first thing the tap observes is the pre-bufferedstartpart:Result:
time_to_first_token= (consumption start −agent.stream()call) ≈ a few ms, regardless of model latency.The fallback branch (
findAsyncIterableFieldover["partialObjectStream", "textStream", "fullStream", "stream"]) has the same any-chunk-type issue for thefullStream/streamfields.Bug 2 —
doStreamchild span: clock starts after headers AND countsstream-startThe model patch has two compounding problems (
ai-sdk-plugin.ts:1106-1120):originalDoStream()resolves when the provider's HTTP response begins. Everything that constitutes real TTFT has already happened beforestreamStartTimeis captured.{type: "stream-start", warnings}in their transformer'sstart()callback (e.g.@ai-sdk/openai), so it's available on the first read, microseconds afterstreamStartTime.Net:
firstChunkTime - streamStartTime ≈ 0, clamped to1e-6. This exact value is visible (unremarked) in the repro output of #1868, which shows"time_to_first_token": 0.000001on adoStreamspan.Production evidence
Same service, same week, same traffic (OpenAI responses API, ~70–90k token prompts, reasoning enabled):
ai.response.msToFirstChunktime_to_first_token(Monitor preset)1e-6sThe minimum real TTFT observed over 7 days is ~7,000× larger than the Braintrust p95. A sub-100ms p95 TTFT for a reasoning model on 70k-token prompts is not physically possible.
Why the AI SDK core gets it right
streamTextcomputesai.response.msToFirstChunkcorrectly — it explicitly skips the framing chunk before timestamping:Suggested fix
Only set
firstChunkTimeon content-bearing part types, in both taps:baseStreamtap (Bug 1): ignorestart,start-step,stream-start,response-metadata; timestamp on firsttext-delta,reasoning-delta,tool-input-start/tool-call,object, orraw.doStreampatch (Bug 2): additionally capturestreamStartTimebeforeawait originalDoStream(...), so the request/queue/prefill phase is included (matching the AI SDK's ownstartTimestampMssemantics).Related
wrapAISDKdoStream:extractTokenMetricsdropscacheRead/cacheWritefrom V3 usage shape #1868 — different bug (cache-token metrics), but its pasted span metrics show this bug's signature value ("time_to_first_token": 0.000001).wrapAnthropicTTFT type bug (string vs number); separate code path, same metric.wrapAISDKwrapper ("do not set time_to_first_token to the latest token").