`wrapAISDK` streaming: `time_to_first_token` measures stream framing, not first token (two distinct bugs)


## Summary

When using `wrapAISDK` with AI SDK v6 streaming (`streamText` / `ToolLoopAgent.stream`), `time_to_first_token` is wrong on **both** span levels that record it, for two different reasons:

1. **Parent wrapper span** (`streamText` / `ToolLoopAgent.stream`): the tap timestamps the first chunk of *any* type through `result.baseStream`. AI SDK v6 pre-buffers a synthetic `{type: "start"}` framing part into that stream at construction time — before the provider HTTP request fires — so the recorded value is "time until the consumer starts reading" (single-digit milliseconds) and contains zero model latency.
2. **`doStream` child span** (type `llm`): the clock starts *after* `await originalDoStream(...)` resolves — i.e. after the provider's HTTP response headers have arrived, when the real TTFT (request transit, queueing, prefill) has already elapsed — and the first chunk timestamped is the provider's `{type: "stream-start"}` framing part, available on the very first read. The two timestamps land microseconds apart, so the metric clamps to its `1e-6` floor.

Both values are content-free: they do not change whether the model takes 200ms or 20s to produce its first token.

Both bugs are present on `main` today (`js/src/instrumentation/plugins/ai-sdk-plugin.ts`), so the latest release (3.17.0) is affected as well as the version we run (3.11.0).

## Environment

- `braintrust` 3.11.0 (verified identical logic on `main` / 3.17.0)
- `ai` 6.x (AI SDK v6, `LanguageModelV3`)
- `@ai-sdk/openai` (responses API, streaming); also applies to other providers
- Pattern: `const {ToolLoopAgent} = wrapAISDK(ai)` → `agent.stream(...)` → `toUIMessageStream()`

## Bug 1 — parent span: framing chunk counted as first token

`patchAISDKStreamingResult` taps `result.baseStream` and timestamps the **first chunk of any type** ([`ai-sdk-plugin.ts:1383-1401`](https://github.com/braintrustdata/braintrust-sdk-javascript/blob/e927f07a9c983faa038a642e738ab1ea4b82c0ed/js/src/instrumentation/plugins/ai-sdk-plugin.ts#L1383-L1401)):

```ts
if (isReadableStreamLike(resultRecord.baseStream)) {
  let firstChunkTime: number | undefined;
  const wrappedBaseStream = resultRecord.baseStream.pipeThrough(
    new TransformStream({
      transform(chunk, controller) {
        if (firstChunkTime === undefined) {
          firstChunkTime = getCurrentUnixTimestamp();   // ← any chunk type
        }
        controller.enqueue(chunk);
      },
      async flush() {
        const metrics = extractTopLevelAISDKMetrics(result, endEvent);
        if (metrics.time_to_first_token === undefined && firstChunkTime !== undefined) {
          metrics.time_to_first_token = firstChunkTime - startTime;
        }
```

But in AI SDK v6, the stream that feeds `baseStream` enqueues a synthetic `start` part **in the ReadableStream's `start()` callback**, which runs synchronously at stream construction — the moment `streamText()` / `agent.stream()` is invoked, before any network activity:

```js
// ai (v6), streamText result construction
let stream = new ReadableStream({
  async start(controller) {
    controller.enqueue({ type: "start" });   // ← pre-buffered at construction
  },
  ...
this.baseStream = stream.pipeThrough(createOutputTransformStream(...)).pipeThrough(eventProcessor);
```

All consumer surfaces (`textStream`, `fullStream`, `toUIMessageStream`) read via `teeStream()`, which reads the **instance property** `this.baseStream` — so the `Object.defineProperty(resultRecord, "baseStream", ...)` patch does interpose on real consumption, and the first thing the tap observes is the pre-buffered `start` part:

```js
// ai (v6)
teeStream() {
  const [stream1, stream2] = this.baseStream.tee();  // reads patched property
  this.baseStream = stream2;
  return stream1;
}
```

Result: `time_to_first_token` = (consumption start − `agent.stream()` call) ≈ a few ms, regardless of model latency.

The fallback branch (`findAsyncIterableField` over `["partialObjectStream", "textStream", "fullStream", "stream"]`) has the same any-chunk-type issue for the `fullStream`/`stream` fields.

## Bug 2 — `doStream` child span: clock starts after headers AND counts `stream-start`

The model patch has two compounding problems ([`ai-sdk-plugin.ts:1106-1120`](https://github.com/braintrustdata/braintrust-sdk-javascript/blob/e927f07a9c983faa038a642e738ab1ea4b82c0ed/js/src/instrumentation/plugins/ai-sdk-plugin.ts#L1106-L1120)):

```ts
const result = await withCurrent(span, () =>
  Reflect.apply(originalDoStream, resolvedModel, [options]),  // ← HTTP request happens here;
);                                                            //   resolves at response HEADERS
const streamStartTime = getCurrentUnixTimestamp();            // ← clock starts AFTER that
let firstChunkTime: number | undefined;
const transformStream = new TransformStream({
  transform(chunk: AISDKModelStreamChunk, controller) {
    if (firstChunkTime === undefined) {
      firstChunkTime = getCurrentUnixTimestamp();             // ← any chunk type, incl. stream-start
    }
    ...
    case "finish":
      ...
      metrics.time_to_first_token = Math.max(
        firstChunkTime - streamStartTime,
        1e-6                                                  // ← the floor everyone actually sees
      );
```

- **Clock starts too late.** `originalDoStream()` resolves when the provider's HTTP response begins. Everything that constitutes real TTFT has already happened before `streamStartTime` is captured.
- **First chunk is framing.** Providers enqueue `{type: "stream-start", warnings}` in their transformer's `start()` callback (e.g. `@ai-sdk/openai`), so it's available on the first read, microseconds after `streamStartTime`.

Net: `firstChunkTime - streamStartTime ≈ 0`, clamped to `1e-6`. This exact value is visible (unremarked) in the repro output of #1868, which shows `"time_to_first_token": 0.000001` on a `doStream` span.

## Production evidence

Same service, same week, same traffic (OpenAI responses API, ~70–90k token prompts, reasoning enabled):

| Source | p50 | p95 | min (7d) |
|---|---:|---:|---:|
| AI SDK's own OTel attribute `ai.response.msToFirstChunk` | **990 ms** | **2,621 ms** | **669 ms** |
| Braintrust `time_to_first_token` (Monitor preset) | ~ms | **< 100 ms** | `1e-6` s |

The *minimum* real TTFT observed over 7 days is ~7,000× larger than the Braintrust p95. A sub-100ms p95 TTFT for a reasoning model on 70k-token prompts is not physically possible.

## Why the AI SDK core gets it right

`streamText` computes `ai.response.msToFirstChunk` correctly — it **explicitly skips the framing chunk** before timestamping:

```js
// ai (v6)
async transform(chunk, controller) {
  ...
  if (chunk.type === "stream-start") {
    warnings = chunk.warnings;
    return;                              // ← skipped, no timestamp
  }
  if (stepFirstChunk) {
    const msToFirstChunk = now() - startTimestampMs;
    ...
```

## Suggested fix

Only set `firstChunkTime` on **content-bearing** part types, in both taps:

- `baseStream` tap (Bug 1): ignore `start`, `start-step`, `stream-start`, `response-metadata`; timestamp on first `text-delta`, `reasoning-delta`, `tool-input-start` / `tool-call`, `object`, or `raw`.
- `doStream` patch (Bug 2): additionally capture `streamStartTime` **before** `await originalDoStream(...)`, so the request/queue/prefill phase is included (matching the AI SDK's own `startTimestampMs` semantics).

## Related

- #1868 — different bug (cache-token metrics), but its pasted span metrics show this bug's signature value (`"time_to_first_token": 0.000001`).
- #1401 — `wrapAnthropic` TTFT type bug (string vs number); separate code path, same metric.
- #617 — a previous, different TTFT bug in the pre-`wrapAISDK` wrapper ("do not set time_to_first_token to the latest token").


Source	p50	p95	min (7d)
AI SDK's own OTel attribute `ai.response.msToFirstChunk`	990 ms	2,621 ms	669 ms
Braintrust `time_to_first_token` (Monitor preset)	~ms	< 100 ms	`1e-6` s

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`wrapAISDK` streaming: `time_to_first_token` measures stream framing, not first token (two distinct bugs) #2110

Summary

Environment

Bug 1 — parent span: framing chunk counted as first token

Bug 2 — `doStream` child span: clock starts after headers AND counts `stream-start`

Production evidence

Why the AI SDK core gets it right

Suggested fix

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

wrapAISDK streaming: time_to_first_token measures stream framing, not first token (two distinct bugs) #2110

Description

Summary

Environment

Bug 1 — parent span: framing chunk counted as first token

Bug 2 — doStream child span: clock starts after headers AND counts stream-start

Production evidence

Why the AI SDK core gets it right

Suggested fix

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`wrapAISDK` streaming: `time_to_first_token` measures stream framing, not first token (two distinct bugs) #2110

Bug 2 — `doStream` child span: clock starts after headers AND counts `stream-start`