fix(ai): undo strict-mode null-widening before structured-output validation#732
fix(ai): undo strict-mode null-widening before structured-output validation#732DrewHoo wants to merge 7 commits into
Conversation
…dation Optional fields are widened to required+nullable for strict structured output, so providers return `null` for an absent optional. Validating that `null` against the original schema failed (`.optional()` is `T | undefined`, not `T | null`), surfacing as a StandardSchemaValidationError — most visibly through @tanstack/ai-openrouter, whose adapter preserves provider nulls. Add `undoNullWidening(value, schema)` to @tanstack/ai-utils: a schema-aware counterpart to `transformNullsToUndefined` that drops only synthesized nulls (those the original JSON Schema disallows) while preserving the ones a `.nullable()`/`.nullish()` field genuinely allows. The chat activity runs it on the structured-output result before Standard Schema validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds undoNullWidening and NullWideningMap to ai-utils, emit null-widening maps from schema conversion, undo synthesized nulls in chat structured-output finalization (promise and streaming), update adapters/tests/docs and add ai-utils dependency. ChangesStructured Output Null Normalization
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai-utils/src/transforms.ts`:
- Around line 84-95: resolveSchema currently returns the first
object/array-shaped variant from schema.anyOf/schema.oneOf which can be wrong
when multiple non-null branches match; change the logic in resolveSchema to
collect all non-null matching variants (using the same shape-check: array vs
object heuristics), and only return a variant when exactly one non-null match is
found, otherwise return the original schema to avoid destructive rewrites that
strip valid null branches; reference the resolveSchema function and the local
variables variants, match, isArray when making this change.
- Around line 110-115: The array handling currently applies resolved.items[0] to
every element; change the mapping to pick the schema per index when
resolved.items is an array: compute itemSchemaForIndex =
Array.isArray(resolved.items) ? resolved.items[i] ||
resolved.items[resolved.items.length - 1] || resolved.additionalItems ||
resolved.items[0] : resolved.items, then call value.map((item, i) => walk(item,
itemSchemaForIndex)); update the block around resolveSchema, resolved, and walk
to use the index-aware schema selection so tuple-style schemas are handled
correctly.
In `@packages/ai-utils/tests/transforms.test.ts`:
- Around line 2-6: The import order violates ESLint import/order: combine the
type and value imports into a single statement or place the value imports before
the type import so types don't come first; specifically adjust the imports
involving JsonSchemaNode, transformNullsToUndefined, and undoNullWidening so
they are imported in one consolidated line or with transformNullsToUndefined and
undoNullWidening imported before the type JsonSchemaNode to satisfy the linter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dd2ec226-feb0-42be-b3f6-5e7d26a784d9
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (7)
.changeset/structured-output-undo-null-widening.mdpackages/ai-utils/src/index.tspackages/ai-utils/src/transforms.tspackages/ai-utils/tests/transforms.test.tspackages/ai/package.jsonpackages/ai/src/activities/chat/index.tspackages/ai/tests/chat-structured-output-null-normalization.test.ts
…oNullWidening Addresses CodeRabbit review on TanStack#732: - resolveSchema now descends only when exactly one non-null anyOf/oneOf branch matches the value's shape; ambiguous unions keep the original schema rather than risk stripping a null a sibling branch allows. - Array walking applies tuple-style `items: [a, b, …]` schemas per index instead of always using the first. Adds coverage for both and fixes the test's import order. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @coderabbitai — all three addressed in 3f44678:
Added unit coverage for both the tuple and ambiguous-union cases. |
|
Tip For best results, initiate chat on the files or code changes.
(ノ✧▽✧)ノ 🐇 |
|
View your CI Pipeline Execution ↗ for commit 49f7a48
☁️ Nx Cloud last updated this comment at |
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-mcp
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
commit: |
…re-deriving it Replace the schema-guessing `undoNullWidening` — which reverse-engineered which nulls strict-mode widening synthesized by pattern-matching response values against the un-widened schema's anyOf branches, and bailed on ambiguity — with a precise map recorded by the widening pass itself. `makeStructuredOutputCompatible` now returns the strict schema plus a `NullWideningMap` marking exactly the positions where it added `null`. The new `convertSchemaForStructuredOutput` exposes both, and the chat activity threads that map into `undoNullWidening`. This drops `resolveSchema`/`allowsNull` branch guessing, preserves `.nullish()` nulls by construction, and closes the ambiguous-union gap where synthesized nulls were previously left in place. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed a refactor of the null-normalization internals (behaviour unchanged; all unit + integration tests green). What changed: the previous approach re-derived which nulls were synthetic at validation time by walking the response value against the un-widened schema and pattern-matching
Why it's better:
The |
…am modes and every adapter Strict-mode structured output widens optional fields to `required` + nullable, so providers return `null` for an absent optional. That `null` fails validation against the original `.optional()` schema (`T | undefined`, not `T | null`). Previously only the Promise<T> path un-widened, and only for adapters that preserved provider nulls (OpenRouter). The OpenAI-family adapters instead blind-stripped every null via `transformStructuredOutput`, which masked the bug but also destroyed genuine `.nullable()` nulls — and the streaming path didn't un-widen at all. Move un-widening into the engine, the one layer that holds the schema's null-widening map: - Add `finalStructuredOutput.normalize`, applied the instant the structured output is captured, so it flows to BOTH the streaming `structured-output.complete` event and the Promise<T> result (plus the native-combined harvest path). Both activity callers now pass it via `convertSchemaForStructuredOutput`; streaming switches off the map-less `convertSchemaToJsonSchema`. Validation runs on already-normalized data. - openai-base `transformStructuredOutput` default is now a passthrough — the blind null-strip is gone (the engine un-widens precisely instead). Fixes the responses-text streaming path that bypassed the hook. OpenAI/Grok/Groq inherit this; OpenRouter's now-redundant override is simplified and its dead `transformNullsToUndefined` imports dropped. Genuine `.nullable()` nulls now survive on every adapter and both directions; synthesized optional nulls are dropped everywhere. Tests: streaming normalization + a converter→undo round-trip (closing the untested map-production gap); adapter passthrough tests updated; e2e gains an optional field returning `null` asserted un-widened across all 5 streaming providers (real regression guard for OpenRouter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tombeckenham
left a comment
There was a problem hiding this comment.
Thanks for finding this @DrewHoo. This pointed me to an adjust bug in the open-ai base as well.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/ai/tests/chat-structured-output-null-normalization.test.ts (1)
1-1:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMove this unit test next to the source module it verifies.
This file is under
packages/ai/tests/, but the repository rule requires*.test.tsfiles to be colocated with source.
As per coding guidelines, "**/*.test.ts: Place unit tests alongside source code in*.test.tsfiles`."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/tests/chat-structured-output-null-normalization.test.ts` at line 1, Move the test file chat-structured-output-null-normalization.test.ts out of packages/ai/tests/ and colocate it with the source module it verifies (place it in the same directory as the module implementing the structured output normalization). Update any relative imports inside the test to use the new path and ensure the test name remains unchanged; if the source module exports a function or class used in the test (e.g., the structured output normalization function), import it with the correct relative path so the test runs from its new location.Source: Coding guidelines
🧹 Nitpick comments (1)
packages/openai-base/tests/chat-completions-text.test.ts (1)
726-766: ⚡ Quick winAdd a direct
structuredOutputStream()passthrough regression here too.This new assertion only exercises
structuredOutput(), but the same behavior change also landed in the adapter streaming path. One base-level stream test here would catch a regression wherestructuredOutputStream()still stripsnullinstead of relying on engine normalization.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/openai-base/tests/chat-completions-text.test.ts` around lines 726 - 766, Add a sibling streaming test that mirrors the non-streaming case to ensure structuredOutputStream() also passes provider nulls through unchanged: reuse setupMockSdkClient and TestChatCompletionsAdapter with a streaming response equivalent to nonStreamResponse, call adapter.structuredOutputStream(...) with the same chatOptions and outputSchema used in the existing test, collect the streamed result and assert the emitted data contains name === 'Alice' and nickname === null; this prevents regressions where structuredOutputStream() would strip nulls while structuredOutput() does not.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/structured-outputs/overview.md`:
- Around line 89-91: Add paired server- and client-side code snippets
demonstrating the behavior described: show a server endpoint (using the engine's
server API that emits structured-output.complete and performs
normalization/widening) that returns a validated object according to the
discussed schema, and a matching client usage example that calls chat({
outputSchema }) for non-streaming and chat({ outputSchema, stream:true }) or
useChat({ outputSchema }) for streaming, showing how to read value.object from
the structured-output.complete event or call
parseWithStandardSchema/parsePartialJSON. Reference the documented symbols in
the snippets (chat({ outputSchema }), useChat({ outputSchema }),
structured-output.complete, value.object, parseWithStandardSchema,
parsePartialJSON) so readers can see both sides and how the server
normalization/un-widening maps to the client consumption.
---
Outside diff comments:
In `@packages/ai/tests/chat-structured-output-null-normalization.test.ts`:
- Line 1: Move the test file chat-structured-output-null-normalization.test.ts
out of packages/ai/tests/ and colocate it with the source module it verifies
(place it in the same directory as the module implementing the structured output
normalization). Update any relative imports inside the test to use the new path
and ensure the test name remains unchanged; if the source module exports a
function or class used in the test (e.g., the structured output normalization
function), import it with the correct relative path so the test runs from its
new location.
---
Nitpick comments:
In `@packages/openai-base/tests/chat-completions-text.test.ts`:
- Around line 726-766: Add a sibling streaming test that mirrors the
non-streaming case to ensure structuredOutputStream() also passes provider nulls
through unchanged: reuse setupMockSdkClient and TestChatCompletionsAdapter with
a streaming response equivalent to nonStreamResponse, call
adapter.structuredOutputStream(...) with the same chatOptions and outputSchema
used in the existing test, collect the streamed result and assert the emitted
data contains name === 'Alice' and nickname === null; this prevents regressions
where structuredOutputStream() would strip nulls while structuredOutput() does
not.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a00759bd-cbca-4340-bd2a-eafb21c3b2b7
📒 Files selected for processing (15)
.changeset/structured-output-undo-null-widening.mddocs/config.jsondocs/structured-outputs/overview.mdpackages/ai-openrouter/src/adapters/responses-text.tspackages/ai-openrouter/src/adapters/text.tspackages/ai-openrouter/tests/openrouter-adapter.test.tspackages/ai/src/activities/chat/index.tspackages/ai/tests/chat-structured-output-null-normalization.test.tspackages/openai-base/src/adapters/chat-completions-text.tspackages/openai-base/src/adapters/responses-text.tspackages/openai-base/tests/chat-completions-text.test.tspackages/openai-base/tests/responses-text.test.tstesting/e2e/fixtures/structured-output-stream/basic.jsontesting/e2e/src/lib/schemas.tstesting/e2e/tests/structured-output-stream.spec.ts
✅ Files skipped from review due to trivial changes (4)
- testing/e2e/fixtures/structured-output-stream/basic.json
- docs/config.json
- packages/ai-openrouter/src/adapters/responses-text.ts
- packages/ai-openrouter/src/adapters/text.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- .changeset/structured-output-undo-null-widening.md
| > **Note:** Server-side validation is **path-dependent**. For the non-streaming agentic path (`await chat({ outputSchema })`), the engine runs Standard Schema validation inside the finalization step and routes failures through `onError` (the awaited promise rejects). For the streaming path (`chat({ outputSchema, stream: true })`), Standard Schema _validation_ is deliberately deferred to the consumer — consumers read the object from the `structured-output.complete` event's `value.object` field (or call `parseWithStandardSchema` themselves on the raw text). The schema you pass to `useChat({ outputSchema })` on the client is used for TypeScript inference and (in `useChat`) for client-side `parsePartialJSON`-based progressive parsing — the typed-object guarantee comes from the server-side path you pick. | ||
| > | ||
| > On **both** paths the engine normalizes the captured object before it reaches you: to satisfy strict providers, optional fields are widened to `required` + nullable, so the provider returns `null` for an absent optional. The engine undoes exactly that widening — an `.optional()` field that came back `null` reads back as **absent** (matching `T | undefined`), while a genuine `.nullable()` field's `null` is **preserved**. So `value.object` (streaming) and the awaited result (non-streaming) both carry the un-widened shape your schema describes. |
There was a problem hiding this comment.
Add paired server and client snippets in this section.
This note spans both server-side and client-side behavior but doesn’t include both code sides in the updated section.
As per coding guidelines, "Show both server and client sides of the coin when a doc spans both; include snippets for both the server endpoint AND client consumption."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/structured-outputs/overview.md` around lines 89 - 91, Add paired server-
and client-side code snippets demonstrating the behavior described: show a
server endpoint (using the engine's server API that emits
structured-output.complete and performs normalization/widening) that returns a
validated object according to the discussed schema, and a matching client usage
example that calls chat({ outputSchema }) for non-streaming and chat({
outputSchema, stream:true }) or useChat({ outputSchema }) for streaming, showing
how to read value.object from the structured-output.complete event or call
parseWithStandardSchema/parsePartialJSON. Reference the documented symbols in
the snippets (chat({ outputSchema }), useChat({ outputSchema }),
structured-output.complete, value.object, parseWithStandardSchema,
parsePartialJSON) so readers can see both sides and how the server
normalization/un-widening maps to the client consumption.
Source: Coding guidelines
…n; fix comment
Follow-up to the engine-level un-widening commit, addressing review gaps:
- Fix an inaccurate inline comment: the `structured-output.complete` event's
value is `{ object, raw, reasoning? }` — it carries no `messageId` (that's on
`structured-output.start`). The outbound-chunk rewrite preserves `raw`, not
`messageId`.
- Add native-combined mode coverage (the `harvestCombinedStructuredOutput`
capture site was untested): both the harvested Promise<T> result and the
synthesized streaming complete event must un-widen.
- Add a streaming-rewrite test asserting the engine replaces only `object`
(un-widened) while spreading the event's sibling `raw`/`reasoning` through
untouched — guards the `{ ...value, object }` contract.
- Add a round-trip case proving a genuine `.nullable()` null inside an array
item survives (the spot the array/tuple handling could wrongly strip).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s through Addresses CodeRabbit review (PR TanStack#732): the non-streaming passthrough assertion had no streaming sibling. Adds a `structuredOutputStream()` case emitting a provider `null` and asserting the terminal `structured-output.complete` object preserves it — guarding against the stream path regressing to a blind null-strip while the non-stream path relies on engine-level un-widening. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Structured output rejects a valid provider response when the schema has an optional field. To satisfy OpenAI-style strict schemas, optional fields are widened to
required+ nullable, so the provider returnsnullfor an absent optional — but thatnullis then validated against the original schema, where.optional()meansT | undefined, notT | null. Validation throws.This adds a schema-aware
undoNullWideningto@tanstack/ai-utilsand runs it on the structured-output result before Standard Schema validation, so the synthesized nulls are dropped while genuine.nullable()/.nullish()nulls are preserved.Fixes #731
Reproduction
The outbound schema sent to the provider widens
noteto{ "type": ["string", "null"] }and adds it torequired, so the model returns{ "title": "…", "note": null }. The original Valibot schema rejects thatnull.Root cause
convertSchemaToJsonSchema(schema, { forStructuredOutput: true })widens optional →required+ nullable (correct, and required for strict mode). The round trip just never undoes it: the response is validated against the un-widened original viaparseWithStandardSchema, which rejects thenullthe widening told the model to send.Fix
@tanstack/ai-utilsgainsundoNullWidening(value, schema)— a schema-aware counterpart totransformNullsToUndefined. It consults the original (un-widened) JSON Schema and drops anullonly where the schema disallows one (a synthesized widening), preserving nulls a.nullable()/.nullish()field genuinely allows, and leaving nulls under shapes the schema doesn't describe untouched.@tanstack/airuns it inside the structured-outputvalidatecallback (activities/chat), beforeparseWithStandardSchema, usingconvertSchemaToJsonSchema(outputSchema)(noforStructuredOutput) as the nullability source of truth.Why core, not the adapter
By the time the schema reaches the adapter it's already been widened (
forStructuredOutput: true), so the adapter can no longer tell a synthesized null from a genuine one. Only the activity layer still holds the original schema, so the normalization has to live there. This also fixes it for every adapter on the validated path at once, rather than per-adapter.Tests
@tanstack/ai-utils: unit coverage forundoNullWidening— optional vs. nullable, mixed, nested nullable objects (viaanyOf), array items, schemaless passthrough.@tanstack/ai: end-to-endchat({ outputSchema })regression — a providernullon an optional field now resolves (key absent), and anullon a.nullable()field is preserved.@tanstack/aisuite (1005 tests) +tsc+ eslint pass.Out of scope / follow-up
There's a symmetric latent issue in the
@tanstack/openai-baseadapters: they blind-strip all nulls (transformNullsToUndefined), so a genuinely.nullable()required field that returnsnullis stripped and then fails validation as missing — the inverse of the bug fixed here.optional(T | undefined)nullablerequired (T | null)Unifying that is intentionally not in this PR: the openai-base null-strip runs inside
structuredOutputStream(shared by the non-streaming Promise path and the streamed path, which re-emits the adapter chunk), so fixing it cleanly means either threading the original schema down to the adapters or normalizing the streaming chunk pipeline — a larger, more invasive change. Happy to follow up with that if you'd prefer a single unified pass;undoNullWideningis exported and ready to be reused there.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Documentation
Tests