feat(ai-elevenlabs): add speech/audio/transcription adapters via official SDK #504
…cial SDK (TanStack#485)
Extends @tanstack/ai-elevenlabs with three tree-shakeable REST adapters built on the official @elevenlabs/elevenlabs-js SDK: elevenlabsSpeech (TTS), elevenlabsAudio (music + SFX dispatched by model), and elevenlabsTranscription (Scribe v1/v2). Migrates the realtime adapter off the deprecated @11labs/client onto the renamed @elevenlabs/client. Wires ElevenLabs into the ts-react-chat example provider catalogs and the e2e tts/transcription support matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Adds three tree-shakeable REST adapters (TTS, music/SFX audio, transcription) implemented against the official ElevenLabs SDK, migrates realtime code from …
Sequence Diagram(s)
sequenceDiagram
rect rgba(210,235,255,0.5)
participant Client
end
rect rgba(220,255,220,0.5)
participant Server
participant Adapter
end
rect rgba(255,245,210,0.5)
participant ElevenLabsSDK
participant Storage
end
Client->>Server: POST /api.generate.speech (text, provider=elevenlabs)
Server->>Adapter: buildSpeechAdapter(provider, modelOptions)
Adapter->>ElevenLabsSDK: textToSpeech.convert({ modelId, voiceId, outputFormat, settings })
ElevenLabsSDK-->>Adapter: audio ReadableStream
Adapter->>Storage: readStreamToArrayBuffer -> arrayBufferToBase64
Adapter-->>Server: TTSResult { id, audio: { b64Json, contentType, format } }
Server-->>Client: return or stream audio payload
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
The SDK defines a top-level `function getHeader(…)` in `core/fetcher/getHeader.js`, which collides with h3's auto-imported `getHeader` once vite/nitro inline both into the same server chunk. esbuild then rejects the duplicate symbol and the e2e build fails with `The symbol "getHeader" has already been declared`. Marking the SDK as a vite SSR + nitro external keeps it resolved at runtime on the server side, which is what we want anyway for a server-only REST client.
Also adds a local `pnpm dev:chat` convenience script to run the ts-react-chat example without remembering the filter flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
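The externalization described in this commit can be sketched as two plain config fragments. This is a minimal sketch, not the PR's actual `vite.config.ts`: `ssr.external` is the Vite option and `externals.external` is the Nitro option named in the commit messages, while the constant names and the way the Nitro fragment reaches the app's Nitro plugin are assumptions.

```typescript
// Package that must stay external so its top-level `getHeader` is never
// inlined into the same server chunk as h3's auto-imported `getHeader`.
const ELEVENLABS_SDK = '@elevenlabs/elevenlabs-js'

// Fragment to merge into the object passed to Vite's `defineConfig`.
const viteSsrFragment = {
  ssr: {
    // Resolve the SDK from node_modules at runtime instead of bundling it.
    external: [ELEVENLABS_SDK],
  },
}

// Fragment for the Nitro options. Hypothetical wiring: pass it to whichever
// Nitro plugin or preset the app already uses.
const nitroFragment = {
  externals: {
    external: [ELEVENLABS_SDK],
  },
}
```

Keeping the package name in one constant means both bundler layers list the same module, so the SDK cannot be external in one and inlined in the other.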
Force-pushed from 7d75161 to de7c302
…hat and descoping elevenlabs e2e
Two CI failures on PR TanStack#504:
1. `ts-react-chat:build` hit the same `getHeader` SSR collision as the e2e app. Now that the example wires ElevenLabs into the server-side audio-adapter factories, its SSR bundle faces the same SDK/h3 symbol clash. Same fix (`ssr.external` + nitro `externals.external`) applied to `examples/ts-react-chat/vite.config.ts`.
2. `elevenlabs -- tts` and `elevenlabs -- transcription` e2e tests failed because aimock doesn't yet stub `api.elevenlabs.io` routes, so the real SDK HTTP calls had no mock target and errored out. Removed `elevenlabs` from the `tts` + `transcription` support matrix sets in `testing/e2e/{tests/test-matrix.ts,src/lib/feature-support.ts}` for now; the factories stay in `media-providers.ts` so they light up automatically once aimock ships coverage. Tracked as part of the nitro/aimock follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 4
🧹 Nitpick comments (5)
packages/typescript/ai-elevenlabs/src/utils/client.ts (3)
131-151: `dataUrlToBlob` does not tolerate whitespace in base64 payloads and silently misroutes invalid data URLs.
Two small edge-case notes:
- `atob` can throw on whitespace or line-wrapping in the base64 payload (common for long data URLs hand-pasted by users or produced by encoders that insert `\n`). Stripping whitespace before `atob` avoids a hard throw from inside the adapter.
- When `value` looks like a data URL but `commaIndex === -1`, the function returns `undefined`, which the caller then treats as an `https` URL. A malformed `data:` URL ending up as an HTTP request is a confusing failure mode; consider throwing a `TypeError('Invalid data URL')` instead so the error is localized.
🛠️ Suggested tightening
-  if (!value.startsWith('data:')) return undefined
-  const commaIndex = value.indexOf(',')
-  if (commaIndex === -1) return undefined
+  if (!value.startsWith('data:')) return undefined
+  const commaIndex = value.indexOf(',')
+  if (commaIndex === -1) {
+    throw new TypeError('Invalid data URL: missing comma separator')
+  }
@@
-  if (isBase64) {
-    const binary = atob(payload)
+  if (isBase64) {
+    const binary = atob(payload.replace(/\s+/g, ''))
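To make the two suggestions above concrete, here is a minimal sketch of a data URL decoder with both changes applied. It is a hypothetical reimplementation for illustration (the name `dataUrlToBlobSketch` and the default MIME type are assumptions), not the helper in `utils/client.ts`. How strictly `atob` treats whitespace can vary by runtime, so the sketch strips it explicitly.

```typescript
// Hypothetical reimplementation for illustration; the real helper may differ.
function dataUrlToBlobSketch(value: string): Blob | undefined {
  // Not a data URL: let the caller treat the string as a regular URL.
  if (!value.startsWith('data:')) return undefined

  const commaIndex = value.indexOf(',')
  if (commaIndex === -1) {
    // Fail fast so a malformed data URL is never forwarded as an HTTP URL.
    throw new TypeError('Invalid data URL: missing comma separator')
  }

  const header = value.slice('data:'.length, commaIndex) // e.g. "audio/mpeg;base64"
  const payload = value.slice(commaIndex + 1)
  const isBase64 = /;base64$/i.test(header)
  // Assumed default when the header carries no MIME type.
  const mimeType = header.split(';')[0] || 'application/octet-stream'

  if (isBase64) {
    // Strip spaces and line breaks before decoding.
    const binary = atob(payload.replace(/\s+/g, ''))
    const bytes = new Uint8Array(binary.length)
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
    return new Blob([bytes], { type: mimeType })
  }
  // Non-base64 payloads are percent-encoded text.
  return new Blob([decodeURIComponent(payload)], { type: mimeType })
}
```

With this shape, `undefined` only ever means "not a data URL", so the caller's URL fallback cannot be reached by a malformed `data:` string.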
113-123: Nit: unnecessary `slice` in `readStreamToArrayBuffer`.
`merged` is a freshly allocated `Uint8Array(total)`, so `merged.byteOffset` is `0` and `merged.buffer.byteLength === total`. The `slice(byteOffset, byteOffset + byteLength)` copies the entire buffer again for no benefit. You can just return `merged.buffer`.
♻️ Proposed simplification
-  return merged.buffer.slice(
-    merged.byteOffset,
-    merged.byteOffset + merged.byteLength,
-  )
+  return merged.buffer
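The reasoning in this nit can be checked with a small stream-draining helper. This is a sketch under the assumption that the adapter collects `Uint8Array` chunks and merges them once; the function name is hypothetical and the PR's implementation may be structured differently.

```typescript
// Hypothetical sketch of a stream-draining helper; not the PR's exact code.
async function readStreamToArrayBufferSketch(
  stream: ReadableStream<Uint8Array>,
): Promise<ArrayBuffer> {
  const reader = stream.getReader()
  const chunks: Array<Uint8Array> = []
  let total = 0
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    chunks.push(value)
    total += value.byteLength
  }
  // `merged` owns a fresh buffer of exactly `total` bytes, so its byteOffset
  // is 0 and `merged.buffer` can be returned without slicing.
  const merged = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.byteLength
  }
  return merged.buffer
}
```

The slice would only matter if `merged` were a view into a larger, shared buffer, which is not the case when the array is allocated with `new Uint8Array(total)`.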
27-36: Window env lookup is dead code in a server-side SDK; consider removing it for clarity or swapping precedence.
The test environment is correctly configured as `'node'` (not `jsdom` or `happy-dom`), so the transitive dependency concern is mitigated. However, the code still checks `globalThis.window?.env` before `process.env`, which would be unsafe if window were shimmed, though that doesn't occur in practice here.
Since the SDK is server-side only (adapters externalized from the SSR bundle), the `window.env` lookup is dead code and worth removing. If you want to preserve it for future client-side realtime token fetches, swap the precedence to `process.env` first, since the SDK itself runs server-side.
packages/typescript/ai-elevenlabs/src/adapters/audio.ts (1)
83-86: Union type has a redundant branch.
`(A & B) | A | B` is structurally equivalent to `A | B` when both `A` and `B` consist solely of optional members (every object satisfies either), so the first branch doesn't add inference value but complicates the public type signature. Dropping it simplifies the exported type without behavior change.
♻️ Simplification
 export type ElevenLabsAudioProviderOptions =
-  | (ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions)
   | ElevenLabsMusicProviderOptions
   | ElevenLabsSoundEffectsProviderOptions
packages/typescript/ai-elevenlabs/src/adapters/transcription.ts (1)
277-310: Comment overstates the grouping behavior.
The docstring says "If no speaker is ever set, we still emit one segment per sentence-ish grouping", but the code only splits on a `speakerId` change: when no speaker IDs are present, all timed words collapse into a single segment (no sentence heuristic is applied). Consider updating the comment to match the actual behavior to avoid future confusion.
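The behavior the reviewer describes, splitting only when the speaker changes, can be sketched as follows. The word and segment shapes are hypothetical (only `speakerId` is taken from the review), so treat this as an illustration of the grouping rule rather than the adapter's code.

```typescript
// Hypothetical shapes; field names other than `speakerId` are assumptions.
interface TimedWordSketch {
  text: string
  start: number
  end: number
  speakerId?: string
}

interface SegmentSketch {
  start: number
  end: number
  text: string
  speaker: string | undefined
}

// Start a new segment only when the speaker changes between adjacent words.
function groupBySpeaker(words: Array<TimedWordSketch>): Array<SegmentSketch> {
  const segments: Array<SegmentSketch> = []
  let current: SegmentSketch | undefined
  for (const w of words) {
    if (current && current.speaker === w.speakerId) {
      // Same speaker (or still no speaker): extend the open segment.
      current.end = w.end
      current.text += ` ${w.text}`
    } else {
      current = { start: w.start, end: w.end, text: w.text, speaker: w.speakerId }
      segments.push(current)
    }
  }
  return segments
}
```

Because `undefined === undefined`, input without any speaker IDs never triggers a split and yields exactly one segment, which is the case the docstring misdescribes.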
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 141-170: runMusic currently hardcodes modelId:'music_v1' which
ignores the adapter's selected model; update runMusic to forward this.model into
the compose call (use modelId: this.model) so it respects ElevenLabsMusicModel
and matches runSoundEffects behavior, ensuring future/extended music model
strings are preserved when calling this.client.music.compose.
In `@packages/typescript/ai-elevenlabs/src/adapters/speech.ts`:
- Around line 205-222: The function inferOutputFormatFromResponseFormat
currently silently maps unsupported requested formats ('wav'/'aac'/'flac') to
the MP3 fallback ('mp3_44100_128'); update it to accept an optional logging
interface (e.g., add a parameter options or logger) and emit a warning when
falling back so callers see the divergence (reference
inferOutputFormatFromResponseFormat and the default branch); ensure the warning
includes the originally requested format and the actual returned ElevenLabs
format, and keep the existing fallback return value ('mp3_44100_128') for
compatibility.
In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts`:
- Around line 236-249: In normalizeAudioInput, the string branch currently
treats any non-data URL as a cloudStorageUrl; change the typeof audio ===
'string' handling to validate the string schema (e.g., accept /^https?:\/\// and
any allowed cloud schemes like s3://, gs://, az://) before returning { kind:
'url', value: audio }, and otherwise throw a clear, descriptive error (include
the incoming value) so malformed URLs/local paths/raw base64 are rejected
locally; keep using dataUrlToBlob for data: URLs and leave the
ArrayBuffer/Blob/File handling unchanged.
In `@packages/typescript/ai-elevenlabs/src/model-meta.ts`:
- Around line 53-59: The predicate for music models is too strict: update
isElevenLabsMusicModel to mirror isElevenLabsSoundEffectsModel by using a prefix
or pattern match (e.g., startsWith('music_')) so future music model IDs like
'music_v2' are recognized; adjust the related usage in adapters/audio.ts where
runMusic currently hardcodes modelId: 'music_v1' to pass through the actual
model string instead, ensuring consistent handling between
isElevenLabsMusicModel, isElevenLabsSoundEffectsModel, and runMusic.
---
Nitpick comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 83-86: The exported union type ElevenLabsAudioProviderOptions is
written as (ElevenLabsMusicProviderOptions &
ElevenLabsSoundEffectsProviderOptions) | ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions which is redundant; replace it with the
simplified union ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions by removing the intersecting branch to
clean up the public signature while preserving behavior (refer to the
ElevenLabsAudioProviderOptions, ElevenLabsMusicProviderOptions, and
ElevenLabsSoundEffectsProviderOptions type names to locate and update the
declaration).
In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts`:
- Around line 277-310: The comment above the segmentation loop misstates
behavior: the loop over timedWords only splits segments on speakerId changes
(using variables timedWords, current, segments, TranscriptionSegment and the
w.speakerId check), so when no speakerId is present all words collapse into a
single segment; update the docstring/comment to accurately state that
segmentation is driven solely by speakerId changes (or alternatively implement a
sentence/pausing heuristic if you want actual sentence-ish grouping) so future
readers aren’t misled.
In `@packages/typescript/ai-elevenlabs/src/utils/client.ts`:
- Around line 131-151: The dataUrlToBlob function should tolerate whitespace in
base64 payloads and fail fast on malformed data: URLs: when value
startsWith('data:') but commaIndex === -1, throw a TypeError('Invalid data URL')
instead of returning undefined; and when isBase64 is true, strip whitespace
(e.g., remove /\s+/g) from the payload before calling atob so atob does not
throw on line-wrapped or spaced base64. Update dataUrlToBlob to apply these two
changes while preserving existing mimeType handling and non-base64 decode path.
- Around line 113-123: The code in readStreamToArrayBuffer unnecessarily calls
slice on the newly allocated Uint8Array `merged` (whose byteOffset is 0 and
whose buffer length equals total), causing an extra copy; simply return
`merged.buffer` instead of `merged.buffer.slice(merged.byteOffset,
merged.byteOffset + merged.byteLength)` to avoid the redundant allocation/copy
and preserve the same ArrayBuffer result.
- Around line 27-36: The getEnvironment function currently checks
globalThis.window?.env before process.env; remove the dead client-side lookup
and simplify getEnvironment (and its EnvObject usage) to return process.env when
available, otherwise undefined—i.e., eliminate the globalThis/window branch in
getEnvironment so the server-side SDK always prefers process.env (or if you want
to preserve client-side behavior instead swap precedence, ensure process.env is
checked first and only fall back to window.env).
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (30)
.changeset/elevenlabs-rest-adapters.md
examples/ts-react-chat/src/lib/audio-providers.ts
examples/ts-react-chat/src/lib/server-audio-adapters.ts
examples/ts-react-chat/src/lib/server-fns.ts
examples/ts-react-chat/src/routes/api.generate.audio.ts
examples/ts-react-chat/src/routes/api.generate.speech.ts
examples/ts-react-chat/src/routes/api.transcribe.ts
examples/ts-react-chat/vite.config.ts
package.json
packages/typescript/ai-elevenlabs/package.json
packages/typescript/ai-elevenlabs/src/adapters/audio.ts
packages/typescript/ai-elevenlabs/src/adapters/speech.ts
packages/typescript/ai-elevenlabs/src/adapters/transcription.ts
packages/typescript/ai-elevenlabs/src/index.ts
packages/typescript/ai-elevenlabs/src/model-meta.ts
packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
packages/typescript/ai-elevenlabs/src/realtime/token.ts
packages/typescript/ai-elevenlabs/src/utils/client.ts
packages/typescript/ai-elevenlabs/src/utils/index.ts
packages/typescript/ai-elevenlabs/tests/audio-adapter.test.ts
packages/typescript/ai-elevenlabs/tests/realtime-adapter.test.ts
packages/typescript/ai-elevenlabs/tests/speech-adapter.test.ts
packages/typescript/ai-elevenlabs/tests/transcription-adapter.test.ts
testing/e2e/package.json
testing/e2e/src/lib/feature-support.ts
testing/e2e/src/lib/media-providers.ts
testing/e2e/src/lib/providers.ts
testing/e2e/src/lib/types.ts
testing/e2e/tests/test-matrix.ts
testing/e2e/vite.config.ts
function inferOutputFormatFromResponseFormat(
  format: TTSOptions['format'] | undefined,
): ElevenLabsOutputFormat | undefined {
  switch (format) {
    case 'mp3':
      return 'mp3_44100_128'
    case 'pcm':
      return 'pcm_44100'
    case 'opus':
      return 'opus_48000_128'
    case undefined:
      return undefined
    default:
      // `aac` / `flac` / `wav` are not native ElevenLabs formats;
      // fall back to mp3 rather than blowing up mid-request.
      return 'mp3_44100_128'
  }
}
Silent mp3 fallback for wav/aac/flac may surprise callers.
When the caller explicitly requests format: 'wav' (or aac/flac), the adapter silently returns MP3 audio (the returned format/contentType reflect MP3, which is correct, but the mismatch between the request and the actual returned format is not surfaced). Consider at least logging a warning through options.logger so the divergence is observable, or narrowing TTSOptions['format'] at the type level per provider.
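One way to surface the fallback is to thread an optional logger into the mapping function. The sketch below assumes a simplified format union and a logger with a single `warn` method; both are hypothetical stand-ins for `TTSOptions['format']` and whatever logging interface the adapter actually receives.

```typescript
// Hypothetical types: the real TTSOptions['format'] and logger may differ.
type RequestedFormatSketch = 'mp3' | 'pcm' | 'opus' | 'wav' | 'aac' | 'flac'

interface LoggerSketch {
  warn: (message: string) => void
}

function inferOutputFormatSketch(
  format: RequestedFormatSketch | undefined,
  logger?: LoggerSketch,
): string | undefined {
  switch (format) {
    case 'mp3':
      return 'mp3_44100_128'
    case 'pcm':
      return 'pcm_44100'
    case 'opus':
      return 'opus_48000_128'
    case undefined:
      return undefined
    default: {
      // Same fallback value as before, but the divergence is now observable.
      const fallback = 'mp3_44100_128'
      logger?.warn(
        `ElevenLabs TTS: requested format "${format}" is not supported; returning "${fallback}" instead.`,
      )
      return fallback
    }
  }
}
```

The return values are unchanged, so existing callers keep working; only callers that pass a logger see the extra warning.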
function normalizeAudioInput(
  audio: TranscriptionOptions['audio'],
): NormalizedAudio {
  if (audio instanceof ArrayBuffer) {
    return { kind: 'file', value: new Blob([audio]) }
  }
  if (typeof audio === 'string') {
    const blob = dataUrlToBlob(audio)
    if (blob) return { kind: 'file', value: blob }
    return { kind: 'url', value: audio }
  }
  // Blob or File both fit the SDK's `Uploadable` contract.
  return { kind: 'file', value: audio }
}
Unvalidated string fallback lands in cloudStorageUrl.
Any string that is not a data URL is forwarded to the SDK as cloudStorageUrl, including malformed URLs, local file paths, or raw base64 without the data: prefix. The failure then surfaces as a remote SDK/API error rather than a clear local one. Consider constraining the fallback to http(s):// prefixes (or known cloud schemes) and throwing a descriptive error otherwise.
♻️ Suggested tightening
 function normalizeAudioInput(
   audio: TranscriptionOptions['audio'],
 ): NormalizedAudio {
   if (audio instanceof ArrayBuffer) {
     return { kind: 'file', value: new Blob([audio]) }
   }
   if (typeof audio === 'string') {
     const blob = dataUrlToBlob(audio)
     if (blob) return { kind: 'file', value: blob }
-    return { kind: 'url', value: audio }
+    if (/^https?:\/\//i.test(audio)) return { kind: 'url', value: audio }
+    throw new Error(
+      'ElevenLabs transcription: string audio must be a data: URL or http(s):// URL.',
+    )
   }
   // Blob or File both fit the SDK's `Uploadable` contract.
   return { kind: 'file', value: audio }
 }
export function isElevenLabsMusicModel(model: string): boolean {
  return model === 'music_v1'
}

export function isElevenLabsSoundEffectsModel(model: string): boolean {
  return model.startsWith('eleven_text_to_sound_')
}
Predicate asymmetry limits forward‑compatibility for music models.
isElevenLabsMusicModel uses exact equality while isElevenLabsSoundEffectsModel uses a prefix match. Combined with ElevenLabsMusicModel being widened to 'music_v1' | (string & {}) and the file's own comment that "ElevenLabs ships new model IDs more often than we cut a release", any future music model (e.g. music_v2) will fall through to the "Unsupported ElevenLabs audio model" error in adapters/audio.ts even though the type accepts it. Aligning with the SFX predicate style keeps the contract consistent and avoids a code change every time a new music model ships.
♻️ Suggested tweak
-export function isElevenLabsMusicModel(model: string): boolean {
-  return model === 'music_v1'
-}
+export function isElevenLabsMusicModel(model: string): boolean {
+  return model === 'music_v1' || model.startsWith('music_v')
+}
Note: this also requires dropping the hardcoded modelId: 'music_v1' in runMusic (see comment on audio.ts).
…to SDK
Drop `(string & {})` widening from the ElevenLabs model id types so callers
are blocked from passing unknown models — the pinned lists are now the
source of truth, kept in sync via the automated SDK update pipeline.
Alias `ElevenLabsOutputFormat` to the SDK's `AllowedOutputFormats` so that
a plain `@elevenlabs/elevenlabs-js` version bump carries the format list
through with no manual regeneration. Removes drift (`mp3_24000_48`,
`pcm_32000` were already missing) and lets us drop the `as never` casts
at the SDK boundary.
Also promote `isElevenLabsMusicModel` / `isElevenLabsSoundEffectsModel`
to type predicates so the dispatch in `runMusic` / `runSoundEffects` is
visibly narrowed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
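The effect of promoting the predicates to type predicates can be illustrated with a small dispatch function. The model unions below are hypothetical, simplified stand-ins for the pinned lists in `model-meta.ts`, and the function names are invented for this sketch.

```typescript
// Hypothetical pinned lists; the real ones live in model-meta.ts.
type MusicModelSketch = 'music_v1'
type SfxModelSketch = 'eleven_text_to_sound_v2'
type AudioModelSketch = MusicModelSketch | SfxModelSketch

// Returning `model is MusicModelSketch` (instead of `boolean`) lets the
// compiler narrow the argument at the call site.
function isMusicModelSketch(model: AudioModelSketch): model is MusicModelSketch {
  return model === 'music_v1'
}

function describeDispatch(model: AudioModelSketch): string {
  if (isMusicModelSketch(model)) {
    // Narrowed to the music union here.
    const narrowed: MusicModelSketch = model
    return `music:${narrowed}`
  }
  // Narrowed to the remaining (sound effects) union here.
  const narrowed: SfxModelSketch = model
  return `sfx:${narrowed}`
}
```

With a plain `boolean` return type both assignments to `narrowed` would fail to compile, which is why the narrowing becomes visible only after the change.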
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/typescript/ai-elevenlabs/src/adapters/speech.ts (1)
171-173: Unused `generateId` override.
`generateSpeech` calls the imported `generateId(this.name)` utility directly (line 156), so this protected override is never invoked. `BaseTTSAdapter` already ships a `generateId()` implementation; this override can be dropped to reduce surface area.
♻️ Suggested cleanup
-  protected override generateId(): string {
-    return generateId(this.name)
-  }
packages/typescript/ai-elevenlabs/src/adapters/audio.ts (2)
219-221: `generateId` override is effectively dead code within this class.
The override replaces the base-class helper, but nothing inside `ElevenLabsAudioAdapter` calls `this.generateId()`: `finalize` (line 209) calls the imported utility `generateId(this.name)` directly, bypassing the method. Either drop the override or route `finalize` through `this.generateId()` so the override actually takes effect (and any future subclass can customize it). Not a bug, just cleanup.
85-88: Redundant intersection branch in the provider-options union.
`(ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions)` is already a subtype of both `ElevenLabsMusicProviderOptions` and `ElevenLabsSoundEffectsProviderOptions`, so the first member of the union adds nothing; the whole expression is structurally equivalent to `ElevenLabsMusicProviderOptions | ElevenLabsSoundEffectsProviderOptions`. The extra branch also subtly encourages callers to pass music+SFX fields together, which the adapter doesn't actually honor (each code path only reads its own subset).
♻️ Proposed simplification
-export type ElevenLabsAudioProviderOptions =
-  | (ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions)
-  | ElevenLabsMusicProviderOptions
-  | ElevenLabsSoundEffectsProviderOptions
+export type ElevenLabsAudioProviderOptions =
+  | ElevenLabsMusicProviderOptions
+  | ElevenLabsSoundEffectsProviderOptions
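The equivalence claimed for the two union forms can be checked at compile time. The option shapes below are hypothetical, simplified stand-ins (their field names are assumptions); only the structure of the union mirrors the declaration under review.

```typescript
// Hypothetical, simplified option shapes with all members optional.
interface MusicOptionsSketch {
  seed?: number
  forceInstrumental?: boolean
}

interface SfxOptionsSketch {
  loop?: boolean
  promptInfluence?: number
}

type WithIntersection =
  | (MusicOptionsSketch & SfxOptionsSketch)
  | MusicOptionsSketch
  | SfxOptionsSketch

type Simplified = MusicOptionsSketch | SfxOptionsSketch

// Resolves to `true` only when each type is assignable to the other.
type MutuallyAssignable<A, B> = [A] extends [B]
  ? [B] extends [A]
    ? true
    : false
  : false

// Fails to compile if the two unions ever stop being interchangeable.
const unionsAreEquivalent: MutuallyAssignable<WithIntersection, Simplified> = true
```

Since the intersection is assignable to each of its parts, removing it from the union does not change which values the exported type accepts.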
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 151-173: In runMusic, when music.compositionPlan is present the
options.duration is ignored by client.music.compose but still passed into
finalize causing result.audio.duration to be incorrect; update runMusic to
detect music.compositionPlan and either (a) compute the true duration by summing
each section.durationMs from music.compositionPlan (convert ms to seconds) and
pass that computed value to finalize, or (b) omit passing duration to finalize
when a compositionPlan exists so result.audio.duration is not set; modify the
logic around the call sites of client.music.compose and finalize (referencing
runMusic, client.music.compose, finalize, options.duration, and
music.compositionPlan) to implement one of these two behaviors so
result.audio.duration reflects the actual composition length or is left out.
---
Nitpick comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 219-221: The class has an unused override of generateId() in
ElevenLabsAudioAdapter because finalize currently calls the imported utility
generateId(this.name) directly; update finalize to call this.generateId() so the
override is honored (or alternatively remove the override if you prefer not to
support customization). Locate the finalize method and replace the direct call
to the imported generateId(...) with a call to this.generateId(), ensuring the
class-level override will be invoked for subclasses.
- Around line 85-88: The union type ElevenLabsAudioProviderOptions includes a
redundant intersection branch; replace the current declaration that includes
(ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions) with
the simpler union ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions so the type is not misleading about
combined music+SFX fields (refer to ElevenLabsAudioProviderOptions,
ElevenLabsMusicProviderOptions, and ElevenLabsSoundEffectsProviderOptions).
In `@packages/typescript/ai-elevenlabs/src/adapters/speech.ts`:
- Around line 171-173: The protected override generateId(): string in the Speech
adapter is unused because generateSpeech() calls the imported utility
generateId(this.name) directly, so remove the redundant override to rely on
BaseTTSAdapter's generateId() implementation; delete the protected override
generateId() method from the class and ensure there are no other references to
that override so generateSpeech and other methods use the base-class behavior.
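To illustrate why both nitpicks call the override unused, here is a minimal sketch. `BaseAdapter` and the module-level `generateId` helper are simplified stand-ins, not the actual base classes or utility from the package; only the call pattern (a direct call to the imported function versus `this.generateId()`) mirrors what the review describes.

```typescript
// Hypothetical stand-in for the imported utility; the real helper's ID format is not shown in this PR.
let counter = 0
function generateId(prefix: string): string {
  counter += 1
  return `${prefix}-${counter}`
}

// Simplified stand-in for a base adapter exposing an overridable generateId().
abstract class BaseAdapter {
  abstract readonly name: string
  protected generateId(): string {
    return generateId(this.name)
  }
}

class DirectCallAdapter extends BaseAdapter {
  readonly name = 'elevenlabs'
  finalize(): string {
    // Resolves to the module-level function, so subclass overrides are bypassed.
    return generateId(this.name)
  }
}

class MethodCallAdapter extends BaseAdapter {
  readonly name = 'elevenlabs'
  finalize(): string {
    // Goes through the prototype chain, so subclass overrides are honored.
    return this.generateId()
  }
}

class CustomDirect extends DirectCallAdapter {
  protected override generateId(): string {
    return 'custom-id'
  }
}

class CustomMethod extends MethodCallAdapter {
  protected override generateId(): string {
    return 'custom-id'
  }
}

const directResult = new CustomDirect().finalize() // override is ignored
const methodResult = new CustomMethod().finalize() // override is used
```

Either remedy from the nitpicks removes the mismatch: call `this.generateId()` if customization should be supported, or delete the override if it should not.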
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d1952ccc-f9aa-4d5b-b24e-73b514b5a8e6
📒 Files selected for processing (3)
- packages/typescript/ai-elevenlabs/src/adapters/audio.ts
- packages/typescript/ai-elevenlabs/src/adapters/speech.ts
- packages/typescript/ai-elevenlabs/src/model-meta.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/typescript/ai-elevenlabs/src/model-meta.ts
    const stream = await this.client.music.compose({
      modelId,
      ...(options.prompt && !music.compositionPlan
        ? { prompt: options.prompt }
        : {}),
      ...(music.compositionPlan
        ? { compositionPlan: toMusicPrompt(music.compositionPlan) }
        : {}),
      ...(options.duration != null && !music.compositionPlan
        ? { musicLengthMs: Math.round(options.duration * 1000) }
        : {}),
      ...(outputFormat ? { outputFormat } : {}),
      ...(music.seed != null ? { seed: music.seed } : {}),
      ...(music.forceInstrumental != null
        ? { forceInstrumental: music.forceInstrumental }
        : {}),
      ...(music.respectSectionsDurations != null
        ? { respectSectionsDurations: music.respectSectionsDurations }
        : {}),
    })

    return this.finalize(stream, outputFormat, options.duration)
  }
There was a problem hiding this comment.
audio.duration can misrepresent the generated track when compositionPlan is used.
In runMusic, options.duration is intentionally not forwarded to client.music.compose when a compositionPlan is supplied (lines 159-161) — the real length is derived from the sum of section.durationMs values. However, line 172 still unconditionally passes options.duration into finalize, which attaches it to result.audio.duration. If a caller supplies both a compositionPlan and (ignored) duration: 15, the response will claim a 15 s track while the actual audio may be much longer/shorter. Consider suppressing duration in the composition-plan path (or deriving it from the plan) so the field either reflects reality or is omitted.
🛠️ Suggested change
- return this.finalize(stream, outputFormat, options.duration)
+ const resolvedDuration = music.compositionPlan
+   ? undefined
+   : options.duration
+ return this.finalize(stream, outputFormat, resolvedDuration)
📝 Committable suggestion
    const stream = await this.client.music.compose({
      modelId,
      ...(options.prompt && !music.compositionPlan
        ? { prompt: options.prompt }
        : {}),
      ...(music.compositionPlan
        ? { compositionPlan: toMusicPrompt(music.compositionPlan) }
        : {}),
      ...(options.duration != null && !music.compositionPlan
        ? { musicLengthMs: Math.round(options.duration * 1000) }
        : {}),
      ...(outputFormat ? { outputFormat } : {}),
      ...(music.seed != null ? { seed: music.seed } : {}),
      ...(music.forceInstrumental != null
        ? { forceInstrumental: music.forceInstrumental }
        : {}),
      ...(music.respectSectionsDurations != null
        ? { respectSectionsDurations: music.respectSectionsDurations }
        : {}),
    })
    const resolvedDuration = music.compositionPlan
      ? undefined
      : options.duration
    return this.finalize(stream, outputFormat, resolvedDuration)
  }
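The review comment also mentions deriving the duration from the plan instead of dropping it. A minimal sketch of that alternative follows. The `CompositionPlan` shape with a `sections` array is an assumption (only a per-section `durationMs` is named in the review), and `resolveDurationSeconds` is a hypothetical helper, not part of the adapter.

```typescript
// Assumed shape: the review only names `durationMs` on each section.
interface PlanSection {
  durationMs: number
}

interface CompositionPlan {
  sections: PlanSection[]
}

// Hypothetical helper: returns the duration (in seconds) to report on the result.
function resolveDurationSeconds(
  requestedSeconds: number | undefined,
  plan: CompositionPlan | undefined,
): number | undefined {
  // No plan: the requested duration was forwarded to the API, so report it.
  if (!plan) return requestedSeconds
  // A plan with no sections gives nothing to sum; leave the field unset.
  if (plan.sections.length === 0) return undefined
  // Sum section lengths and convert milliseconds to seconds.
  const totalMs = plan.sections.reduce(
    (sum, section) => sum + section.durationMs,
    0,
  )
  return totalMs / 1000
}

const withPlan = resolveDurationSeconds(15, {
  sections: [{ durationMs: 10_000 }, { durationMs: 20_000 }],
})
const withoutPlan = resolveDurationSeconds(15, undefined)
```

If the service does not strictly honor section durations, the summed value is only an estimate, which is one reason the simpler "omit the field" option in the suggested change may be preferable.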
…realtime example

Mirror the `ELEVENLABS_API_KEY` pattern for agent ids: add `getElevenLabsAgentIdFromEnv()` and make `agentId` optional on `ElevenLabsRealtimeTokenOptions`. `elevenlabsRealtimeToken()` now resolves `options.agentId ?? ELEVENLABS_AGENT_ID` at call time.

Simplify the ts-react-chat example: drop the manual `process.env` dance and the Agent ID text input from the realtime page, since the adapter now handles the env fallback. Replace the input with a Language selector that threads `overrides.language` through to the session, so users can switch off the agent's dashboard default (a common need when the agent is configured for one language but a caller wants another).

Also broaden `.env.example` in ts-react-chat to cover every provider the example actually reads (Anthropic, Gemini, xAI, Groq, OpenRouter, fal); previously only OpenAI and ElevenLabs were listed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
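The fallback described in this commit message can be sketched roughly as follows. This is an illustration only: the real `getElevenLabsAgentIdFromEnv()` presumably reads the process environment directly, whereas this version takes the environment as a parameter to stay self-contained, and the thrown error and the `resolveAgentId` helper are assumptions rather than the adapter's actual behavior.

```typescript
type Env = Record<string, string | undefined>

// Assumed signature: the env map is injected here for illustration.
function getElevenLabsAgentIdFromEnv(env: Env): string {
  const agentId = env['ELEVENLABS_AGENT_ID']
  if (!agentId) {
    // Assumption: a missing variable is treated as an error.
    throw new Error('ELEVENLABS_AGENT_ID is not set and no agentId was provided')
  }
  return agentId
}

// Only the optional `agentId` field is taken from the commit message.
interface TokenOptions {
  agentId?: string
}

// Hypothetical helper: an explicit option wins, otherwise fall back to the env at call time.
function resolveAgentId(options: TokenOptions, env: Env): string {
  return options.agentId ?? getElevenLabsAgentIdFromEnv(env)
}

const explicitId = resolveAgentId({ agentId: 'agent_explicit' }, {})
const envId = resolveAgentId({}, { ELEVENLABS_AGENT_ID: 'agent_from_env' })
```

Resolving inside the call rather than at module load means the environment variable is only required when no explicit `agentId` is passed.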
Closes #485.
Summary
- Extends `@tanstack/ai-elevenlabs` (previously realtime-only) with three tree-shakeable REST adapters built on the official `@elevenlabs/elevenlabs-js` v2.44 SDK:
  - `elevenlabsSpeech()`: TTS on `eleven_v3`, `eleven_multilingual_v2`, flash/turbo variants. Voice resolves via `options.voice` or `modelOptions.voiceId`.
  - `elevenlabsAudio()`: music (`music_v1`, with structured composition plans) and SFX (`eleven_text_to_sound_v2`/`v1`) in a single adapter that dispatches by model id.
  - `elevenlabsTranscription()`: Scribe v1/v2 speech-to-text with diarization, keyterm biasing, PII redaction, and word-level timestamps → `TranscriptionSegment` / `TranscriptionWord`.
- Migrates the realtime adapter off the deprecated `@11labs/client` onto the renamed `@elevenlabs/client` (v1.3.1). Token adapter rewritten to use `client.conversationalAi.conversations.getSignedUrl` via the server SDK.
- Wires ElevenLabs into the `ts-react-chat` example catalogs (`SPEECH_PROVIDERS`, `AUDIO_PROVIDERS` music + SFX, `TRANSCRIPTION_PROVIDERS`), the server adapter factories, and the matching zod enum schemas.
- Adds `elevenlabs` to the e2e `Provider` union + tts/transcription support matrix; `createTTSAdapter` / `createTranscriptionAdapter` factories point the SDK at aimock via `baseUrl`.

Scope notes

Per the comment on #485 the adapter set was simplified to `generateAudio` + `generateSpeech` + `generateTranscription`: music and SFX collapse into one `elevenlabsAudio(model)` that routes by model rather than separate `elevenlabsMusic()` / `elevenlabsSoundEffects()`. Transcription is included (Scribe).

If aimock doesn't yet cover `api.elevenlabs.io` routes, the e2e tts/transcription tests for `elevenlabs` will need a companion stub PR there; the matrix wiring is already in place, so those tests will light up as soon as the mocks exist.

Test plan

- `pnpm --filter @tanstack/ai-elevenlabs test:lib`: 24 unit tests pass (speech + audio music/SFX branches + transcription with data-URL / ArrayBuffer / diarization + realtime mock updated to `@elevenlabs/client`)
- `pnpm --filter @tanstack/ai-elevenlabs test:types`
- `pnpm --filter @tanstack/ai-elevenlabs test:eslint` (only a pre-existing realtime warning)
- `pnpm --filter @tanstack/ai-elevenlabs test:build` (publint --strict)
- `pnpm --filter @tanstack/ai-elevenlabs build`
- `pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "elevenlabs -- tts"` (depends on aimock coverage)
- `pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "elevenlabs -- transcription"` (depends on aimock coverage)
- `ELEVENLABS_API_KEY`: `generateSpeech` (`eleven_v3`), `generateAudio` (`music_v1` 15s + `eleven_text_to_sound_v2` 5s), `generateTranscription` (short wav)
- `pnpm --filter ts-react-chat dev`: ElevenLabs tabs on `/generations/speech`, `/generations/audio`, `/generations/transcription`

🤖 Generated with Claude Code
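As a rough illustration of the "one adapter that routes by model" design from the PR description, the sketch below maps a model id to a music or SFX code path. The lookup table and `routeAudioModel` are hypothetical; the model ids come from the summary, with `eleven_text_to_sound_v1` inferred from the "v2/v1" shorthand. The actual adapter may dispatch differently.

```typescript
type AudioKind = 'music' | 'sfx'

// Assumed lookup table; ids are those named in the PR summary.
const MODEL_KINDS = new Map<string, AudioKind>([
  ['music_v1', 'music'],
  ['eleven_text_to_sound_v2', 'sfx'],
  ['eleven_text_to_sound_v1', 'sfx'],
])

// Hypothetical router: picks the code path for a given model id.
function routeAudioModel(model: string): AudioKind {
  const kind = MODEL_KINDS.get(model)
  if (!kind) {
    throw new Error(`Unsupported ElevenLabs audio model: ${model}`)
  }
  return kind
}

const musicKind = routeAudioModel('music_v1')
const sfxKind = routeAudioModel('eleven_text_to_sound_v2')
```

Routing on the model id keeps a single `generateAudio` entry point while still letting each branch accept its own provider options.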
Summary by CodeRabbit
New Features
Chores
Tests