
feat(ai-elevenlabs): add speech/audio/transcription adapters via official SDK#504

Open
tombeckenham wants to merge 7 commits into TanStack:main from
tombeckenham:485-feat-ai-elevenlabs-tts-music-sfx-transcription-adapters-via-official-sdk

Conversation

@tombeckenham
Contributor

@tombeckenham tombeckenham commented Apr 24, 2026

Closes #485.

Summary

  • Extends @tanstack/ai-elevenlabs (previously realtime-only) with three tree-shakeable REST adapters built on the official @elevenlabs/elevenlabs-js v2.44 SDK:
    • elevenlabsSpeech() — TTS on eleven_v3, eleven_multilingual_v2, flash/turbo variants. Voice resolves via options.voice or modelOptions.voiceId.
    • elevenlabsAudio() — music (music_v1, with structured composition plans) and SFX (eleven_text_to_sound_v2/v1) in a single adapter that dispatches by model id.
    • elevenlabsTranscription() — Scribe v1/v2 speech-to-text with diarization, keyterm biasing, PII redaction, and word-level timestamps → TranscriptionSegment/TranscriptionWord.
  • Migrates the existing realtime adapter off the deprecated @11labs/client onto the renamed @elevenlabs/client (v1.3.1). Token adapter rewritten to use client.conversationalAi.conversations.getSignedUrl via the server SDK.
  • Wires ElevenLabs into the ts-react-chat example catalogs (SPEECH_PROVIDERS, AUDIO_PROVIDERS music + SFX, TRANSCRIPTION_PROVIDERS), the server adapter factories, and the matching zod enum schemas.
  • Adds elevenlabs to the e2e Provider union + tts/transcription support matrix; createTTSAdapter / createTranscriptionAdapter factories point the SDK at aimock via baseUrl.

Scope notes

Per the comment on #485, the adapter set was simplified to generateAudio + generateSpeech + generateTranscription — music and SFX collapse into a single elevenlabsAudio(model) that routes by model id rather than separate elevenlabsMusic() / elevenlabsSoundEffects(). Transcription (Scribe) is included.
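The single-adapter routing described above can be sketched as a plain model-id dispatch. This is an illustrative stand-in, not the package's actual internals — the helper name and the prefix-matching strategy are assumptions:

```typescript
type ElevenLabsAudioKind = 'music' | 'sfx'

// Hypothetical dispatcher: prefix matching keeps future ids such as
// 'music_v2' or 'eleven_text_to_sound_v3' routing to the right branch.
function resolveAudioKind(model: string): ElevenLabsAudioKind {
  if (model.startsWith('music_')) return 'music'
  if (model.startsWith('eleven_text_to_sound_')) return 'sfx'
  throw new Error(`Unsupported ElevenLabs audio model: ${model}`)
}
```

With this shape, a model id like music_v1 would take the composition-plan path while eleven_text_to_sound_v2 takes the SFX path.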

If aimock doesn't yet cover api.elevenlabs.io routes, the e2e tts/transcription tests for elevenlabs will need a companion stub PR there — the matrix wiring is already in place so those tests will light up as soon as the mocks exist.

Test plan

  • pnpm --filter @tanstack/ai-elevenlabs test:lib — 24 unit tests pass (speech + audio music/SFX branches + transcription with data-URL / ArrayBuffer / diarization + realtime mock updated to @elevenlabs/client)
  • pnpm --filter @tanstack/ai-elevenlabs test:types
  • pnpm --filter @tanstack/ai-elevenlabs test:eslint (only a pre-existing realtime warning)
  • pnpm --filter @tanstack/ai-elevenlabs test:build (publint --strict)
  • pnpm --filter @tanstack/ai-elevenlabs build
  • pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "elevenlabs -- tts" (depends on aimock coverage)
  • pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "elevenlabs -- transcription" (depends on aimock coverage)
  • Live smoke with ELEVENLABS_API_KEY: generateSpeech (eleven_v3), generateAudio (music_v1 15s + eleven_text_to_sound_v2 5s), generateTranscription (short wav)
  • pnpm --filter ts-react-chat dev — ElevenLabs tabs on /generations/speech, /generations/audio, /generations/transcription

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added ElevenLabs TTS, music & SFX audio, and transcription adapters (model lists, diarization, keyterm biasing, PII redaction, word-level timestamps).
  • Chores

    • Switched to the renamed ElevenLabs SDK and adjusted SSR bundling behavior.
    • Updated example app env templates and added a dev script for the chat example.
    • Realtime config now accepts an optional language override (UI updated from Agent ID).
  • Tests

    • Added unit and integration tests covering ElevenLabs adapters.

…cial SDK (TanStack#485)

Extends @tanstack/ai-elevenlabs with three tree-shakeable REST adapters built
on the official @elevenlabs/elevenlabs-js SDK — elevenlabsSpeech (TTS),
elevenlabsAudio (music + SFX dispatched by model), and elevenlabsTranscription
(Scribe v1/v2). Migrates the realtime adapter off the deprecated @11labs/client
onto the renamed @elevenlabs/client. Wires ElevenLabs into the ts-react-chat
example provider catalogs and the e2e tts/transcription support matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1445edbb-9dbd-403d-a999-9f7390fe45b8

📥 Commits

Reviewing files that changed from the base of the PR and between 3b588ed and 28ccc0d.

📒 Files selected for processing (6)
  • examples/ts-react-chat/.env.example
  • examples/ts-react-chat/src/lib/use-realtime.ts
  • examples/ts-react-chat/src/routes/realtime.tsx
  • packages/typescript/ai-elevenlabs/src/realtime/token.ts
  • packages/typescript/ai-elevenlabs/src/realtime/types.ts
  • packages/typescript/ai-elevenlabs/src/utils/client.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/typescript/ai-elevenlabs/src/realtime/token.ts
  • packages/typescript/ai-elevenlabs/src/utils/client.ts

📝 Walkthrough

Walkthrough

Adds three tree-shakeable REST adapters (TTS, music/SFX audio, transcription) implemented against the official ElevenLabs SDK, migrates realtime code from @11labs/client to @elevenlabs/client, and integrates ElevenLabs across examples, tests, and E2E tooling.

Changes

  • Release & Package Metadata (.changeset/elevenlabs-rest-adapters.md, packages/typescript/ai-elevenlabs/package.json) — New changeset; package metadata updated and dependency replaced from @11labs/client to @elevenlabs/client / @elevenlabs/elevenlabs-js.
  • Core Adapters (packages/typescript/ai-elevenlabs/src/adapters/speech.ts, packages/typescript/ai-elevenlabs/src/adapters/audio.ts, packages/typescript/ai-elevenlabs/src/adapters/transcription.ts) — Adds REST adapters: TTS (elevenlabsSpeech), audio (music & SFX, elevenlabsAudio), and transcription (elevenlabsTranscription) with factories, option types, stream handling, and SDK request/response mapping.
  • Model Metadata & Utilities (packages/typescript/ai-elevenlabs/src/model-meta.ts, packages/typescript/ai-elevenlabs/src/utils/client.ts, packages/typescript/ai-elevenlabs/src/utils/index.ts) — Introduces model lists, type guards, client creation helpers, env accessors, ID/stream/base64 helpers, dataUrl→Blob conversion, and an output format parser.
  • Public Exports (packages/typescript/ai-elevenlabs/src/index.ts) — Exports the new adapters, factories, provider option types, model metadata, and utility helpers from the package entry.
  • Realtime Migration (packages/typescript/ai-elevenlabs/src/realtime/adapter.ts, packages/typescript/ai-elevenlabs/src/realtime/token.ts, packages/typescript/ai-elevenlabs/src/realtime/types.ts) — Switched realtime imports to @elevenlabs/client; token flow refactored to use the SDK client; token options made optional and the signature defaulted.
  • Example App Integration (examples/ts-react-chat/src/lib/audio-providers.ts, .../server-audio-adapters.ts, .../server-fns.ts, examples/ts-react-chat/src/routes/api.generate.audio.ts, examples/ts-react-chat/src/routes/api.generate.speech.ts, examples/ts-react-chat/src/routes/api.transcribe.ts, examples/ts-react-chat/vite.config.ts, examples/ts-react-chat/src/lib/use-realtime.ts, examples/ts-react-chat/src/routes/realtime.tsx, examples/ts-react-chat/.env.example) — Adds ElevenLabs provider IDs/configs, wires adapter branches for speech/audio/transcription, extends Zod validation enums, changes Vite SSR externals for the SDK, updates the realtime API to use language instead of agentId, and updates env template variables.
  • Server Adapters / API Routes (examples/ts-react-chat/src/lib/server-audio-adapters.ts, examples/ts-react-chat/src/lib/server-fns.ts, examples/ts-react-chat/src/routes/api.*) — Wired ElevenLabs adapter factories into server-side adapter selection and broadened request-body provider enums.
  • Vite / SSR Configs (examples/ts-react-chat/vite.config.ts, testing/e2e/vite.config.ts) — Treats @elevenlabs/elevenlabs-js as an SSR external to avoid bundling the SDK.
  • Tests (packages/typescript/ai-elevenlabs/tests/*.test.ts) — Added Vitest suites for the speech, audio, and transcription adapters; updated realtime tests to mock @elevenlabs/client.
  • E2E & Testing Infra (testing/e2e/src/lib/media-providers.ts, testing/e2e/src/lib/providers.ts, testing/e2e/src/lib/types.ts, testing/e2e/tests/test-matrix.ts, testing/e2e/package.json) — Registered the elevenlabs provider in types and factories, added adapter instantiation for TTS/transcription (aimock-based), added a workspace dependency, and documented exclusion from the live feature matrix.
  • Example Dev Convenience (package.json) — Added root script dev:chat to run the ts-react-chat example.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(210,235,255,0.5)
  participant Client
  end
  rect rgba(220,255,220,0.5)
  participant Server
  participant Adapter
  end
  rect rgba(255,245,210,0.5)
  participant ElevenLabsSDK
  participant Storage
  end

  Client->>Server: POST /api.generate.speech (text, provider=elevenlabs)
  Server->>Adapter: buildSpeechAdapter(provider, modelOptions)
  Adapter->>ElevenLabsSDK: textToSpeech.convert({ modelId, voiceId, outputFormat, settings })
  ElevenLabsSDK-->>Adapter: audio ReadableStream
  Adapter->>Storage: readStreamToArrayBuffer -> arrayBufferToBase64
  Adapter-->>Server: TTSResult { id, audio: { b64Json, contentType, format } }
  Server-->>Client: return or stream audio payload

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • AlemTuzlak

Poem

🐰 I hopped through code and stitched a song,
Voices, beats, and transcripts—bright and strong.
SDK burrows gave me keys to play,
New adapters dance and lead the way. 🎧✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 35.90%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Title check — ✅ Passed: the title accurately summarizes the main change — adding speech, audio, and transcription adapters via the official SDK.
  • Description check — ✅ Passed: the description provides comprehensive detail, including the specific adapters added, the migration off the deprecated SDK, and the example/testing integration; it follows the template with clear sections and checklist items.
  • Linked Issues check — ✅ Passed: the PR implements all core requirements from issue #485 — three REST adapters (speech, audio with unified music/SFX, transcription), migration from @11labs/client to @elevenlabs/client, and integration with examples and e2e testing.
  • Out of Scope Changes check — ✅ Passed: all changes are within scope — adapter implementations, SDK migration, realtime token refactor, example/e2e integration, and config updates to support ElevenLabs providers. No unrelated changes detected.



@nx-cloud

nx-cloud Bot commented Apr 24, 2026

View your CI Pipeline Execution ↗ for commit 28ccc0d

  • nx run-many --targets=build --exclude=examples/** — ✅ Succeeded in 1m 43s (View ↗)

☁️ Nx Cloud last updated this comment at 2026-04-24 07:24:43 UTC

@pkg-pr-new

pkg-pr-new Bot commented Apr 24, 2026

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@504

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@504

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@504

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@504

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@504

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@504

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@504

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@504

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@504

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@504

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@504

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@504

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@504

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@504

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@504

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@504

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@504

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@504

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@504

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@504

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@504

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@504

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@504

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@504

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@504

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@504

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@504

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@504

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@504

commit: 28ccc0d

The SDK defines a top-level `function getHeader(…)` in
`core/fetcher/getHeader.js`, which collides with h3's auto-imported
`getHeader` once vite/nitro inline both into the same server chunk —
esbuild then rejects the duplicate symbol and the e2e build fails with
`The symbol "getHeader" has already been declared`.

Marking the SDK as a vite SSR + nitro external keeps it resolved at
runtime on the server side, which is what we want anyway for a
server-only REST client.

Also adds a local `pnpm dev:chat` convenience script to run the
ts-react-chat example without remembering the filter flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
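The externalization this commit describes boils down to listing the SDK in both the vite SSR externals and the nitro externals. A minimal sketch of the config shape — the actual vite.config.ts files may structure this differently:

```typescript
const elevenLabsSdk = '@elevenlabs/elevenlabs-js'

// Keeping the SDK external means its core/fetcher/getHeader.js is never
// inlined into the same server chunk as h3's auto-imported getHeader,
// so esbuild never sees a duplicate symbol.
const config = {
  ssr: {
    external: [elevenLabsSdk],
  },
  nitro: {
    externals: {
      external: [elevenLabsSdk],
    },
  },
}
```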
@tombeckenham tombeckenham force-pushed the 485-feat-ai-elevenlabs-tts-music-sfx-transcription-adapters-via-official-sdk branch from 7d75161 to de7c302 on April 24, 2026 at 05:12
…hat and descoping elevenlabs e2e

Two CI failures on PR TanStack#504:

1. `ts-react-chat:build` hit the same `getHeader` SSR collision as the
   e2e app — now that the example wires ElevenLabs into the server-side
   audio-adapter factories, its SSR bundle faces the same SDK/h3 symbol
   clash. Same fix (`ssr.external` + nitro `externals.external`) applied
   to `examples/ts-react-chat/vite.config.ts`.

2. `elevenlabs -- tts` and `elevenlabs -- transcription` e2e tests
   failed because aimock doesn't yet stub `api.elevenlabs.io` routes —
   the real SDK HTTP calls had no mock target and errored out. Removed
   `elevenlabs` from the `tts` + `transcription` support matrix sets in
   `testing/e2e/{tests/test-matrix.ts,src/lib/feature-support.ts}` for
   now; the factories stay in `media-providers.ts` so they light up
   automatically once aimock ships coverage. Tracked as part of the
   nitro/aimock follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tombeckenham tombeckenham marked this pull request as ready for review April 24, 2026 05:40
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
packages/typescript/ai-elevenlabs/src/utils/client.ts (3)

131-151: dataUrlToBlob does not tolerate whitespace in base64 payloads and silently misroutes invalid data URLs.

Two small edge-case notes:

  1. atob throws on any whitespace / line-wrapping in the base64 payload (common for long data URLs hand-pasted by users or produced by some encoders that insert \n). Stripping whitespace before atob avoids a hard throw from inside the adapter.
  2. When value looks like a data URL but commaIndex === -1, the function returns undefined, which the caller then treats as an https URL. A malformed data: URL ending up as an HTTP request is a confusing failure mode — consider throwing a TypeError('Invalid data URL') instead so the error is localized.
🛠️ Suggested tightening
-  if (!value.startsWith('data:')) return undefined
-  const commaIndex = value.indexOf(',')
-  if (commaIndex === -1) return undefined
+  if (!value.startsWith('data:')) return undefined
+  const commaIndex = value.indexOf(',')
+  if (commaIndex === -1) {
+    throw new TypeError('Invalid data URL: missing comma separator')
+  }
@@
-  if (isBase64) {
-    const binary = atob(payload)
+  if (isBase64) {
+    const binary = atob(payload.replace(/\s+/g, ''))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/utils/client.ts` around lines 131 -
151, The dataUrlToBlob function should tolerate whitespace in base64 payloads
and fail fast on malformed data: URLs: when value startsWith('data:') but
commaIndex === -1, throw a TypeError('Invalid data URL') instead of returning
undefined; and when isBase64 is true, strip whitespace (e.g., remove /\s+/g)
from the payload before calling atob so atob does not throw on line-wrapped or
spaced base64. Update dataUrlToBlob to apply these two changes while preserving
existing mimeType handling and non-base64 decode path.
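Putting both suggestions together, a self-contained sketch of the hardened helper — the real version in utils/client.ts may differ around mime-type handling:

```typescript
function dataUrlToBlob(value: string): Blob | undefined {
  if (!value.startsWith('data:')) return undefined
  const commaIndex = value.indexOf(',')
  if (commaIndex === -1) {
    // Fail fast instead of letting a malformed data: URL fall through
    // to an HTTP request elsewhere.
    throw new TypeError('Invalid data URL: missing comma separator')
  }
  const meta = value.slice('data:'.length, commaIndex) // e.g. 'audio/wav;base64'
  const payload = value.slice(commaIndex + 1)
  const isBase64 = meta.endsWith(';base64')
  const mimeType =
    (isBase64 ? meta.slice(0, -';base64'.length) : meta) ||
    'application/octet-stream'
  if (isBase64) {
    // atob throws on whitespace, and hand-pasted or line-wrapped base64
    // often contains it — strip before decoding.
    const binary = atob(payload.replace(/\s+/g, ''))
    const bytes = new Uint8Array(binary.length)
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
    return new Blob([bytes], { type: mimeType })
  }
  return new Blob([decodeURIComponent(payload)], { type: mimeType })
}
```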

113-123: Nit: unnecessary slice in readStreamToArrayBuffer.

merged is a freshly allocated Uint8Array(total), so merged.byteOffset is 0 and merged.buffer.byteLength === total. The slice(byteOffset, byteOffset + byteLength) copies the entire buffer again for no benefit. You can just return merged.buffer.

♻️ Proposed simplification
-  return merged.buffer.slice(
-    merged.byteOffset,
-    merged.byteOffset + merged.byteLength,
-  )
+  return merged.buffer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/utils/client.ts` around lines 113 -
123, The code in readStreamToArrayBuffer unnecessarily calls slice on the newly
allocated Uint8Array `merged` (whose byteOffset is 0 and whose buffer length
equals total), causing an extra copy; simply return `merged.buffer` instead of
`merged.buffer.slice(merged.byteOffset, merged.byteOffset + merged.byteLength)`
to avoid the redundant allocation/copy and preserve the same ArrayBuffer result.
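The nit is easiest to see in a self-contained version of the merge step (helper name assumed; the real function first drains a ReadableStream into chunks):

```typescript
function mergeChunks(chunks: Array<Uint8Array>) {
  const total = chunks.reduce((sum, c) => sum + c.byteLength, 0)
  // Freshly allocated, so byteOffset is 0 and buffer.byteLength === total —
  // returning merged.buffer directly avoids the extra slice copy.
  const merged = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.byteLength
  }
  return merged.buffer
}
```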

27-36: Window env lookup is dead code in a server-side SDK—consider removing for clarity or swapping precedence.

The test environment is correctly configured as 'node' (not jsdom or happy-dom), so the transitive dependency concern is mitigated. However, the code still checks globalThis.window?.env before process.env — which would be unsafe if window were shimmed, though that doesn't occur in practice here.

Since the SDK is server-side only (adapters externalized from SSR bundle), window.env lookup is dead code and worth removing. If you want to preserve it for future client-side realtime token fetches, swap the precedence to process.env first, since the SDK itself runs server-side.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/utils/client.ts` around lines 27 - 36,
The getEnvironment function currently checks globalThis.window?.env before
process.env; remove the dead client-side lookup and simplify getEnvironment (and
its EnvObject usage) to return process.env when available, otherwise
undefined—i.e., eliminate the globalThis/window branch in getEnvironment so the
server-side SDK always prefers process.env (or if you want to preserve
client-side behavior instead swap precedence, ensure process.env is checked
first and only fall back to window.env).
packages/typescript/ai-elevenlabs/src/adapters/audio.ts (1)

83-86: Union type has a redundant branch.

(A & B) | A | B is structurally equivalent to A | B when both A and B consist solely of optional members (every object satisfies either), so the first branch doesn't add inference value but complicates the public type signature. Dropping it simplifies the exported type without behavior change.

♻️ Simplification
 export type ElevenLabsAudioProviderOptions =
-  | (ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions)
   | ElevenLabsMusicProviderOptions
   | ElevenLabsSoundEffectsProviderOptions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts` around lines 83 -
86, The exported union type ElevenLabsAudioProviderOptions is written as
(ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions) |
ElevenLabsMusicProviderOptions | ElevenLabsSoundEffectsProviderOptions which is
redundant; replace it with the simplified union ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions by removing the intersecting branch to
clean up the public signature while preserving behavior (refer to the
ElevenLabsAudioProviderOptions, ElevenLabsMusicProviderOptions, and
ElevenLabsSoundEffectsProviderOptions type names to locate and update the
declaration).
packages/typescript/ai-elevenlabs/src/adapters/transcription.ts (1)

277-310: Comment overstates the grouping behavior.

The docstring says "If no speaker is ever set, we still emit one segment per sentence-ish grouping", but the code only splits on speakerId change — when no speaker IDs are present, all timed words collapse into a single segment (no sentence heuristic is applied). Consider updating the comment to match the actual behavior to avoid future confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts` around lines
277 - 310, The comment above the segmentation loop misstates behavior: the loop
over timedWords only splits segments on speakerId changes (using variables
timedWords, current, segments, TranscriptionSegment and the w.speakerId check),
so when no speakerId is present all words collapse into a single segment; update
the docstring/comment to accurately state that segmentation is driven solely by
speakerId changes (or alternatively implement a sentence/pausing heuristic if
you want actual sentence-ish grouping) so future readers aren’t misled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 141-170: runMusic currently hardcodes modelId:'music_v1' which
ignores the adapter's selected model; update runMusic to forward this.model into
the compose call (use modelId: this.model) so it respects ElevenLabsMusicModel
and matches runSoundEffects behavior, ensuring future/extended music model
strings are preserved when calling this.client.music.compose.

In `@packages/typescript/ai-elevenlabs/src/adapters/speech.ts`:
- Around line 205-222: The function inferOutputFormatFromResponseFormat
currently silently maps unsupported requested formats ('wav'/'aac'/'flac') to
the MP3 fallback ('mp3_44100_128'); update it to accept an optional logging
interface (e.g., add a parameter options or logger) and emit a warning when
falling back so callers see the divergence (reference
inferOutputFormatFromResponseFormat and the default branch); ensure the warning
includes the originally requested format and the actual returned ElevenLabs
format, and keep the existing fallback return value ('mp3_44100_128') for
compatibility.

In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts`:
- Around line 236-249: In normalizeAudioInput, the string branch currently
treats any non-data URL as a cloudStorageUrl; change the typeof audio ===
'string' handling to validate the string schema (e.g., accept /^https?:\/\// and
any allowed cloud schemes like s3://, gs://, az://) before returning { kind:
'url', value: audio }, and otherwise throw a clear, descriptive error (include
the incoming value) so malformed URLs/local paths/raw base64 are rejected
locally; keep using dataUrlToBlob for data: URLs and leave the
ArrayBuffer/Blob/File handling unchanged.

In `@packages/typescript/ai-elevenlabs/src/model-meta.ts`:
- Around line 53-59: The predicate for music models is too strict: update
isElevenLabsMusicModel to mirror isElevenLabsSoundEffectsModel by using a prefix
or pattern match (e.g., startsWith('music_')) so future music model IDs like
'music_v2' are recognized; adjust the related usage in adapters/audio.ts where
runMusic currently hardcodes modelId: 'music_v1' to pass through the actual
model string instead, ensuring consistent handling between
isElevenLabsMusicModel, isElevenLabsSoundEffectsModel, and runMusic.

---

Nitpick comments:
In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts`:
- Around line 83-86: The exported union type ElevenLabsAudioProviderOptions is
written as (ElevenLabsMusicProviderOptions &
ElevenLabsSoundEffectsProviderOptions) | ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions which is redundant; replace it with the
simplified union ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions by removing the intersecting branch to
clean up the public signature while preserving behavior (refer to the
ElevenLabsAudioProviderOptions, ElevenLabsMusicProviderOptions, and
ElevenLabsSoundEffectsProviderOptions type names to locate and update the
declaration).

In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts`:
- Around line 277-310: The comment above the segmentation loop misstates
behavior: the loop over timedWords only splits segments on speakerId changes
(using variables timedWords, current, segments, TranscriptionSegment and the
w.speakerId check), so when no speakerId is present all words collapse into a
single segment; update the docstring/comment to accurately state that
segmentation is driven solely by speakerId changes (or alternatively implement a
sentence/pausing heuristic if you want actual sentence-ish grouping) so future
readers aren’t misled.

In `@packages/typescript/ai-elevenlabs/src/utils/client.ts`:
- Around line 131-151: The dataUrlToBlob function should tolerate whitespace in
base64 payloads and fail fast on malformed data: URLs: when value
startsWith('data:') but commaIndex === -1, throw a TypeError('Invalid data URL')
instead of returning undefined; and when isBase64 is true, strip whitespace
(e.g., remove /\s+/g) from the payload before calling atob so atob does not
throw on line-wrapped or spaced base64. Update dataUrlToBlob to apply these two
changes while preserving existing mimeType handling and non-base64 decode path.
- Around line 113-123: The code in readStreamToArrayBuffer unnecessarily calls
slice on the newly allocated Uint8Array `merged` (whose byteOffset is 0 and
whose buffer length equals total), causing an extra copy; simply return
`merged.buffer` instead of `merged.buffer.slice(merged.byteOffset,
merged.byteOffset + merged.byteLength)` to avoid the redundant allocation/copy
and preserve the same ArrayBuffer result.
- Around line 27-36: The getEnvironment function currently checks
globalThis.window?.env before process.env; remove the dead client-side lookup
and simplify getEnvironment (and its EnvObject usage) to return process.env when
available, otherwise undefined—i.e., eliminate the globalThis/window branch in
getEnvironment so the server-side SDK always prefers process.env (or if you want
to preserve client-side behavior instead swap precedence, ensure process.env is
checked first and only fall back to window.env).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd3d048a-acbf-4832-96a4-cdd8f8e45920

📥 Commits

Reviewing files that changed from the base of the PR and between dc71c72 and c3b18f9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (30)
  • .changeset/elevenlabs-rest-adapters.md
  • examples/ts-react-chat/src/lib/audio-providers.ts
  • examples/ts-react-chat/src/lib/server-audio-adapters.ts
  • examples/ts-react-chat/src/lib/server-fns.ts
  • examples/ts-react-chat/src/routes/api.generate.audio.ts
  • examples/ts-react-chat/src/routes/api.generate.speech.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • examples/ts-react-chat/vite.config.ts
  • package.json
  • packages/typescript/ai-elevenlabs/package.json
  • packages/typescript/ai-elevenlabs/src/adapters/audio.ts
  • packages/typescript/ai-elevenlabs/src/adapters/speech.ts
  • packages/typescript/ai-elevenlabs/src/adapters/transcription.ts
  • packages/typescript/ai-elevenlabs/src/index.ts
  • packages/typescript/ai-elevenlabs/src/model-meta.ts
  • packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
  • packages/typescript/ai-elevenlabs/src/realtime/token.ts
  • packages/typescript/ai-elevenlabs/src/utils/client.ts
  • packages/typescript/ai-elevenlabs/src/utils/index.ts
  • packages/typescript/ai-elevenlabs/tests/audio-adapter.test.ts
  • packages/typescript/ai-elevenlabs/tests/realtime-adapter.test.ts
  • packages/typescript/ai-elevenlabs/tests/speech-adapter.test.ts
  • packages/typescript/ai-elevenlabs/tests/transcription-adapter.test.ts
  • testing/e2e/package.json
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/media-providers.ts
  • testing/e2e/src/lib/providers.ts
  • testing/e2e/src/lib/types.ts
  • testing/e2e/tests/test-matrix.ts
  • testing/e2e/vite.config.ts

Comment thread packages/typescript/ai-elevenlabs/src/adapters/audio.ts
Comment on lines +205 to +222
function inferOutputFormatFromResponseFormat(
format: TTSOptions['format'] | undefined,
): ElevenLabsOutputFormat | undefined {
switch (format) {
case 'mp3':
return 'mp3_44100_128'
case 'pcm':
return 'pcm_44100'
case 'opus':
return 'opus_48000_128'
case undefined:
return undefined
default:
// `aac` / `flac` / `wav` are not native ElevenLabs formats —
// fall back to mp3 rather than blowing up mid-request.
return 'mp3_44100_128'
}
}
Contributor


⚠️ Potential issue | 🟡 Minor

Silent mp3 fallback for wav/aac/flac may surprise callers.

When the caller explicitly requests format: 'wav' (or aac/flac), the adapter silently returns MP3 audio (the returned format/contentType reflect MP3, which is correct, but the mismatch between the request and the actual returned format is not surfaced). Consider at least logging a warning through options.logger so the divergence is observable, or narrowing TTSOptions['format'] at the type level per provider.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/speech.ts` around lines 205 -
222, The function inferOutputFormatFromResponseFormat currently silently maps
unsupported requested formats ('wav'/'aac'/'flac') to the MP3 fallback
('mp3_44100_128'); update it to accept an optional logging interface (e.g., add
a parameter options or logger) and emit a warning when falling back so callers
see the divergence (reference inferOutputFormatFromResponseFormat and the
default branch); ensure the warning includes the originally requested format and
the actual returned ElevenLabs format, and keep the existing fallback return
value ('mp3_44100_128') for compatibility.
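The logger-based fix the comment asks for could look like the sketch below. The format union and the `warn` callback shape are assumptions for illustration; the real adapter threads its logger through `options.logger` rather than a bare parameter.

```typescript
// Hypothetical sketch: surface the mp3 fallback through an optional warn
// callback instead of substituting it silently. Names mirror the adapter's,
// but the logger shape here is an assumption, not the package's actual API.
type ElevenLabsOutputFormat = 'mp3_44100_128' | 'pcm_44100' | 'opus_48000_128'

function inferOutputFormat(
  format: 'mp3' | 'pcm' | 'opus' | 'wav' | 'aac' | 'flac' | undefined,
  warn?: (message: string) => void,
): ElevenLabsOutputFormat | undefined {
  switch (format) {
    case 'mp3':
      return 'mp3_44100_128'
    case 'pcm':
      return 'pcm_44100'
    case 'opus':
      return 'opus_48000_128'
    case undefined:
      return undefined
    default:
      // wav/aac/flac are not native ElevenLabs formats; keep the mp3
      // fallback for compatibility but make the divergence observable.
      warn?.(
        `ElevenLabs TTS: requested format "${format}" is unsupported; returning mp3_44100_128 instead.`,
      )
      return 'mp3_44100_128'
  }
}
```

The return value is unchanged for every branch, so existing callers keep working; only callers that pass a `warn` hook see the divergence.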

Comment on lines +236 to +249
function normalizeAudioInput(
audio: TranscriptionOptions['audio'],
): NormalizedAudio {
if (audio instanceof ArrayBuffer) {
return { kind: 'file', value: new Blob([audio]) }
}
if (typeof audio === 'string') {
const blob = dataUrlToBlob(audio)
if (blob) return { kind: 'file', value: blob }
return { kind: 'url', value: audio }
}
// Blob or File both fit the SDK's `Uploadable` contract.
return { kind: 'file', value: audio }
}
⚠️ Potential issue | 🟡 Minor

Unvalidated string fallback lands in cloudStorageUrl.

Any string that is not a data URL is forwarded to the SDK as cloudStorageUrl, including malformed URLs, local file paths, or raw base64 without the data: prefix. The failure then surfaces as a remote SDK/API error rather than a clear local one. Consider constraining the fallback to http(s):// prefixes (or known cloud schemes) and throwing a descriptive error otherwise.

♻️ Suggested tightening
 function normalizeAudioInput(
   audio: TranscriptionOptions['audio'],
 ): NormalizedAudio {
   if (audio instanceof ArrayBuffer) {
     return { kind: 'file', value: new Blob([audio]) }
   }
   if (typeof audio === 'string') {
     const blob = dataUrlToBlob(audio)
     if (blob) return { kind: 'file', value: blob }
-    return { kind: 'url', value: audio }
+    if (/^https?:\/\//i.test(audio)) return { kind: 'url', value: audio }
+    throw new Error(
+      'ElevenLabs transcription: string audio must be a data: URL or http(s):// URL.',
+    )
   }
   // Blob or File both fit the SDK's `Uploadable` contract.
   return { kind: 'file', value: audio }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/transcription.ts` around lines
236 - 249, In normalizeAudioInput, the string branch currently treats any
non-data URL as a cloudStorageUrl; change the typeof audio === 'string' handling
to validate the string schema (e.g., accept /^https?:\/\// and any allowed cloud
schemes like s3://, gs://, az://) before returning { kind: 'url', value: audio
}, and otherwise throw a clear, descriptive error (include the incoming value)
so malformed URLs/local paths/raw base64 are rejected locally; keep using
dataUrlToBlob for data: URLs and leave the ArrayBuffer/Blob/File handling
unchanged.

Comment on lines +53 to +59
export function isElevenLabsMusicModel(model: string): boolean {
return model === 'music_v1'
}

export function isElevenLabsSoundEffectsModel(model: string): boolean {
return model.startsWith('eleven_text_to_sound_')
}
⚠️ Potential issue | 🟡 Minor

Predicate asymmetry limits forward‑compatibility for music models.

isElevenLabsMusicModel uses exact equality while isElevenLabsSoundEffectsModel uses a prefix match. Combined with ElevenLabsMusicModel being widened to 'music_v1' | (string & {}) and the file's own comment that "ElevenLabs ships new model IDs more often than we cut a release", any future music model (e.g. music_v2) will fall through to the "Unsupported ElevenLabs audio model" error in adapters/audio.ts even though the type accepts it. Aligning with the SFX predicate style keeps the contract consistent and avoids a code change every time a new music model ships.

♻️ Suggested tweak
-export function isElevenLabsMusicModel(model: string): boolean {
-  return model === 'music_v1'
-}
+export function isElevenLabsMusicModel(model: string): boolean {
+  return model === 'music_v1' || model.startsWith('music_v')
+}

Note: this also requires dropping the hardcoded modelId: 'music_v1' in runMusic (see comment on audio.ts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/model-meta.ts` around lines 53 - 59,
The predicate for music models is too strict: update isElevenLabsMusicModel to
mirror isElevenLabsSoundEffectsModel by using a prefix or pattern match (e.g.,
startsWith('music_')) so future music model IDs like 'music_v2' are recognized;
adjust the related usage in adapters/audio.ts where runMusic currently hardcodes
modelId: 'music_v1' to pass through the actual model string instead, ensuring
consistent handling between isElevenLabsMusicModel,
isElevenLabsSoundEffectsModel, and runMusic.

tombeckenham and others added 2 commits April 24, 2026 16:38
…to SDK

Drop `(string & {})` widening from the ElevenLabs model id types so callers
are blocked from passing unknown models — the pinned lists are now the
source of truth, kept in sync via the automated SDK update pipeline.

Alias `ElevenLabsOutputFormat` to the SDK's `AllowedOutputFormats` so that
a plain `@elevenlabs/elevenlabs-js` version bump carries the format list
through with no manual regeneration. Removes drift (`mp3_24000_48`,
`pcm_32000` were already missing) and lets us drop the `as never` casts
at the SDK boundary.

Also promote `isElevenLabsMusicModel` / `isElevenLabsSoundEffectsModel`
to type predicates so the dispatch in `runMusic` / `runSoundEffects` is
visibly narrowed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
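The "promote to type predicates" change in this commit can be sketched as below. The model id unions are trimmed for brevity (the real lists live in model-meta.ts), and the function names are simplified stand-ins.

```typescript
// Illustrative sketch of the type-predicate promotion: `model is T` lets
// the compiler narrow the string inside each dispatch branch.
type ElevenLabsMusicModel = 'music_v1'
type ElevenLabsSoundEffectsModel =
  | 'eleven_text_to_sound_v1'
  | 'eleven_text_to_sound_v2'

function isMusicModel(model: string): model is ElevenLabsMusicModel {
  return model === 'music_v1'
}

function isSoundEffectsModel(
  model: string,
): model is ElevenLabsSoundEffectsModel {
  return model.startsWith('eleven_text_to_sound_')
}

// Dispatch narrows visibly: in each branch `model` has the specific type,
// mirroring how runMusic / runSoundEffects are selected.
function describe(model: string): string {
  if (isMusicModel(model)) return `music model ${model}`
  if (isSoundEffectsModel(model)) return `sfx model ${model}`
  return `unsupported model ${model}`
}
```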
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/typescript/ai-elevenlabs/src/adapters/speech.ts (1)

171-173: Unused generateId override.

generateSpeech calls the imported generateId(this.name) utility directly (line 156), so this protected override is never invoked. BaseTTSAdapter already ships a generateId() implementation; this override can be dropped to reduce surface area.

♻️ Suggested cleanup
-  protected override generateId(): string {
-    return generateId(this.name)
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/speech.ts` around lines 171 -
173, The protected override generateId(): string in the Speech adapter is unused
because generateSpeech() calls the imported utility generateId(this.name)
directly, so remove the redundant override to rely on BaseTTSAdapter's
generateId() implementation; delete the protected override generateId() method
from the class and ensure there are no other references to that override so
generateSpeech and other methods use the base-class behavior.
packages/typescript/ai-elevenlabs/src/adapters/audio.ts (2)

219-221: generateId override is effectively dead code within this class.

The override replaces the base-class helper, but nothing inside ElevenLabsAudioAdapter calls this.generateId(): finalize (line 209) calls the imported utility generateId(this.name) directly, bypassing the method. Either drop the override or route finalize through this.generateId() so the override actually takes effect (and any future subclass can customize it). Not a bug, just cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts` around lines 219 -
221, The class has an unused override of generateId() in ElevenLabsAudioAdapter
because finalize currently calls the imported utility generateId(this.name)
directly; update finalize to call this.generateId() so the override is honored
(or alternatively remove the override if you prefer not to support
customization). Locate the finalize method and replace the direct call to the
imported generateId(...) with a call to this.generateId(), ensuring the
class-level override will be invoked for subclasses.
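The "route finalize through this.generateId()" option could look like this minimal sketch. Class and member names are simplified stand-ins for the adapter's real shape, not the package's actual classes.

```typescript
// Module-level utility, analogous to the imported generateId(this.name).
function generateId(prefix: string): string {
  return `${prefix}-${Math.random().toString(36).slice(2, 10)}`
}

class BaseAudioAdapter {
  constructor(readonly name: string) {}

  protected generateId(): string {
    return generateId(this.name)
  }

  finalize(): { id: string } {
    // Call through `this` so subclass overrides are honored, instead of
    // calling the imported utility directly and bypassing them.
    return { id: this.generateId() }
  }
}

class PrefixedAdapter extends BaseAudioAdapter {
  protected override generateId(): string {
    return `custom-${this.name}`
  }
}
```

With finalize routed through `this.generateId()`, `new PrefixedAdapter('x').finalize()` picks up the subclass override; with a direct call to the imported utility, it would not.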

85-88: Redundant intersection branch in the provider-options union.

(ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions) is already a subtype of both ElevenLabsMusicProviderOptions and ElevenLabsSoundEffectsProviderOptions, so the first member of the union adds nothing — the whole expression is structurally equivalent to ElevenLabsMusicProviderOptions | ElevenLabsSoundEffectsProviderOptions. The extra branch also subtly encourages callers to pass music+SFX fields together, which the adapter doesn't actually honor (each code path only reads its own subset).

♻️ Proposed simplification
-export type ElevenLabsAudioProviderOptions =
-  | (ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions)
-  | ElevenLabsMusicProviderOptions
-  | ElevenLabsSoundEffectsProviderOptions
+export type ElevenLabsAudioProviderOptions =
+  | ElevenLabsMusicProviderOptions
+  | ElevenLabsSoundEffectsProviderOptions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts` around lines 85 -
88, The union type ElevenLabsAudioProviderOptions includes a redundant
intersection branch; replace the current declaration that includes
(ElevenLabsMusicProviderOptions & ElevenLabsSoundEffectsProviderOptions) with
the simpler union ElevenLabsMusicProviderOptions |
ElevenLabsSoundEffectsProviderOptions so the type is not misleading about
combined music+SFX fields (refer to ElevenLabsAudioProviderOptions,
ElevenLabsMusicProviderOptions, and ElevenLabsSoundEffectsProviderOptions).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1952ccc-f9aa-4d5b-b24e-73b514b5a8e6

📥 Commits

Reviewing files that changed from the base of the PR and between c3b18f9 and 3b588ed.

📒 Files selected for processing (3)
  • packages/typescript/ai-elevenlabs/src/adapters/audio.ts
  • packages/typescript/ai-elevenlabs/src/adapters/speech.ts
  • packages/typescript/ai-elevenlabs/src/model-meta.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/typescript/ai-elevenlabs/src/model-meta.ts

Comment on lines +151 to +173
const stream = await this.client.music.compose({
modelId,
...(options.prompt && !music.compositionPlan
? { prompt: options.prompt }
: {}),
...(music.compositionPlan
? { compositionPlan: toMusicPrompt(music.compositionPlan) }
: {}),
...(options.duration != null && !music.compositionPlan
? { musicLengthMs: Math.round(options.duration * 1000) }
: {}),
...(outputFormat ? { outputFormat } : {}),
...(music.seed != null ? { seed: music.seed } : {}),
...(music.forceInstrumental != null
? { forceInstrumental: music.forceInstrumental }
: {}),
...(music.respectSectionsDurations != null
? { respectSectionsDurations: music.respectSectionsDurations }
: {}),
})

return this.finalize(stream, outputFormat, options.duration)
}
⚠️ Potential issue | 🟡 Minor

audio.duration can misrepresent the generated track when compositionPlan is used.

In runMusic, options.duration is intentionally not forwarded to client.music.compose when a compositionPlan is supplied (lines 159-161) — the real length is derived from the sum of section.durationMs values. However, line 172 still unconditionally passes options.duration into finalize, which attaches it to result.audio.duration. If a caller supplies both a compositionPlan and (ignored) duration: 15, the response will claim a 15 s track while the actual audio may be much longer/shorter. Consider suppressing duration in the composition-plan path (or deriving it from the plan) so the field either reflects reality or is omitted.

🛠️ Suggested change
-    return this.finalize(stream, outputFormat, options.duration)
+    const resolvedDuration = music.compositionPlan
+      ? undefined
+      : options.duration
+    return this.finalize(stream, outputFormat, resolvedDuration)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/adapters/audio.ts` around lines 151 -
173, In runMusic, when music.compositionPlan is present the options.duration is
ignored by client.music.compose but still passed into finalize causing
result.audio.duration to be incorrect; update runMusic to detect
music.compositionPlan and either (a) compute the true duration by summing each
section.durationMs from music.compositionPlan (convert ms to seconds) and pass
that computed value to finalize, or (b) omit passing duration to finalize when a
compositionPlan exists so result.audio.duration is not set; modify the logic
around the call sites of client.music.compose and finalize (referencing
runMusic, client.music.compose, finalize, options.duration, and
music.compositionPlan) to implement one of these two behaviors so
result.audio.duration reflects the actual composition length or is left out.
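Option (a) from the prompt — deriving the reported duration from the plan — could be sketched as below. The plan shape (sections carrying `durationMs`) is an assumption taken from the review comment, not the SDK's exact type.

```typescript
// Hypothetical helper: report a duration that reflects what was actually
// composed. With a composition plan, sum the section lengths; without one,
// pass the caller's requested duration through unchanged.
interface CompositionPlanSection {
  durationMs: number
}

interface CompositionPlan {
  sections: CompositionPlanSection[]
}

function resolveDuration(
  plan: CompositionPlan | undefined,
  requestedDuration: number | undefined,
): number | undefined {
  if (!plan) return requestedDuration
  // Sum section lengths (ms) and convert to seconds.
  const totalMs = plan.sections.reduce((sum, s) => sum + s.durationMs, 0)
  return totalMs / 1000
}
```

runMusic would then call `this.finalize(stream, outputFormat, resolveDuration(music.compositionPlan, options.duration))`, so `result.audio.duration` never echoes a value the API ignored.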

…realtime example

Mirror the `ELEVENLABS_API_KEY` pattern for agent ids: add
`getElevenLabsAgentIdFromEnv()` and make `agentId` optional on
`ElevenLabsRealtimeTokenOptions`. `elevenlabsRealtimeToken()` now
resolves `options.agentId ?? ELEVENLABS_AGENT_ID` at call time.

Simplify the ts-react-chat example: drop the manual `process.env`
dance and the Agent ID text input from the realtime page — the adapter
handles the env fallback now. Replace the input with a Language
selector that threads `overrides.language` through to the session, so
users can switch off the agent's dashboard default (common need when
the agent is configured for one language but a caller wants another).

Also broaden `.env.example` in ts-react-chat to cover every provider
the example actually reads (Anthropic, Gemini, xAI, Groq, OpenRouter,
fal) — previously only OpenAI and ElevenLabs were listed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

feat(ai-elevenlabs): TTS / Music / SFX / Transcription adapters via official SDK

1 participant