
feat: audio media support — fal audio/speech/STT adapters, Gemini Lyria + 3.1 Flash TTS, streaming generateAudio + hooks #463

Merged
AlemTuzlak merged 40 commits into TanStack:main from tombeckenham:328-fal-audio-and-speech-support on Apr 23, 2026

Conversation

@tombeckenham
Contributor

@tombeckenham tombeckenham commented Apr 17, 2026

Summary

Closes #328.

Adds audio generation / speech / transcription adapters for fal and Gemini, a new generateAudio activity with streaming support, framework hooks (useGenerateAudio / createGenerateAudio), and example pages in ts-react-chat.

ElevenLabs adapters (Speech / Music / SFX / Transcription) originally landed here but were split out into #485.

What's in this PR

Core @tanstack/ai

  • New generateAudio activity (music, sound-effects, and the broader fal audio catalog — audio-to-audio, voice conversion / cloning, enhancement, separation, isolation, understanding, merge, etc.) with a matching AudioAdapter interface, BaseAudioAdapter base class, and audio:request:started / audio:request:completed / audio:usage devtools events.
  • generateAudio() now supports stream: true, returning an AsyncIterable<StreamChunk> that can be piped through toServerSentEventsResponse() — matching the other media activities (see the sketch after this list).
  • generateSpeech / generateTranscription activity generics tightened to TTSProviderOptions<TAdapter> / TranscriptionProviderOptions<TAdapter> so typed provider options flow through.
  • GeneratedImage and GeneratedAudio now enforce exactly one of url or b64Json via a mutually-exclusive GeneratedMediaSource union (see e109e962). Existing ``img.url || `data:...;base64,${img.b64Json}` `` read patterns still work; the @tanstack/ai-openrouter and @tanstack/ai-fal image adapters no longer synthesize a `data:` URL when the provider returns base64.
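
A rough sketch of the streaming path (the route shape and the exact `toServerSentEventsResponse()` call signature are illustrative assumptions; the `stream: true` / `AsyncIterable<StreamChunk>` contract is as described above):

```ts
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini'

// Hypothetical SSE endpoint; request validation omitted for brevity.
export async function POST(request: Request) {
  const { prompt, duration } = await request.json()

  // With stream: true, generateAudio yields an AsyncIterable<StreamChunk>
  // instead of a final result object.
  const stream = generateAudio({
    adapter: geminiAudio('lyria-3-pro-preview'),
    prompt,
    duration,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```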

Framework hooks

  • @tanstack/ai-client: new AudioGenerateInput type.
  • @tanstack/ai-react, @tanstack/ai-solid, @tanstack/ai-vue: new useGenerateAudio hook.
  • @tanstack/ai-svelte: new createGenerateAudio.
  • All follow the existing media-hook shape — { generate, result, isLoading, error, status, stop, reset } — and support both connection (SSE) and fetcher transports.
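
A minimal React sketch of that shape (transport wiring mirrors the example SSE route; anything beyond the documented return shape is an assumption):

```tsx
import { useGenerateAudio } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'

function AudioDemo() {
  // Connection (SSE) transport; a plain fetcher transport also works.
  const { generate, result, isLoading, error } = useGenerateAudio({
    connection: fetchServerSentEvents('/api/generate/audio'),
  })

  // Results carry either a URL or base64 bytes, never both.
  const src =
    result?.audio.url ??
    (result?.audio.b64Json
      ? `data:${result.audio.contentType ?? 'audio/mpeg'};base64,${result.audio.b64Json}`
      : undefined)

  return (
    <div>
      <button
        disabled={isLoading}
        onClick={() => generate({ prompt: 'Upbeat synths', duration: 10 })}
      >
        Generate
      </button>
      {error && <p>{String(error)}</p>}
      {src && <audio src={src} controls />}
    </div>
  )
}
```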

Provider adapters

  • @tanstack/ai-gemini:
    • New geminiAudio() adapter for Lyria 3 Pro / Lyria 3 Clip music generation (MP3/WAV 48 kHz stereo; clip variant returns 30-second MP3s).
    • Added gemini-3.1-flash-tts-preview to the TTS model list (70+ languages, 200+ audio tags).
    • Added multiSpeakerVoiceConfig to GeminiTTSProviderOptions for 2-speaker dialogue generation.
  • @tanstack/ai-fal: three new tree-shakeable adapters alongside the existing falImage() / falVideo() (usage sketch after this list):
    • falSpeech() — TTS via models like fal-ai/gemini-3.1-flash-tts, fal-ai/minimax/speech-2.6-hd, fal-ai/kokoro/*.
    • falTranscription() — STT via fal-ai/whisper, fal-ai/wizper, fal-ai/speech-to-text/turbo.
    • falAudio() — music and SFX via fal-ai/minimax-music/v2.6, fal-ai/diffrhythm, fal-ai/lyria2, fal-ai/stable-audio-25/text-to-audio, fal-ai/elevenlabs/sound-effects/v2, plus the wider audio catalog.
    • Per-model duration translation fix (8b4cb006) so each model's expected duration parameter is forwarded correctly.
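
A sketch of how the three factories slot into the existing activities (model ids come from the lists above; option names beyond `adapter` are assumptions based on the example routes):

```ts
import {
  generateAudio,
  generateSpeech,
  generateTranscription,
} from '@tanstack/ai'
import { falAudio, falSpeech, falTranscription } from '@tanstack/ai-fal'

// Each factory reads FAL_KEY from the environment by default.
const music = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2.6'),
  prompt: 'Lo-fi beat with vinyl crackle',
  duration: 15, // forwarded per model as duration / music_length_ms / seconds_total
})

const speech = await generateSpeech({
  adapter: falSpeech('fal-ai/minimax/speech-2.6-hd'),
  text: 'Hello from fal!',
})

const transcript = await generateTranscription({
  adapter: falTranscription('fal-ai/whisper'),
  audio: 'https://example.com/clip.mp3', // URL, File, Blob, or base64 string
})
```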

Docs

  • New media/audio-generation.md guide — leads with Gemini (Lyria), then fal.
  • media/text-to-speech.md and media/transcription.md gain fal sections.
  • docs/adapters/fal.md expanded with TTS, transcription, and audio sections plus a full model table.
  • docs/adapters/gemini.md updated for Lyria and Gemini 3.1 Flash TTS.

Examples (examples/ts-react-chat)

  • TTS and transcription pages now have provider tabs:
    • TTS: OpenAI / Gemini / Fal (Kokoro)
    • Transcription: OpenAI Whisper / Fal Whisper
  • New /generations/audio page covering Gemini Lyria and fal audio generation with a model selector.
  • Home page gets an audio tile, sample prompts, and a server-side adapter factory for the new routes.
  • Warning UI when a chosen audio model override is rejected (cc273a59).

Tests

  • Unit tests for every new adapter (Gemini audio + Gemini TTS multi-speaker; fal speech / transcription / audio).
  • Unit tests for useGenerateAudio / createGenerateAudio across all framework packages.

History notes

  • ElevenLabs split out — the ElevenLabs Speech / Music / SFX / Transcription adapters originally lived in this branch but were moved to "feat(ai-elevenlabs): TTS / Music / SFX / Transcription adapters via official SDK" (#485, 9998a093) and have since merged on main. This PR no longer touches @tanstack/ai-elevenlabs source.
  • generateAudio stayed a single activity — briefly split into generateMusic + generateSoundEffects (1010e9b7), then reverted in 247ef7ba once fal's full audio catalog made the music/SFX binary clearly too narrow (dozens of audio-to-audio, voice, enhancement, separation, isolation, understanding, merge models span both or neither category).
  • E2E audio-gen scaffolding dropped (b00b28ac) — aimock 1.14 has no generateContent AUDIO-modality mock for Lyria, so the harness pieces were removed rather than left dormant. They will land with a follow-up once upstream aimock support exists.

Test plan

  • pnpm test:lib (affected packages pass)
  • pnpm test:types (affected packages pass)
  • pnpm test:eslint (affected packages pass)
  • pnpm test:docs (no broken links)
  • E2E for generateAudio — deferred pending upstream aimock Lyria / AUDIO-modality support
  • Manual smoke: open each generations page in ts-react-chat dev server and verify tab switching + audio playback per provider (OpenAI / Gemini / Fal for TTS; OpenAI / Fal for transcription; Gemini Lyria / Fal Audio for /generations/audio)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Audio generation (music/sfx) with streaming support and cross-framework hooks (React/Solid/Vue/Svelte); provider selection in examples and demo UI; server SSE endpoint for streaming audio.
    • New Gemini Lyria and Fal audio offerings plus Fal text‑to‑speech and transcription support.
  • Bug Fixes

    • Media objects now enforce exactly one of URL or base64 payload; adapters returning base64 no longer synthesize data: URLs.
  • Documentation

    • New Audio Generation guide and updated Fal/Gemini speech & transcription docs.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new audio generation activity and types, adapter primitives and provider adapters (Fal, Gemini Lyria), streaming (SSE) support, framework hooks/creators (React/Solid/Vue/Svelte), example UI + server routes, docs/changesets, devtools events, tests, and tightened media types to a mutually-exclusive url/b64Json union.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Changesets & Docs**<br>`.changeset/*.md`, `docs/*`, `docs/config.json`, `packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` | New changesets and documentation pages for audio generation, navigation entry, examples, debug notes, and SKILL updates. |
| **Core audio activity & types**<br>`packages/typescript/ai/src/activities/generateAudio/*`, `packages/typescript/ai/src/activities/index.ts`, `packages/typescript/ai/src/index.ts`, `packages/typescript/ai/src/types.ts`, `packages/typescript/ai-client/src/*` | Introduces AudioAdapter interface, BaseAudioAdapter, generateAudio (sync + stream), createAudioOptions, AudioGeneration types, and AudioGenerateInput. |
| **Devtools & events**<br>`packages/typescript/ai-event-client/src/index.ts` | Adds audio devtools events (audio:request:started/completed/error, audio:usage) and speech/transcription error events; extends ToolCall with providerMetadata. |
| **Fal provider (adapters, utils, tests)**<br>`packages/typescript/ai-fal/src/adapters/*`, `.../src/model-meta.ts`, `.../src/utils/*`, `.../src/index.ts`, `packages/typescript/ai-fal/tests/*`, `packages/typescript/ai-fal/package.json` | Adds falAudio, falSpeech, falTranscription adapters, model option types, duration mapping, client utils (content-type, base64), tests, exports and package keywords. |
| **Gemini provider (adapters, metadata, tests)**<br>`packages/typescript/ai-gemini/src/adapters/audio.ts`, `.../src/model-meta.ts`, `.../src/index.ts`, `packages/typescript/ai-gemini/tests/*` | Adds Gemini Lyria audio adapter, audio model metadata, new TTS model entry, tree-shakeable exports and tests. |
| **Adapter parsing & base constructors**<br>`packages/typescript/ai-fal/src/adapters/image.ts`, `packages/typescript/ai-openrouter/src/adapters/image.ts`, `packages/typescript/ai-openai/src/adapters/image.ts`, `packages/typescript/ai-grok/src/adapters/image.ts`, `packages/typescript/ai/src/activities/*/adapter.ts` | Tightens media parsing to mutually-exclusive url/b64Json, stops synthesizing data-URIs, adds stricter response validation, and reorders base adapter constructors to (model, config). |
| **Framework hooks / creators & exports**<br>`packages/typescript/ai-react/src/use-generate-audio.ts`, `packages/typescript/ai-solid/src/use-generate-audio.ts`, `packages/typescript/ai-vue/src/use-generate-audio.ts`, `packages/typescript/ai-svelte/src/create-generate-audio.svelte.ts`, corresponding `index.ts` files & tests | Adds useGenerateAudio / createGenerateAudio across React, Solid, Vue, Svelte with fetcher and SSE transports and tests. |
| **Examples & app wiring**<br>`examples/ts-react-chat/src/...` (routes, server-fns, `server-audio-adapters.ts`, `audio-providers.ts`, `Header.tsx`, `routeTree.gen.ts`), `examples/ts-react-chat/package.json` | Adds /generations/audio UI page, /api/generate/audio SSE route, provider catalog, server adapter builders, route registration, UI wiring, and example dependency on @tanstack/ai-fal. |
| **Server routes & validation**<br>`examples/ts-react-chat/src/routes/api.generate.audio.ts`, `api.generate.speech.ts`, `api.transcribe.ts`, `.../src/lib/server-audio-adapters.ts`, `.../src/lib/server-fns.ts` | Adds robust JSON parsing, Zod validation, provider selection/builders, SSE streaming plumbing, and centralized adapter factory wiring for audio/speech/transcription. |
| **Tests & test updates**<br>`packages/typescript/ai-fal/tests/*`, `packages/typescript/ai-gemini/tests/*`, `packages/typescript/*/tests/use-generation.test.ts`, various adapter test updates | Adds comprehensive Vitest suites for Fal/Gemini audio/speech/transcription, hook tests for frameworks, and updated image-adapter expectations. |
| **Utilities & misc**<br>`packages/typescript/ai-fal/src/utils/*`, small provider-option & image/gemini util updates, terminalOutput removal | New helpers for URL extension extraction, audio content-type derivation, ArrayBuffer→base64, Fal client config behavior; adjustments to model-option helpers and removal of a build log artifact. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Browser UI
    participant Hook as useGenerateAudio
    participant Server as /api/generate/audio (SSE)
    participant Adapter as Audio Adapter (Fal/Gemini)
    participant Provider as External Provider API

    Client->>Hook: generate({ prompt, duration, provider })
    Hook->>Server: POST /api/generate/audio (body)
    Server->>Adapter: buildAudioAdapter(provider)<br/>adapter.generateAudio({ prompt, duration, stream: true })
    Adapter->>Provider: request (model, prompt, duration, modelOptions)
    Provider-->>Adapter: stream / response (b64Json or url, contentType)
    Adapter-->>Server: AsyncIterable<StreamChunk>
    Server-->>Hook: SSE chunks
    Hook->>Client: update state (result, isLoading, status)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Poem

🐰 I tunneled through code to plant a new beat,

Prompts turned to music—oh, how sweet!
Fal hums and Lyria sings by moonlight,
Hooks and SSE stitch day into night,
I hop, I clap—let the audio take flight! 🎶


tombeckenham and others added 4 commits April 17, 2026 19:06
Adds falSpeech, falTranscription, and falAudio adapters to @tanstack/ai-fal,
completing fal's media coverage alongside image and video. Introduces a new
generateAudio activity in @tanstack/ai for music and sound-effect generation,
with matching devtools events and types.

Closes TanStack#328

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Lyria + 3.1 Flash TTS

Extends @tanstack/ai-elevenlabs (which already covers realtime voice) with
Speech, Music, Sound Effects, and Transcription adapters, each tree-shakeable
under its own import.

Adds Gemini Lyria 3 Pro / Clip music generation via a new generateAudio
adapter, plus the new Gemini 3.1 Flash TTS Preview model with multi-speaker
dialogue support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new Audio Generation page, expands the fal adapter reference with
sections for text-to-speech, transcription, and audio/music, and adds fal
sections to the Text-to-Speech and Transcription guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expand the ts-react-chat example with provider tabs for OpenAI,
ElevenLabs, Gemini, and Fal on the TTS and transcription pages, plus a
new /generations/audio page covering ElevenLabs Music, ElevenLabs SFX,
Gemini Lyria, and Fal audio generation.

Add a Gemini TTS unit test and wire an audio-gen feature into the E2E
harness (adapter factory, API route, UI, fixture, and Playwright spec).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tombeckenham tombeckenham force-pushed the 328-fal-audio-and-speech-support branch from 94ada28 to 068ca0d on April 17, 2026 09:22
@tombeckenham tombeckenham changed the title from "feat: audio/tts example pages and tests across providers" to "feat: audio media support — fal adapters, ElevenLabs TTS/music/SFX/STT, Gemini Lyria + 3.1 Flash TTS" on Apr 17, 2026
@nx-cloud

nx-cloud Bot commented Apr 17, 2026

View your CI Pipeline Execution ↗ for commit 662e447

| Command | Status | Duration | Result |
|---------|--------|----------|--------|
| `nx run-many --targets=build --exclude=examples/**` | ✅ Succeeded | 5s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-04-23 12:24:09 UTC

Reorder the Audio Generation page so the direct Gemini (Lyria) and
ElevenLabs (music/sfx) adapters appear before fal.ai, and update the
environment variables + result-shape notes to cover all three providers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pkg-pr-new

pkg-pr-new Bot commented Apr 17, 2026

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@463

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@463

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@463

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@463

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@463

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@463

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@463

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@463

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@463

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@463

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@463

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@463

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@463

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@463

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@463

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@463

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@463

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@463

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@463

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@463

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@463

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@463

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@463

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@463

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@463

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@463

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@463

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@463

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@463

commit: 662e447

tombeckenham and others added 4 commits April 17, 2026 19:28
…el selector

Expose an Audio tile on the welcome grid, offer one-click sample prompts
for every audio provider, and let the Fal provider pick between current
text-to-music models (default MiniMax v2.6). Threads a model override
through the audio API and server fn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tombeckenham added a commit to tombeckenham/ai-tom that referenced this pull request Apr 20, 2026
Snapshots uncommitted fixes to the speech/music/transcription adapters
and their tests on the split-out branch (see TanStack#485). Kept separate from
TanStack#463 so the fal + Gemini audio work can ship first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tombeckenham and others added 9 commits April 21, 2026 07:30
Moves the new ElevenLabs TTS / Music / SFX / Transcription REST adapters
out of this PR into their own issue (TanStack#485) and branch
(`elevenlabs-audio-adapters`) so the fal + Gemini audio work can ship
independently. The follow-up PR will rebuild these adapters on top of
the official `@elevenlabs/elevenlabs-js` SDK rather than hand-rolled
fetch calls.

Removed from this branch:
- `packages/typescript/ai-elevenlabs/src/{adapters,utils,model-meta.ts}`
  and their tests (realtime voice code untouched)
- ElevenLabs sections in `docs/media/audio-generation.md`
- ElevenLabs entries in `examples/ts-react-chat` audio-providers catalog,
  server adapter factories, zod schemas, and default provider wiring
- `@tanstack/ai-elevenlabs` bump from the audio changeset

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ai-fal: replace `btoa(String.fromCharCode(...bytes))` with a chunked
  helper; the spread form throws RangeError on any realistic TTS clip
  (V8 arg limit ~65k).
- ai-gemini: honor `TTSOptions.voice` as a fallback for the prebuilt
  voice name, move `systemInstruction` inside `config` per the
  @google/genai contract, and wrap raw `audio/L16;codec=pcm` output in
  a RIFF/WAV container so the result is actually playable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
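
The chunked conversion referenced above, as a standalone sketch (the actual helper in `ai-fal` may differ in name and shape):

```ts
// Convert an ArrayBuffer to base64 without spreading the whole buffer into
// String.fromCharCode: the spread form throws a RangeError once the byte
// count exceeds V8's argument limit (roughly 65k).
function arrayBufferToBase64(buffer: ArrayBuffer, chunkSize = 0x8000): string {
  const bytes = new Uint8Array(buffer)
  let binary = ''
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each chunk stays safely under the argument-count limit.
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize))
  }
  return btoa(binary)
}
```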
Log a warning instead of silently swapping to the default when a client
sends a model id outside the provider's allowlist, so stale clients or
typo'd config ids are debuggable. Also correct the AudioProviderConfig
JSDoc to describe the models[] ordering as a non-binding UI convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
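
In code terms the warning path looks roughly like this (`resolveModel` and its parameter shape are hypothetical; only the warn-and-fall-back behavior and the models allowlist come from the commit):

```ts
// Validate a client-supplied model id against the provider's allowlist,
// warning instead of silently substituting when the id is unknown.
function resolveModel(
  provider: { id: string; models: Array<string> },
  requested?: string,
): string {
  const fallback = provider.models[0]
  if (requested && !provider.models.includes(requested)) {
    console.warn(
      `[${provider.id}] model "${requested}" is not in the allowlist; using "${fallback}"`,
    )
    return fallback
  }
  return requested ?? fallback
}
```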
Replaces the unreleased generateAudio activity with two distinct activities so
music and sound-effects each have their own types, adapter kinds, provider
factories, and devtools events. This lets providers advertise only the
capabilities they support (Gemini Lyria is music-only; fal has distinct music
and SFX catalogs) and leaves room for kind-specific options without a breaking
change.

- Core: generateMusic/generateSoundEffects activities and MusicAdapter/
  SoundEffectsAdapter interfaces + bases; GeneratedAudio shared between
  MusicGenerationResult and SoundEffectsGenerationResult
- Events: music:request:* and soundEffects:request:* replace audio:*
- fal: falMusic + falSoundEffects factories sharing internal request/response
  helpers; FalMusic/FalSoundEffectsProviderOptions in model-meta
- Gemini: geminiMusic/createGeminiMusic/GeminiMusicAdapter (Lyria is music-only
  so no SFX counterpart)
- ts-react-chat: /generations/music and /generations/sound-effects routes
  backed by a shared AudioGenerationForm; split server fns and API routes
- E2E: music-gen + sound-effects-gen features, parameterized MediaAudioGenUI,
  split fixtures and specs (both feature support sets are empty since
  aimock 1.14 cannot mock Gemini's Lyria AUDIO modality)
- Docs: music-generation.md + sound-effects-generation.md; fal adapter docs
  split; changesets rewritten in place

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes 1010e9b. The split into generateMusic + generateSoundEffects
doesn't hold up against fal's audio catalog: dozens of models span
audio-to-audio, voice-change/clone, enhancement, separation, isolation,
merge, and understanding, and individual models (e.g. stable-audio-25)
generate music AND sound effects. A single broader generateAudio activity
fits that reality.

Keeps the aimock Gemini-Lyria gap: audio-gen feature-support stays empty
because aimock 1.14 has no AUDIO-modality mock for generateContent — the
E2E is green by skipping rather than by hitting a mock that doesn't exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GeneratedAudio

Model GeneratedImage and GeneratedAudio on a shared mutually-exclusive GeneratedMediaSource union so the type rejects empty objects and objects that set both fields. Update the openai, gemini, grok, openrouter, and fal image adapters to construct results by branching on which field is present; openrouter and fal no longer synthesize a data URI on url when returning base64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
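
The union described here, sketched in isolation (field names come from the PR; the `never`-based encoding is the standard TypeScript idiom and an assumption about the exact implementation):

```ts
// Exactly one of url or b64Json must be present.
type GeneratedMediaSource =
  | { url: string; b64Json?: never }
  | { url?: never; b64Json: string }

// Both of these now fail to typecheck:
// const empty: GeneratedMediaSource = {}
// const both: GeneratedMediaSource = { url: 'https://a.example/x.mp3', b64Json: 'AAAA' }
```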
autofix-ci Bot and others added 4 commits April 22, 2026 07:05
The audio-gen feature set was empty because aimock cannot currently mock audio generation, so the Playwright spec ran against zero providers. Remove the dead scaffolding; the wiring can return once aimock audio support lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the parity gap with the other media activities — audio generation
now has the same client-hook UX (connection + fetcher transports) as
image, speech, video, transcription, and summarize. Adds streaming to
generateAudio so it can ride the SSE transport, a matching
AudioGenerateInput type in ai-client, framework hooks in ai-react /
ai-solid / ai-vue / ai-svelte, unit tests, an updated ts-react-chat
example, and docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fal audio models use different input field names for length: ElevenLabs
Music takes `music_length_ms` in milliseconds, Stable Audio 2.5 takes
`seconds_total`, and most others accept `duration`. The adapter was
passing a generic `duration` unconditionally, so the slider in the
example was silently ignored for ElevenLabs and Stable Audio.

Also: align the Gemini Lyria adapter with the API's MP3 default (only
send responseMimeType when the caller asks for WAV), expand the example
to include Lyria 3 Pro and a dedicated Fal SFX provider, and rename the
example's "Direct" mode to "Hooks" to better reflect what it demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
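
A sketch of that translation (the helper name is hypothetical; the field names and units come from the commit message):

```ts
// Map a duration in seconds onto the input field each fal model expects.
function durationInput(model: string, seconds: number): Record<string, number> {
  if (model.startsWith('fal-ai/elevenlabs/music')) {
    return { music_length_ms: seconds * 1000 } // ElevenLabs Music: milliseconds
  }
  if (model.startsWith('fal-ai/stable-audio-25')) {
    return { seconds_total: seconds } // Stable Audio 2.5
  }
  return { duration: seconds } // most other models
}
```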
@tombeckenham tombeckenham changed the title from "feat: audio media support — fal adapters, ElevenLabs TTS/music/SFX/STT, Gemini Lyria + 3.1 Flash TTS" to "feat: audio media support — fal audio/speech/STT adapters, Gemini Lyria + 3.1 Flash TTS, streaming generateAudio + hooks" on Apr 22, 2026
Align the audio model constant and its re-export with the `generateAudio`
activity naming used across providers, and drop the unused duplicate
`GeminiLyriaModel` type — `GeminiAudioModel` is the single canonical type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tombeckenham tombeckenham marked this pull request as ready for review April 22, 2026 08:33
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
packages/typescript/ai-fal/package.json (1)

15-20: 🛠️ Refactor suggestion | 🟠 Major

Expose the new Fal adapters through /adapters subpath exports.

This package now includes audio/speech/transcription adapters, but the exports map still only exposes the root entrypoint. Add adapter subpath exports so consumers can import tree-shakeable implementations directly.

♻️ Proposed exports update
   "exports": {
     ".": {
       "types": "./dist/esm/index.d.ts",
       "import": "./dist/esm/index.js"
+    },
+    "./adapters/*": {
+      "types": "./dist/esm/adapters/*.d.ts",
+      "import": "./dist/esm/adapters/*.js"
     }
   },

As per coding guidelines, packages/typescript/ai-*/package.json: “Export tree-shakeable adapters from /adapters subpath exports in provider adapter packages”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/package.json` around lines 15 - 20, Update the
package "exports" map to expose the adapters subpath so consumers can import
specific Fal adapter modules directly; add entries under "exports" such as
"./adapters": { "types": "./dist/esm/adapters/index.d.ts", "import":
"./dist/esm/adapters/index.js" } and a pattern export "./adapters/*": { "types":
"./dist/esm/adapters/*.d.ts", "import": "./dist/esm/adapters/*.js" } (adjust
filenames to match the built outputs) so tree-shakeable implementations in the
adapters folder are exported alongside the root "." entry.
docs/media/generation-hooks.md (1)

20-34: ⚠️ Potential issue | 🟡 Minor

Finish wiring useGenerateAudio into the rest of the guide.

Line 29 adds the hook to the overview, but the server-function list, framework variants table, and Next Steps still omit audio generation.

📝 Proposed docs update
```diff
-The same pattern applies to all generation types -- swap `generateImage` for `generateSpeech`, `generateTranscription`, `summarize`, or `generateVideo`. See the individual media guides for server-side details.
+The same pattern applies to all generation types -- swap `generateImage` for `generateAudio`, `generateSpeech`, `generateTranscription`, `summarize`, or `generateVideo`. See the individual media guides for server-side details.
 | Generation Type | React (`@tanstack/ai-react`) | Vue (`@tanstack/ai-vue`) | Svelte (`@tanstack/ai-svelte`) |
 |----------------|------------------------------|--------------------------|-------------------------------|
 | Image | `useGenerateImage` | `useGenerateImage` | `createGenerateImage` |
+| Audio | `useGenerateAudio` | `useGenerateAudio` | `createGenerateAudio` |
 | Speech | `useGenerateSpeech` | `useGenerateSpeech` | `createGenerateSpeech` |
 - [Image Generation](./image-generation) -- Provider-specific options, sizes, and model availability
+- [Audio Generation](./audio-generation) -- Music, sound effects, model options, and generated audio playback
 - [Text-to-Speech](./text-to-speech) -- Voice options, audio formats, and streaming audio
```

Also applies to: 62-62, 425-450

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/generation-hooks.md` around lines 20 - 34, The docs mention
useGenerateAudio in the overview but omitted audio elsewhere—update the
server-function list, the framework variants table, and the "Next Steps" section
to include audio generation; specifically add entries referencing
useGenerateAudio, AudioGenerateInput, and AudioGenerationResult (or TTSResult if
applicable) where other hooks (useGenerateImage, useTranscription, etc.) are
listed so audio appears in the server function examples, each framework variant
table row, and the actionable next-step links or examples; ensure naming matches
the overview table (useGenerateAudio, AudioGenerateInput, AudioGenerationResult)
and mirror the formatting used for other media hooks.
docs/media/transcription.md (1)

79-108: ⚠️ Potential issue | 🟡 Minor

Update the shared transcription tables for fal.ai.

Line 81 documents URL/File/Blob/ArrayBuffer support, but Line 108 still lists only File | string, and the environment section still only mentions OpenAI credentials.

📝 Proposed docs update
```diff
-| `audio` | `File \| string` | Audio data (File object or base64 string) - required |
+| `audio` | `string \| File \| Blob \| ArrayBuffer` | Audio data — URL, data URL/base64 string, File, Blob, or ArrayBuffer, depending on provider |
-The transcription adapter uses:
+Transcription adapters use:

 - `OPENAI_API_KEY`: Your OpenAI API key
+- `FAL_KEY`: Your fal.ai API key, when using `falTranscription()`
```

Also applies to: 515-520

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/transcription.md` around lines 79 - 108, Update the fal.ai
transcription docs to match the examples: in the "Common Options" table change
the `audio` Option type for generateTranscription / falTranscription to include
URL, File, Blob, and ArrayBuffer (e.g., `File | string | Blob | ArrayBuffer |
URL`) and ensure any mention of "base64 string" reflects accepted types; also
update the environment/credentials section to add fal.ai-specific credential
instructions (add the fal.ai API key variable and usage note) and apply the same
table/credentials fixes to the duplicate block around lines 515-520.
examples/ts-react-chat/src/routes/api.transcribe.ts (1)

10-24: ⚠️ Potential issue | 🟠 Major

Validate the request body before building the adapter.

Line 11 only casts untrusted JSON; malformed requests or unsupported provider values can throw and surface as 500s. Return a 400 for invalid payloads and adapter selection failures.

🛡️ Proposed guard
```diff
       POST: async ({ request }) => {
-        const body = await request.json()
-        const { audio, language, provider } = body.data as {
-          audio: string
-          language?: string
-          provider?: TranscriptionProviderId
-        }
-
-        const adapter = buildTranscriptionAdapter(provider ?? 'openai')
+        const body = await request.json().catch(() => null)
+        const data = body?.data
+
+        if (!data || typeof data.audio !== 'string' || data.audio.length === 0) {
+          return new Response('Missing required audio payload', { status: 400 })
+        }
+
+        const {
+          audio,
+          language,
+          provider,
+        } = data as {
+          audio: string
+          language?: string
+          provider?: TranscriptionProviderId
+        }
+
+        let adapter
+        try {
+          adapter = buildTranscriptionAdapter(provider ?? 'openai')
+        } catch {
+          return new Response('Unsupported transcription provider', {
+            status: 400,
+          })
+        }

         const stream = generateTranscription({
           adapter,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/api.transcribe.ts` around lines 10 - 24,
Validate the incoming JSON and its data before calling
buildTranscriptionAdapter: check that request.json() yields an object with a
data property where data.audio is a non-empty string and optional data.language
is a string; validate data.provider against the allowed TranscriptionProviderId
values (or use a whitelist) and return a 400 response for missing/invalid
payloads. Only after validation, call buildTranscriptionAdapter(provider) inside
a try/catch and return a 400 if adapter selection fails (unknown provider)
instead of letting it throw a 500, then proceed to call generateTranscription
with the validated audio/language and adapter.
docs/media/text-to-speech.md (1)

32-63: ⚠️ Potential issue | 🟡 Minor

Fix TTS adapter factory names to match exports.

Line 34 imports openaiTTS which is not exported from @tanstack/ai-openai; the correct export is openaiSpeech. Line 37 then uses openaiSpeech() without importing it. Additionally, line 62 uses geminiTTS() which does not exist; the exported function is geminiSpeech. Update the imports and function calls to match the actual exported names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/text-to-speech.md` around lines 32 - 63, Update the TTS adapter
imports and calls to use the actual exported factory names: replace the
incorrect openaiTTS import/use with openaiSpeech (ensure import uses
openaiSpeech and the adapter creation calls openaiSpeech()), and replace the
geminiTTS call with geminiSpeech (ensure import uses geminiSpeech and adapter
creation calls geminiSpeech()). Verify references to generateSpeech remain
unchanged and that adapter variables use the corrected factory names
openaiSpeech and geminiSpeech.
🟡 Minor comments (22)
.changeset/generated-media-union.md-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor

Fix spacing around inline code in the release note.

Several inline code spans are glued to neighboring words, which will render/read poorly.

📝 Proposed wording cleanup
```diff
-Existing read patterns like `img.url || \`data:image/png;base64,${img.b64Json}\``continue to work unchanged. The only runtime-visible change is that the`@tanstack/ai-openrouter`and`@tanstack/ai-fal`image adapters no longer populate`url`with a synthesized`data:image/png;base64,...`URI when the provider returns base64 — they return`{ b64Json }`only. Consumers that want a data URI should build it from`b64Json` at render time.
+Existing read patterns like `img.url || \`data:image/png;base64,${img.b64Json}\`` continue to work unchanged. The only runtime-visible change is that the `@tanstack/ai-openrouter` and `@tanstack/ai-fal` image adapters no longer populate `url` with a synthesized `data:image/png;base64,...` URI when the provider returns base64 — they return `{ b64Json }` only. Consumers that want a data URI should build it from `b64Json` at render time.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/generated-media-union.md at line 20, Fix the spacing around
inline code spans in the release note: ensure there is a space before and after
inline code tokens so they don't run into surrounding words — e.g., change
`img.url || \`data:image/png;base64,${img.b64Json}\``continue to work to
`img.url || \`data:image/png;base64,${img.b64Json}\` continue to work`, and add
spaces around `@tanstack/ai-openrouter`, `@tanstack/ai-fal`, `url`,
`data:image/png;base64,...`, and `{ b64Json }` wherever they are glued to
adjacent words so all inline code spans are separated by whitespace for correct
rendering.
.changeset/generated-media-union.md-3-4 (1)

3-4: ⚠️ Potential issue | 🟡 Minor

Bump adapter packages with runtime-visible media shape changes as minor.

The note says @tanstack/ai-openrouter and @tanstack/ai-fal no longer populate synthesized url values for base64 image results. That is a shape/behavior change for those packages, so these should be minor rather than patch.

📝 Proposed changeset bump
```diff
-'@tanstack/ai-openrouter': patch
-'@tanstack/ai-fal': patch
+'@tanstack/ai-openrouter': minor
+'@tanstack/ai-fal': minor
```

Based on learnings, “In the TanStack/ai repository (pre-1.0), follow the repo’s versioning convention: breaking changes and breaking/shape changes documented in Changesets must use a minor version bump (not major).”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/generated-media-union.md around lines 3 - 4, The changeset
incorrectly marks '@tanstack/ai-openrouter' and '@tanstack/ai-fal' as patch
bumps even though they introduce a runtime-visible media shape/behavior change
(synthesized url removal); update the changeset entry in
.changeset/generated-media-union.md to change both package bumps from 'patch' to
'minor' so the release reflects the shape change for these adapters.
docs/media/text-to-speech.md-69-108 (1)

69-108: ⚠️ Potential issue | 🟡 Minor

Add fal.ai to the environment variables and explicit API keys sections.

The fal.ai code examples (lines 69-108) lack setup instructions. The "Environment Variables" section (lines 491-494) only mentions OpenAI and Gemini, and the "Explicit API Keys" section (lines 501-509) has no example for falSpeech. Users need to know they can either set the FAL_KEY environment variable or pass an explicit API key to the adapter.

Add FAL_KEY to the environment variables list, and include a falSpeech example in the explicit API keys section:

Environment variables section update
```diff
 The TTS adapters use the same environment variables as other adapters:

 - **OpenAI**: `OPENAI_API_KEY`
 - **Gemini**: `GOOGLE_API_KEY` or `GEMINI_API_KEY`
+- **fal.ai**: `FAL_KEY`
```

Also add a falSpeech example to the explicit API keys section alongside the OpenAI and Gemini examples.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/text-to-speech.md` around lines 69 - 108, Update docs to document
fal.ai API key usage: add FAL_KEY to the "Environment Variables" list and show
how to supply an explicit key to the falSpeech adapter. Specifically, in the env
vars section add a bullet for FAL_KEY (used by falSpeech), and in the "Explicit
API Keys" section add an example that demonstrates passing { apiKey:
process.env.FAL_KEY } (or a literal) into falSpeech(...) when calling
generateSpeech({ adapter: falSpeech(...), ... }). Reference symbols: FAL_KEY,
falSpeech, generateSpeech, adapter, modelOptions.
packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-4-9 (1)

4-9: ⚠️ Potential issue | 🟡 Minor

Keep the all-media lists in sync with generateAudio.

The frontmatter now includes audio, but the overview and streaming-support list still omit it.

📝 Proposed doc sync
```diff
-All media activities (image, speech, transcription, video) follow the same
+All media activities (image, audio, speech, transcription, video) follow the same
 server/client architecture: a `generate*()` function on the server, an SSE
 transport via `toServerSentEventsResponse()`, and a framework hook on the
 client.
@@
-activities (`generateImage`, `generateSpeech`, `generateTranscription`,
-`generateVideo`, `summarize`) support `stream: true`, but custom
+activities (`generateImage`, `generateAudio`, `generateSpeech`,
+`generateTranscription`, `generateVideo`, `summarize`) support `stream: true`, but custom
 `useGeneration` setups may not.
```

Also applies to: 28-31, 487-494

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines
4 - 9, The documentation frontmatter added audio but the overview and
streaming-support lists in SKILL.md were not updated; update the all-media lists
and streaming-support sections to include audio and the corresponding
adapter/hook names (generateAudio, geminiAudio/falAudio, and useGenerateAudio)
wherever image/video/speech/transcription are enumerated (including the sections
currently listing generateImage/generateSpeech/generateTranscription and the
streaming-support list), so the lists consistently mention audio and its
adapters/hooks throughout the document.
packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-191-195 (1)

191-195: ⚠️ Potential issue | 🟡 Minor

Broaden the generateAudio() description beyond non-speech audio.

falAudio also covers transformations/enhancement/separation and other audio workflows, so describing it as only music/SFX and “non-speech audio content” is too narrow.

📝 Suggested wording
```diff
-### 2. Audio Generation (Music, Sound Effects)
+### 2. Audio Generation and Transformation

-Distinct from TTS — `generateAudio()` produces non-speech audio content.
+Distinct from TTS — `generateAudio()` produces or transforms audio content,
+including music, sound effects, audio-to-audio, enhancement, separation, and
+other provider-specific audio workflows.
 Supported adapters: `geminiAudio` (Lyria 3 Pro / Lyria 3 Clip) and
 `falAudio` (MiniMax Music, DiffRhythm, Stable Audio, ElevenLabs SFX, etc.).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines
191 - 195, Update the SKILL.md text for generateAudio() so it no longer limits
the function to only “non-speech” music/SFX; expand the description to state
generateAudio() supports music, sound effects, transformations, enhancement,
separation and other audio workflows; mention both adapters geminiAudio and
falAudio and note that falAudio covers generation plus
transformation/enhancement/separation/other audio processing workflows rather
than only MiniMax Music/DiffRhythm/SFX so readers understand the broader
capabilities of generateAudio() and falAudio.
packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-211-222 (1)

211-222: ⚠️ Potential issue | 🟡 Minor

Show a base64 fallback in the audio hook example.

The docs say providers may return url or b64Json, but the playback example only works for URL results.

📝 Suggested playback example
```diff
-// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
-// Play:    <audio src={result.audio.url} controls />
+const audioSrc =
+  result?.audio.url ??
+  (result?.audio.b64Json
+    ? `data:${result.audio.contentType ?? 'audio/mpeg'};base64,${result.audio.b64Json}`
+    : undefined)
+
+// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
+// Play:    {audioSrc && <audio src={audioSrc} controls />}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines
211 - 222, Update the client example for useGenerateAudio/fetchServerSentEvents
to handle provider responses that return base64 instead of a URL by adding a
fallback that checks result.audio.b64Json and converts it to a data URL (e.g.,
"data:audio/<format>;base64,<b64>") for the audio src when result.audio.url is
absent; reference the useGenerateAudio hook, fetchServerSentEvents connection,
and the result.audio properties (url and b64Json) so readers know where to
implement the fallback.
examples/ts-react-chat/src/routes/api.generate.speech.ts-10-18 (1)

10-18: ⚠️ Potential issue | 🟡 Minor

Validate body.data before destructuring it.

A malformed or direct request without { data: { text } } will throw during destructuring and return a 500. Return a 400 before building the adapter.

🛡️ Proposed guard
```diff
         const body = await request.json()
-        const { text, voice, format, provider } = body.data as {
+        const data = body?.data
+
+        if (!data || typeof data.text !== 'string' || !data.text.trim()) {
+          return new Response(JSON.stringify({ error: 'text is required' }), {
+            status: 400,
+            headers: { 'content-type': 'application/json' },
+          })
+        }
+
+        const { text, voice, format, provider } = data as {
           text: string
           voice?: string
           format?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav' | 'pcm'
           provider?: SpeechProviderId
         }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/api.generate.speech.ts` around lines 10 -
18, The handler currently destructures body.data without checking it, causing a
crash on malformed requests; before destructuring and before calling
buildSpeechAdapter, validate that body and body.data exist and that
body.data.text is a non-empty string, and if not respond with a 400 error (bad
request) and a clear message; update the code around the request.json() usage
and the destructuring of { text, voice, format, provider } to perform this guard
and only call buildSpeechAdapter(provider ?? 'openai') when validation passes.
examples/ts-react-chat/src/routes/api.generate.audio.ts-10-18 (1)

10-18: ⚠️ Potential issue | 🟡 Minor

Guard malformed audio-generation requests.

body.data is assumed to exist, and duration is assumed to be a finite number. Invalid requests should fail with 400 instead of throwing or reaching the provider.

🛡️ Proposed guard
```diff
         const body = await request.json()
-        const { prompt, duration, provider, model } = body.data as {
+        const data = body?.data
+
+        if (
+          !data ||
+          typeof data.prompt !== 'string' ||
+          !data.prompt.trim() ||
+          (data.duration !== undefined &&
+            (typeof data.duration !== 'number' || !Number.isFinite(data.duration)))
+        ) {
+          return new Response(JSON.stringify({ error: 'valid prompt is required' }), {
+            status: 400,
+            headers: { 'content-type': 'application/json' },
+          })
+        }
+
+        const { prompt, duration, provider, model } = data as {
           prompt: string
           duration?: number
           provider?: AudioProviderId
           model?: string
         }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/api.generate.audio.ts` around lines 10 -
18, Guard the incoming request before calling buildAudioAdapter: validate that
body and body.data exist and that data.prompt is a non-empty string; if
data.duration is provided ensure Number.isFinite(data.duration) and it is within
acceptable bounds; validate optional data.provider and data.model types; if any
check fails return a 400 response immediately (do not call buildAudioAdapter).
Perform these checks around where request.json() is read and before the line
that calls buildAudioAdapter(provider ?? 'gemini-lyria', model), and include
clear error messages in the 400 responses.
.changeset/audio-example-pages.md-4-15 (1)

4-15: ⚠️ Potential issue | 🟡 Minor

Remove the E2E coverage claim from the changeset.

The PR notes say audio-gen E2E scaffolding was removed and E2E coverage is still pending, but this changeset says E2E coverage/specs were added.

📝 Proposed wording fix
```diff
-chore: add ts-react-chat example pages and E2E coverage for audio providers
+chore: add ts-react-chat example pages for audio providers
@@
-- Added a new `audio-gen` feature to the E2E harness with a Gemini Lyria adapter factory, route, UI, fixture, and spec
+- E2E audio generation coverage remains pending aimock AUDIO-modality support
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/audio-example-pages.md around lines 4 - 15, Update the changeset
to remove the incorrect claim of added E2E coverage: edit the
.changeset/audio-example-pages.md entry to delete or reword the bullets that
state E2E tests/specs were added (references to "audio-gen", "E2E harness",
"route, UI, fixture, and spec", and "E2E coverage") and instead note that E2E
scaffolding was removed and E2E coverage is pending; ensure the remaining
bullets about example pages and unit tests (Gemini TTS adapter items) remain
accurate.
docs/media/audio-generation.md-30-36 (1)

30-36: ⚠️ Potential issue | 🟡 Minor

Fix the Gemini example to use b64Json.

The Gemini example logs result.audio.url, but this page later says Gemini returns base64 in result.audio.b64Json. That contradiction will send users down the wrong path.

Proposed fix
```diff
 const result = await generateAudio({
   adapter: geminiAudio('lyria-3-pro-preview'),
   prompt: 'Uplifting indie pop with layered vocals and jangly guitars',
 })

-console.log(result.audio.url) // URL to the generated audio file
+console.log(result.audio.b64Json) // Base64-encoded generated audio bytes
 console.log(result.audio.contentType) // e.g. "audio/mpeg"
```

Also applies to: 121-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/audio-generation.md` around lines 30 - 36, The Gemini audio
example is incorrect: Gemini returns base64 in result.audio.b64Json, not a URL.
Update the example that calls generateAudio with adapter:
geminiAudio('lyria-3-pro-preview') to read and log result.audio.b64Json (and/or
show how to decode/save it) instead of result.audio.url and
result.audio.contentType; also make the same replacement for the later duplicate
example that currently logs result.audio.url. Target the generateAudio call and
any uses of result.audio.url in this file and replace them with
result.audio.b64Json handling.
docs/media/audio-generation.md-107-118 (1)

107-118: ⚠️ Potential issue | 🟡 Minor

Document that url and b64Json are mutually exclusive.

The PR changes GeneratedAudio to require exactly one source, but this shape shows both as independently optional. Use a union so the docs match the API contract.

Proposed fix
```diff
 interface AudioGenerationResult {
   id: string
   model: string
-  audio: {
-    url?: string
-    b64Json?: string
-    contentType?: string
-    duration?: number
-  }
+  audio:
+    | {
+        url: string
+        b64Json?: never
+        contentType?: string
+        duration?: number
+      }
+    | {
+        url?: never
+        b64Json: string
+        contentType?: string
+        duration?: number
+      }
   usage?: { inputTokens?: number; outputTokens?: number; totalTokens?: number }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/audio-generation.md` around lines 107 - 118, The
AudioGenerationResult interface currently allows both audio.url and
audio.b64Json to be optional independently; update the audio property to be a
discriminated union so exactly one source is required to match the
GeneratedAudio API contract — for example replace the current audio object with
a union of two shapes (one with url: string and optional contentType/duration,
and the other with b64Json: string and optional contentType/duration) so callers
cannot provide both or neither; ensure the interface name AudioGenerationResult
and its audio property are the ones changed to reflect this mutual exclusivity.
packages/typescript/ai-gemini/tests/audio-adapter.test.ts-35-42 (1)

35-42: ⚠️ Potential issue | 🟡 Minor

Restore the previous GOOGLE_API_KEY value.

This test deletes the env var unconditionally, which can clobber a pre-existing value for later tests in the same worker.

Proposed fix
```diff
 it('geminiAudio reads the API key from the environment', () => {
+  const previousGoogleApiKey = process.env.GOOGLE_API_KEY
   process.env.GOOGLE_API_KEY = 'env-key'
   try {
     const adapter = geminiAudio('lyria-3-clip-preview')
     expect(adapter.model).toBe('lyria-3-clip-preview')
   } finally {
-    delete process.env.GOOGLE_API_KEY
+    if (previousGoogleApiKey === undefined) {
+      delete process.env.GOOGLE_API_KEY
+    } else {
+      process.env.GOOGLE_API_KEY = previousGoogleApiKey
+    }
   }
 })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/tests/audio-adapter.test.ts` around lines 35 -
42, The test for geminiAudio that manipulates process.env.GOOGLE_API_KEY should
save the existing value before setting it and restore it in the finally block
instead of unconditionally deleting it; modify the test around
geminiAudio('lyria-3-clip-preview') to capture const prev =
process.env.GOOGLE_API_KEY, set process.env.GOOGLE_API_KEY = 'env-key', and in
finally restore process.env.GOOGLE_API_KEY = prev (or delete only if prev is
undefined) so other tests are not affected.
examples/ts-react-chat/src/routes/generations.transcription.tsx-1-11 (1)

1-11: ⚠️ Potential issue | 🟡 Minor

Fix the import-order/type-import lint failures.

Static analysis reports import/order and import/consistent-type-specifier-style errors here; split the inline type specifiers into top-level type-only imports in the configured order.

Proposed fix
```diff
 import { useMemo, useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useTranscription } from '@tanstack/ai-react'
-import type { UseTranscriptionReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
 import { transcribeFn, transcribeStreamFn } from '../lib/server-fns'
-import {
-  TRANSCRIPTION_PROVIDERS,
-  type TranscriptionProviderConfig,
-  type TranscriptionProviderId,
-} from '../lib/audio-providers'
+import { TRANSCRIPTION_PROVIDERS } from '../lib/audio-providers'
+import type { UseTranscriptionReturn } from '@tanstack/ai-react'
+import type {
+  TranscriptionProviderConfig,
+  TranscriptionProviderId,
+} from '../lib/audio-providers'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines
1 - 11, Split the inline type specifiers into top-level type-only imports and
reorder imports to satisfy import/order: change "import { useTranscription }
from '@tanstack/ai-react' import type { UseTranscriptionReturn } from
'@tanstack/ai-react'" (and any other inline "type" usages such as
TranscriptionProviderConfig and TranscriptionProviderId) so that all runtime
imports come first and all "import type" statements appear after them as
separate lines; ensure you replace inline type references in the existing import
lists with the runtime-only imports (e.g., keep useTranscription,
fetchServerSentEvents, transcribeFn, transcribeStreamFn, TRANSCRIPTION_PROVIDERS
as runtime imports) and add top-level "import type { UseTranscriptionReturn }
from '@tanstack/ai-react'" and "import type { TranscriptionProviderConfig,
TranscriptionProviderId } from '../lib/audio-providers'" to satisfy
import/consistent-type-specifier-style and import/order.
examples/ts-react-chat/src/routes/generations.speech.tsx-1-11 (1)

1-11: ⚠️ Potential issue | 🟡 Minor

Fix the import-order/type-import lint failures.

Static analysis reports import/order and import/consistent-type-specifier-style errors here; move the @tanstack/ai-react type import into a top-level type-only import in the configured order.

Proposed fix
```diff
 import { useMemo, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useGenerateSpeech } from '@tanstack/ai-react'
-import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
 import { generateSpeechFn, generateSpeechStreamFn } from '../lib/server-fns'
-import {
-  SPEECH_PROVIDERS,
-  type SpeechProviderConfig,
-  type SpeechProviderId,
-} from '../lib/audio-providers'
+import { SPEECH_PROVIDERS } from '../lib/audio-providers'
+import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'
+import type {
+  SpeechProviderConfig,
+  SpeechProviderId,
+} from '../lib/audio-providers'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 1 -
11, The imports violate import/order and type-only style: move the type import
for UseGenerateSpeechReturn into a top-level type-only import and reorder
imports to match configured order (third-party packages first, then local
files); specifically, change the import of UseGenerateSpeechReturn to a separate
"import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'" placed
adjacent to the other `@tanstack` imports and keep useGenerateSpeech as a value
import from '@tanstack/ai-react' so functions like useGenerateSpeech and
fetchServerSentEvents remain value imports while UseGenerateSpeechReturn is a
type-only import.
examples/ts-react-chat/src/routes/generations.audio.tsx-1-12 (1)

1-12: ⚠️ Potential issue | 🟡 Minor

Fix the lint-blocking import order and type-only import style.

The static analysis errors here will fail lint. Split inline type specifiers into top-level type imports and place type imports after value imports.

🧹 Proposed lint fix
 import { useMemo, useState } from 'react'
-import { createFileRoute } from '@tanstack/react-router'
-import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
+import { useGenerateAudio } from '@tanstack/ai-react'
+import { createFileRoute } from '@tanstack/react-router'
 import { generateAudioFn } from '../lib/server-fns'
-import {
-  AUDIO_PROVIDERS,
-  type AudioProviderConfig,
-  type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type {
+  AudioProviderConfig,
+  AudioProviderId,
+} from '../lib/audio-providers'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 1 - 12,
Imports mix value and type specifiers and are misordered, causing lint failures;
fix by separating type-only imports into top-level "import type" statements
placed after the value imports and reorder imports to follow project convention
(external packages first, then local modules). Specifically, keep value imports
like useMemo, useState, createFileRoute, useGenerateAudio,
fetchServerSentEvents, generateAudioFn, and AUDIO_PROVIDERS as regular imports,
then add top-level "import type" lines for UseGenerateAudioReturn,
AudioGenerationResult, AudioProviderConfig, and AudioProviderId (referencing
those exact symbols) placed after the value imports so the linter accepts the
style and ordering.
packages/typescript/ai-fal/tests/speech-adapter.test.ts-20-26 (1)

20-26: ⚠️ Potential issue | 🟡 Minor

Move the fetch stub into beforeEach and restore it in afterEach.

The global stub on line 26 persists across tests without explicit cleanup, risking test isolation issues. Use beforeEach to stub and afterEach with vi.unstubAllGlobals() to ensure cleanup between tests, matching the pattern used elsewhere in the codebase.

Proposed fix
-import { beforeEach, describe, expect, it, vi } from 'vitest'
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'

 const mockFetchResponse = {
   arrayBuffer: () =>
     Promise.resolve(new Uint8Array([72, 101, 108, 108, 111]).buffer),
 }
-const mockFetch = vi.fn().mockResolvedValue(mockFetchResponse)
-vi.stubGlobal('fetch', mockFetch)

 beforeEach(() => {
   vi.clearAllMocks()
+  const mockFetch = vi.fn().mockResolvedValue(mockFetchResponse)
+  vi.stubGlobal('fetch', mockFetch)
 })
+
+afterEach(() => {
+  vi.unstubAllGlobals()
+})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/tests/speech-adapter.test.ts` around lines 20 -
26, The global fetch stub defined as mockFetch and installed with
vi.stubGlobal('fetch', mockFetch) should be moved into a beforeEach block and
cleaned up in an afterEach block; update tests in speech-adapter.test.ts to
create the mockFetch and call vi.stubGlobal('fetch', mockFetch) inside
beforeEach and call vi.unstubAllGlobals() (or restore the stub) in afterEach to
ensure isolation between tests and match the project's test patterns.
examples/ts-react-chat/src/routes/generations.audio.tsx-33-44 (1)

33-44: ⚠️ Potential issue | 🟡 Minor

Revoke Blob URLs created for base64 audio results.

URL.createObjectURL(blob) retains the audio buffer in memory until revoked. Repeated generations leak large audio blobs since previous blob URLs are never cleaned up when the result changes, is reset, or the component unmounts.

Add a cleanup effect to revoke object URLs:

Proposed solution
-import { useMemo, useState } from 'react'
+import { useEffect, useMemo, useState } from 'react'
@@
 function AudioGenerationUI({
@@
 }) {
+  useEffect(() => {
+    if (!result?.url.startsWith('blob:')) return
+
+    return () => {
+      URL.revokeObjectURL(result.url)
+    }
+  }, [result?.url])
+
   const handleGenerate = () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 33 -
44, The code creates blob URLs via URL.createObjectURL(blob) for base64 audio
(when audio.b64Json) but never revokes them, leaking memory; update the
component that uses this generation to track the produced url (the value
returned alongside contentType) and revoke previous blob URLs by calling
URL.revokeObjectURL(oldUrl) before assigning a new one and again on component
unmount (use a React useEffect cleanup) so all Blob URLs produced by
URL.createObjectURL are properly released; ensure any reset path also revokes
the current url.
examples/ts-react-chat/src/lib/server-audio-adapters.ts-75-86 (1)

75-86: ⚠️ Potential issue | 🟡 Minor

Minor: override silently rejected when models list is absent.

When config.models is undefined, config.models?.some(...) evaluates to undefined (falsy), so every override gets the "rejected" warning and falls back. If providers without an explicit models array are meant to disallow overrides entirely, make that explicit in the warning; otherwise warn only when a models list exists and the override isn't in it, so a legitimate config shape doesn't produce noisy logs.
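
A minimal sketch of the quieter branch-first logic, assuming a hypothetical ProviderConfig shape (the real config type lives in audio-providers.ts):

interface ProviderConfig {
  model: string
  models?: Array<string>
}

function resolveModel(config: ProviderConfig, modelOverride?: string): string {
  if (!modelOverride) return config.model
  // No explicit allow-list: treat overrides as unsupported, but skip the
  // noisy warning since this is a legitimate config shape.
  if (!config.models) return config.model
  if (config.models.includes(modelOverride)) return modelOverride
  console.warn(
    `Rejected model override "${modelOverride}"; falling back to "${config.model}".`,
  )
  return config.model
}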

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/lib/server-audio-adapters.ts` around lines 75 -
86, The resolveModel function currently warns on every override when
config.models is undefined; change its logic so that if config.models is
undefined it simply returns config.model (no warning), and only perform the
.some(...) check and emit the console.warn when config.models exists; update
resolveModel to check for config.models presence first, then validate
modelOverride against config.models and warn only when the list exists and the
override isn't found.
packages/typescript/ai-fal/src/adapters/transcription.ts-63-78 (1)

63-78: ⚠️ Potential issue | 🟡 Minor

options.audio as base64 string won't work here.

TranscriptionOptions.audio is typed as string | File | Blob | ArrayBuffer, where a bare string is commonly a base64 payload (matching other providers like OpenAI). Here it is passed straight through as audio_url, so fal will try to fetch it as a URL. If users pass base64, this will fail with an opaque fal error.

Consider either wrapping a detected base64/data-URI string into a Blob before assignment, or at minimum documenting on falTranscription that audio must be a URL string (or Blob/File/ArrayBuffer).
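
A hedged sketch of the Blob-wrapping option (the function name and the bare-base64 heuristic are illustrative, not the adapter's current code):

// atob is available in browsers and in Node 16+.
function coerceAudioInput(audio: string): string | Blob {
  const dataUri = audio.match(/^data:([^;,]+);base64,(.+)$/)
  const looksLikeBase64 =
    !dataUri && !/^https?:\/\//.test(audio) && /^[A-Za-z0-9+/]+=*$/.test(audio)
  if (!dataUri && !looksLikeBase64) return audio // assume it's a URL
  const mime = dataUri?.[1] ?? 'application/octet-stream'
  const base64 = dataUri?.[2] ?? audio
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0))
  return new Blob([bytes], { type: mime })
}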

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/transcription.ts` around lines 63 -
78, buildInput currently passes a bare string from TranscriptionOptions.audio
straight to FalModelInput.audio_url which treats it as a URL and breaks for
base64/data-URI payloads; update buildInput to detect when options.audio is a
string that is a data URI or base64 payload and convert it into a Blob (decode
base64, create Uint8Array, then new Blob([...], { type: inferredMime })) before
assigning audio_url, keeping the existing handling for File/Blob/ArrayBuffer;
alternatively, if you prefer not to convert, update the falTranscription
docs/type to clearly state that string must be a URL and validate/throw a
helpful error in buildInput when a non-URL string is provided.
packages/typescript/ai-gemini/src/adapters/tts.ts-203-213 (1)

203-213: ⚠️ Potential issue | 🟡 Minor

parsePcmMimeType extracts channels but it's never forwarded to the WAV wrapper.

wrapPcmBase64AsWav falls back to channels = 1, so a stereo audio/L16;rate=...;channels=2 response would produce a WAV header claiming mono, and the file would play at double speed or sound corrupted. Gemini TTS is mono today, but since the parser already supports a channels hint, forward it to avoid a latent bug.

🐛 Proposed fix
     const pcm = parsePcmMimeType(mimeType)
     if (pcm) {
-      const wavBase64 = wrapPcmBase64AsWav(audioBase64, pcm.sampleRate)
+      const wavBase64 = wrapPcmBase64AsWav(
+        audioBase64,
+        pcm.sampleRate,
+        pcm.channels,
+      )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/tts.ts` around lines 203 - 213,
The pcm channels parsed by parsePcmMimeType are not passed into
wrapPcmBase64AsWav, so WAV headers default to mono and stereo L16 responses will
be corrupted; modify the branch that handles pcm (in the function producing the
returned object) to pass pcm.channels (or a validated default) into
wrapPcmBase64AsWav instead of relying on its internal fallback, e.g., call
wrapPcmBase64AsWav(audioBase64, pcm.sampleRate, pcm.channels) and ensure
wrapPcmBase64AsWav accepts and uses that channels parameter.
packages/typescript/ai/src/activities/generateAudio/index.ts-133-154 (1)

133-154: ⚠️ Potential issue | 🟡 Minor

duration means two different things across started vs completed events.

On audio:request:started, duration is the requested audio length (seconds, from rest.duration). On audio:request:completed, duration is the elapsed request time in milliseconds (elapsedMs). Reusing the same key for audio-length vs wall-clock elapsed time is a footgun for devtools consumers (and breaks symmetry with the request-scoped fields).

Consider renaming the completed-event field — e.g., elapsedMs or latencyMs — to disambiguate. This is a new event surface, so it's the cheapest time to fix.

♻️ Suggested rename
   aiEventClient.emit('audio:request:completed', {
     requestId,
     provider: adapter.name,
     model,
     audio: result.audio,
-    duration: elapsedMs,
+    elapsedMs,
     modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
     timestamp: Date.now(),
   })

(Update the matching event type in @tanstack/ai-event-client accordingly.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines
133 - 154, The completed event reuses the key duration for elapsed wall-clock
time which conflicts with the requested audio length sent in the started event;
change the completed payload to use a distinct field name (e.g., elapsedMs or
latencyMs) instead of duration in the aiEventClient.emit call after
adapter.generateAudio (reference: requestId, adapter.name, model, result.audio,
elapsedMs, rest.duration), and update the corresponding event type in
`@tanstack/ai-event-client` so consumers expect the new field name.
packages/typescript/ai-gemini/src/adapters/audio.ts-96-115 (1)

96-115: ⚠️ Potential issue | 🟡 Minor

Use the standard audio/mpeg MIME and note the dead fallback.

Two small issues in the audio extraction:

  1. 'audio/mp3' is not a registered IANA media type — the standard is 'audio/mpeg'. Some stricter clients (and MediaSource.isTypeSupported) only accept the standard form.
  2. The ?? 'audio/mp3' fallback is unreachable: audioPart is the element found via part.inlineData?.mimeType?.startsWith('audio/'), so mimeType is guaranteed to be a string here.
♻️ Proposed fix
-    const contentType = audioPart.inlineData.mimeType ?? 'audio/mp3'
+    const contentType = audioPart.inlineData.mimeType ?? 'audio/mpeg'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 96 - 115,
The MIME fallback and selection should use the standard "audio/mpeg" and avoid
the unreachable "?? 'audio/mp3'": in the audio extraction code (look for
audioPart and its inlineData.mimeType and contentType), set contentType to the
actual mime string from audioPart.inlineData.mimeType and if you want a
defensive default use 'audio/mpeg' (e.g. contentType =
audioPart.inlineData.mimeType || 'audio/mpeg'), and replace the old 'audio/mp3'
token and remove the unreachable nullish-coalescing fallback.
🧹 Nitpick comments (6)
packages/typescript/ai-fal/src/utils/client.ts (1)

73-78: Avoid byte-by-byte string concatenation for large audio payloads.

The browser fallback can become allocation-heavy for multi-MB audio buffers. Build chunks and join once before btoa.

♻️ Proposed browser fallback
   const view = new Uint8Array(bytes)
-  let binary = ''
-  for (let i = 0; i < view.byteLength; i += 1) {
-    binary += String.fromCharCode(view[i]!)
+  const chunkSize = 0x8000
+  const chunks: Array<string> = []
+  for (let i = 0; i < view.byteLength; i += chunkSize) {
+    let chunk = ''
+    const end = Math.min(i + chunkSize, view.byteLength)
+    for (let j = i; j < end; j += 1) {
+      chunk += String.fromCharCode(view[j]!)
+    }
+    chunks.push(chunk)
   }
-  return btoa(binary)
+  return btoa(chunks.join(''))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/utils/client.ts` around lines 73 - 78, The
current browser fallback converts a Uint8Array to base64 via byte-by-byte string
concatenation (variables view, binary and final btoa call), which is
allocation-heavy for large audio; replace the loop with a chunked approach:
iterate the view in slices (e.g., 16–32KB), push String.fromCharCode applied to
each slice into an array, join the array once to form the binary string, then
call btoa on the joined string — update the code around view/binary/btoa to use
this chunked buffer-join pattern to reduce allocations.
packages/typescript/ai-vue/tests/use-generation.test.ts (1)

440-492: Add error and reset parity coverage for useGenerateAudio.

The new suite covers initial/fetcher/connection paths, but unlike the adjacent media hooks it does not exercise errors or reset/stop exposure.

🧪 Suggested additional cases
   it('should generate audio using connection', async () => {
     const chunks = createGenerationChunks(mockAudioResult)
     const adapter = createMockConnectionAdapter({ chunks })
@@
     expect(result.result.value).toEqual(mockAudioResult)
     expect(result.status.value).toBe('success')
   })
+
+  it('should handle errors', async () => {
+    const onError = vi.fn()
+
+    const { result } = renderHook(() =>
+      useGenerateAudio({
+        fetcher: async () => {
+          throw new Error('Audio generation failed')
+        },
+        onError,
+      }),
+    )
+
+    await result.generate({ prompt: 'test' })
+    await flushPromises()
+    await nextTick()
+
+    expect(result.status.value).toBe('error')
+    expect(result.error.value?.message).toBe('Audio generation failed')
+    expect(onError).toHaveBeenCalled()
+  })
+
+  it('should expose stop and reset', () => {
+    const adapter = createMockConnectionAdapter()
+    const { result } = renderHook(() =>
+      useGenerateAudio({ connection: adapter }),
+    )
+
+    expect(typeof result.stop).toBe('function')
+    expect(typeof result.reset).toBe('function')
+  })
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-vue/tests/use-generation.test.ts` around lines 440 -
492, Add tests to cover error and reset/stop parity for useGenerateAudio: add a
test that supplies a fetcher which throws (or rejects) and assert result.value
becomes null and status.value becomes 'error' after awaiting generate; add a
test that simulates a connection that emits an error chunk (use
createMockConnectionAdapter with chunks containing an error event) and assert
the same error state; add a test that calls result.reset() after a successful
generation and assert result.value is cleared and status.value returns to
'idle'; and add a test that calls result.stop() during a streaming generation
and assert generation is aborted and status.value becomes 'idle' or 'stopped' as
the hook defines. Reference useGenerateAudio, generate, reset, stop,
createMockConnectionAdapter and createGenerationChunks to locate where to add
these cases.
packages/typescript/ai-gemini/src/model-meta.ts (1)

511-549: Preserve model-specific provider option typing for Lyria models.

Both Lyria entries use bare ModelMeta, so their provider option type falls back to unknown even though the audio adapter accepts typed modelOptions like responseMimeType and seed. Please wire these models into the Gemini audio provider-options/type map rather than leaving them untyped. Based on learnings, “Provide type safety per model by using model-specific provider options in adapter packages.”
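
Illustrative shape of the per-model map (the concrete ModelMeta generic and option types in this package may differ):

interface LyriaProviderOptions {
  responseMimeType?: 'audio/wav'
  seed?: number
}

type GeminiAudioProviderOptionsMap = {
  'lyria-3-pro-preview': LyriaProviderOptions
  'lyria-3-clip-preview': Omit<LyriaProviderOptions, 'responseMimeType'>
}

// Adapter signatures can then resolve typed modelOptions per model.
type OptionsFor<TModel extends keyof GeminiAudioProviderOptionsMap> =
  GeminiAudioProviderOptionsMap[TModel]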

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/model-meta.ts` around lines 511 - 549,
LYRIA_3_PRO and LYRIA_3_CLIP are declared as plain ModelMeta which causes their
provider option type to be unknown; update these constants to use the same
model-specific provider-options typing used for other Gemini audio models and
register them in the Gemini audio provider-options/type map so the audio adapter
recognizes typed options like responseMimeType and seed. Locate LYRIA_3_PRO and
LYRIA_3_CLIP in model-meta.ts and replace their bare ModelMeta assertion with
the concrete typed meta (the same pattern used by existing Gemini audio models),
then add entries mapping 'lyria-3-pro-preview' and 'lyria-3-clip-preview' to the
Gemini audio provider-options/type map so callers of the audio adapter receive
the correct typed modelOptions.
docs/adapters/fal.md (1)

244-244: Consider pinning the version caveat less tightly.

Calling out @fal-ai/client@1.9.1 by exact version will go stale quickly as fal's SDK adds types for newer models. Consider phrasing as "current @fal-ai/client type map" so the doc doesn't need to be edited on every SDK bump.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/adapters/fal.md` at line 244, Update the note that currently references
the exact package version string '@fal-ai/client@1.9.1' to a looser, time-proof
phrasing (e.g., "the current `@fal-ai/client` type map" or "the installed
`@fal-ai/client` type map") so it won't go stale; keep the rest of the sentence
about modelOptions not autocompleting and that the fal adapter accepts any model
ID as a string (references: the literal '@fal-ai/client@1.9.1', the term
'modelOptions', and 'fal adapter').
packages/typescript/ai/src/activities/generateAudio/index.ts (1)

79-81: Minor: createId is duplicated across media activities.

The same helper exists in generateTranscription, generateSpeech, and now generateAudio. Consider lifting it into a shared util (e.g. alongside streamGenerationResult) to avoid drift if the format ever changes.
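
Lifted, the shared util would be a one-liner (the file location and export name are the suggestion here, not current code):

// e.g. packages/typescript/ai/src/utils/create-id.ts
export function createId(prefix: string): string {
  return `${prefix}-${Date.now()}-${Math.random().toString(36).slice(2, 9)}`
}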

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines 79
- 81, The createId helper in generateAudio (function createId) is duplicated
across media activities (generateTranscription, generateSpeech, generateAudio);
extract it into a shared util (e.g., next to streamGenerationResult) and replace
the local createId definitions with an import. Move the implementation
`${prefix}-${Date.now()}-${Math.random().toString(36).slice(2,9)}` into the new
util, export a named createId function, update imports in the affected files to
use the shared createId, and remove the duplicate functions to keep a single
canonical generator.
packages/typescript/ai-fal/src/adapters/audio.ts (1)

99-113: audio_url branch drops contentType — consider inferring it.

When the provider returns { audio_url: string }, contentType stays undefined even though the URL extension or a Content-Type could be used. Not a blocker, but downstream consumers that key off contentType (players, downloads) will degrade on these models.

♻️ Optional: infer contentType from URL extension
     } else if (typeof data.audio_url === 'string') {
       audioUrl = data.audio_url
+      const ext = audioUrl.split('?')[0]?.split('.').pop()?.toLowerCase()
+      if (ext === 'mp3') contentType = 'audio/mpeg'
+      else if (ext === 'wav') contentType = 'audio/wav'
+      else if (ext === 'ogg') contentType = 'audio/ogg'
+      else if (ext === 'flac') contentType = 'audio/flac'
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 99 - 113, When
handling the `{ audio_url: string }` branch in adapters/audio.ts (the code that
sets audioUrl and contentType), add logic to infer and set contentType if it
remains undefined by mapping the audioUrl file extension to a MIME type (e.g.,
.mp3 -> audio/mpeg, .wav -> audio/wav, .ogg -> audio/ogg, .m4a/.mp4 ->
audio/mp4) after assigning audioUrl; update the code that currently assigns
audioUrl in the else-if branch to also attempt this extension-to-MIME mapping
and set the contentType variable so downstream consumers using contentType (the
variables audioUrl and contentType) will have a value when only audio_url is
returned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b758fb0a-8a6b-44c5-a066-5bb44b32389a

📥 Commits

Reviewing files that changed from the base of the PR and between 3dcc474 and 8af703f.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (72)
  • .changeset/audio-activity.md
  • .changeset/audio-example-pages.md
  • .changeset/audio-generation-hook.md
  • .changeset/fal-audio-speech-transcription.md
  • .changeset/gemini-audio.md
  • .changeset/generated-media-union.md
  • docs/adapters/fal.md
  • docs/config.json
  • docs/media/audio-generation.md
  • docs/media/generation-hooks.md
  • docs/media/text-to-speech.md
  • docs/media/transcription.md
  • examples/ts-react-chat/package.json
  • examples/ts-react-chat/src/components/Header.tsx
  • examples/ts-react-chat/src/lib/audio-providers.ts
  • examples/ts-react-chat/src/lib/server-audio-adapters.ts
  • examples/ts-react-chat/src/lib/server-fns.ts
  • examples/ts-react-chat/src/routeTree.gen.ts
  • examples/ts-react-chat/src/routes/api.generate.audio.ts
  • examples/ts-react-chat/src/routes/api.generate.speech.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • examples/ts-react-chat/src/routes/generations.audio.tsx
  • examples/ts-react-chat/src/routes/generations.speech.tsx
  • examples/ts-react-chat/src/routes/generations.transcription.tsx
  • examples/ts-react-chat/src/routes/index.tsx
  • packages/typescript/ai-client/src/generation-types.ts
  • packages/typescript/ai-client/src/index.ts
  • packages/typescript/ai-event-client/src/index.ts
  • packages/typescript/ai-fal/package.json
  • packages/typescript/ai-fal/src/adapters/audio.ts
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-fal/src/adapters/speech.ts
  • packages/typescript/ai-fal/src/adapters/transcription.ts
  • packages/typescript/ai-fal/src/index.ts
  • packages/typescript/ai-fal/src/model-meta.ts
  • packages/typescript/ai-fal/src/utils/client.ts
  • packages/typescript/ai-fal/src/utils/index.ts
  • packages/typescript/ai-fal/tests/audio-adapter.test.ts
  • packages/typescript/ai-fal/tests/image-adapter.test.ts
  • packages/typescript/ai-fal/tests/speech-adapter.test.ts
  • packages/typescript/ai-fal/tests/transcription-adapter.test.ts
  • packages/typescript/ai-gemini/src/adapters/audio.ts
  • packages/typescript/ai-gemini/src/adapters/image.ts
  • packages/typescript/ai-gemini/src/adapters/tts.ts
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-gemini/src/model-meta.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-gemini/tests/tts-adapter.test.ts
  • packages/typescript/ai-grok/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/image.ts
  • packages/typescript/ai-openrouter/src/adapters/image.ts
  • packages/typescript/ai-openrouter/tests/image-adapter.test.ts
  • packages/typescript/ai-react/src/index.ts
  • packages/typescript/ai-react/src/use-generate-audio.ts
  • packages/typescript/ai-react/tests/use-generation.test.ts
  • packages/typescript/ai-solid/src/index.ts
  • packages/typescript/ai-solid/src/use-generate-audio.ts
  • packages/typescript/ai-solid/tests/use-generation.test.ts
  • packages/typescript/ai-svelte/src/create-generate-audio.svelte.ts
  • packages/typescript/ai-svelte/src/index.ts
  • packages/typescript/ai-vue/src/index.ts
  • packages/typescript/ai-vue/src/use-generate-audio.ts
  • packages/typescript/ai-vue/tests/use-generation.test.ts
  • packages/typescript/ai/skills/ai-core/media-generation/SKILL.md
  • packages/typescript/ai/src/activities/generateAudio/adapter.ts
  • packages/typescript/ai/src/activities/generateAudio/index.ts
  • packages/typescript/ai/src/activities/generateSpeech/index.ts
  • packages/typescript/ai/src/activities/generateTranscription/index.ts
  • packages/typescript/ai/src/activities/index.ts
  • packages/typescript/ai/src/index.ts
  • packages/typescript/ai/src/types.ts
  • terminalOutput
💤 Files with no reviewable changes (1)
  • terminalOutput

Comment on lines +65 to +71
return {
  fetcher: (input: { text: string; voice?: string }) =>
    generateSpeechStreamFn({
      data: { ...input, provider: config.id },
    }),
  onResult: toSpeechOutput,
}

⚠️ Potential issue | 🟠 Major

Use SSE connection mode for the streaming server function.

generateSpeechStreamFn returns an SSE Response, but this branch passes it as a fetcher result. That makes onResult: toSpeechOutput receive the wrong shape instead of parsed stream chunks, breaking the “Server Fn” mode.

Proposed fix
     return {
-      fetcher: (input: { text: string; voice?: string }) =>
-        generateSpeechStreamFn({
-          data: { ...input, provider: config.id },
-        }),
+      connection: fetchServerSentEvents(
+        (input: { text: string; voice?: string }) =>
+          generateSpeechStreamFn({
+            data: { ...input, provider: config.id },
+          }),
+      ),
       onResult: toSpeechOutput,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 65 -
71, The fetcher currently returns the SSE Response directly which causes
onResult (toSpeechOutput) to receive the wrong shape; change the fetcher to
return an object with the SSE response and the SSE connection mode so the client
treats it as a streaming Server Fn. Concretely, update the fetcher to return {
response: generateSpeechStreamFn({ data: { ...input, provider: config.id } }),
connection: 'sse' } so generateSpeechStreamFn (the SSE Response) is passed with
connection: 'sse' and onResult: toSpeechOutput will receive parsed stream
chunks.

Comment on lines +41 to +50
return {
  fetcher: (input: { audio: string | Blob; language?: string }) =>
    transcribeStreamFn({
      data: {
        audio: input.audio as string,
        language: input.language,
        provider: config.id,
      },
    }),
}

⚠️ Potential issue | 🟠 Major

Use SSE connection mode for the streaming server function.

transcribeStreamFn returns an SSE Response, but this branch passes it as a fetcher result. The “Server Fn” mode will receive the raw response shape instead of parsed transcription stream chunks.

Proposed fix
     return {
-      fetcher: (input: { audio: string | Blob; language?: string }) =>
-        transcribeStreamFn({
-          data: {
-            audio: input.audio as string,
-            language: input.language,
-            provider: config.id,
-          },
-        }),
+      connection: fetchServerSentEvents(
+        (input: { audio: string | Blob; language?: string }) =>
+          transcribeStreamFn({
+            data: {
+              audio: input.audio as string,
+              language: input.language,
+              provider: config.id,
+            },
+          }),
+      ),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines
41 - 50, The fetcher currently returns transcribeStreamFn(...) which produces a
raw Response; change the call so it requests SSE connection mode (so the fetcher
returns the streaming/evented output, not the raw response). Update the fetcher
invocation of transcribeStreamFn to pass the SSE mode (e.g., add a connection:
"sse" or call the transcribeStreamFn SSE-specific entry point) while keeping the
same data fields (audio, language, provider: config.id) so the client consumes
transcription chunks instead of the raw Response.

Comment on lines +213 to +220
export interface AudioGenerateInput {
  /** Text description of the desired audio */
  prompt: string
  /** Desired duration in seconds */
  duration?: number
  /** Model-specific options */
  modelOptions?: Record<string, any>
}

⚠️ Potential issue | 🟠 Major

Allow non-text audio workflows in AudioGenerateInput.

Line 215 makes prompt mandatory and there’s no source-audio field, so the new hooks can’t type-safely call audio-to-audio, enhancement, separation, or merge models without dummy prompts or provider-specific tunneling through modelOptions.

🔧 Proposed input shape
+type AudioSourceInput = string | File | Blob | ArrayBuffer
+
 /**
  * Input for audio generation (music, sound effects).
  */
-export interface AudioGenerateInput {
-  /** Text description of the desired audio */
-  prompt: string
+export type AudioGenerateInput =
+  | {
+      /** Text description of the desired audio */
+      prompt: string
+      /** Optional source audio for audio-to-audio capable models */
+      audio?: AudioSourceInput
+      /** Desired duration in seconds */
+      duration?: number
+      /** Model-specific options */
+      modelOptions?: Record<string, any>
+    }
+  | {
+      /** Optional prompt for models that accept both source audio and text guidance */
+      prompt?: string
+      /** Source audio for audio-to-audio, enhancement, separation, or merge models */
+      audio: AudioSourceInput
+      /** Desired duration in seconds */
+      duration?: number
+      /** Model-specific options */
+      modelOptions?: Record<string, any>
+    }
-  /** Desired duration in seconds */
-  duration?: number
-  /** Model-specific options */
-  modelOptions?: Record<string, any>
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-client/src/generation-types.ts` around lines 213 -
220, AudioGenerateInput currently requires prompt and has no source-audio field
which prevents type-safe audio-to-audio/enhancement/separation/merge workflows;
update the AudioGenerateInput interface to support non-text workflows by making
prompt optional (or changing to a discriminated union) and adding a source audio
property (e.g., sourceAudio: string | Uint8Array | {url: string}) plus an
inputType or mode discriminator (e.g., inputType: 'text' | 'audio') so callers
and providers can clearly distinguish text vs audio inputs while keeping
modelOptions for provider-specific overrides.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-gemini/src/adapters/image.ts (1)

208-256: ⚠️ Potential issue | 🟡 Minor

Empty Imagen response still resolves to images: [] — inconsistent with the new Gemini transformer.

The gate at Line 234 is entries.length > 0 && images.length === 0, so when response.generatedImages is undefined or [], transformImagenResponse silently returns { images: [] }. That reintroduces the exact failure mode the sibling transformGeminiResponse was just fixed to prevent (Line 177 throws on images.length === 0 regardless of parts), and it defeats the comment "forced to handle the failure mode explicitly" right above the check. Callers now get different contracts between Imagen and Gemini paths for the same "no images produced" condition.

🛠️ Proposed fix: throw on any empty result, include filter reasons when present
-    // Every entry was filtered — no usable images to return. Throw rather
-    // than resolve to an empty array so the caller is forced to handle the
-    // failure mode explicitly.
-    if (entries.length > 0 && images.length === 0) {
-      const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
-      throw new Error(
-        `Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`,
-      )
-    }
+    // No usable images to return — throw rather than resolve to an empty
+    // array so callers must handle the failure mode explicitly, matching
+    // the Gemini transformer's contract.
+    if (images.length === 0) {
+      const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
+      const detail =
+        entries.length === 0
+          ? ' (no generated image entries were returned).'
+          : `: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`
+      throw new Error(`Imagen ${model} returned no images${detail}`)
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/image.ts` around lines 208 - 256,
The function transformImagenResponse currently only throws when entries.length >
0 && images.length === 0 which lets undefined/empty generatedImages resolve to
images: []; change the gate to throw whenever images.length === 0 (regardless of
entries) so callers must handle an empty result; keep collecting filterReasons
and include them in the thrown Error message (use filterReasons.join('; ') when
present) and preserve the existing console.warn behavior for partial filters;
update references in the message to include the model and entries.length for
context (function: transformImagenResponse, vars: entries, images,
filterReasons).
🧹 Nitpick comments (3)
packages/typescript/ai-gemini/tests/model-meta.test.ts (1)

592-600: Consider also asserting the new gemini-3.1-flash-tts-preview entry.

Per the PR objectives and GEMINI_TTS_MODELS in src/model-meta.ts (lines 1006-1010), the registry now includes GEMINI_3_1_FLASH_TTS alongside the 2.5 Pro/Flash entries. The uniqueness assertion covers accidental duplication, but an explicit toContain for the 3.1 Flash TTS name would make this regression test catch a future name regression on that entry too.

🧪 Optional addition
     expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-pro-preview-tts')
     expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-flash-preview-tts')
+    expect(GEMINI_TTS_MODELS).toContain('gemini-3.1-flash-tts-preview')
     expect(new Set(GEMINI_TTS_MODELS).size).toBe(GEMINI_TTS_MODELS.length)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/tests/model-meta.test.ts` around lines 592 -
600, Add an explicit assertion for the new 3.1 Flash TTS entry in the "Gemini
TTS model registry" test: inside the it block that checks GEMINI_TTS_MODELS, add
an expect toContain for the GEMINI_3_1_FLASH_TTS constant (or the literal
'gemini-3.1-flash-tts-preview') alongside the existing checks for
GEMINI_2_5_PRO_TTS and GEMINI_2_5_FLASH_TTS to ensure the 3.1 Flash TTS name is
present and guarded against future regressions.
packages/typescript/ai-fal/src/adapters/audio.ts (1)

98-127: Consider handling the plural audios response shape some fal music models return.

Several fal music endpoints (e.g., MusicGen variants) return { audios: [{ url, content_type }, …] } rather than audio/audio_url. Currently such responses will throw 'Audio URL not found in fal audio generation response'. Picking audios[0] would extend coverage without changing the single-result contract.

♻️ Proposed addition
     if (data.audio && typeof data.audio === 'object' && 'url' in data.audio) {
       const audioObj = data.audio as { url: string; content_type?: string }
       audioUrl = audioObj.url
       contentType = audioObj.content_type
     } else if (typeof data.audio_url === 'string') {
       audioUrl = data.audio_url
+    } else if (Array.isArray(data.audios) && data.audios.length > 0) {
+      const first = data.audios[0] as { url?: string; content_type?: string }
+      audioUrl = first.url
+      contentType = first.content_type
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 98 - 127,
transformResponse currently only checks data.audio and data.audio_url, so
responses with data.audios (an array) will fail; update transformResponse to
also handle data.audios when it's an array by selecting the first item (const
first = (data.audios as Array<Record<string, unknown>>)[0]) and extracting
first.url and first.content_type into audioUrl/contentType, falling back to
existing audio/audio_url logic; preserve the same return shape (id:
response.requestId || this.generateId(), model: this.model, audio: { url:
audioUrl, contentType: deriveAudioContentType(contentType, audioUrl) }) and keep
throwing the existing error if no URL is found.
packages/typescript/ai-gemini/src/adapters/audio.ts (1)

96-109: Validate the response shape instead of parsing through any.

This parses provider output manually with part: any; use a small Zod schema for the inline audio part so malformed SDK/provider responses fail predictably and align with library validation conventions. As per coding guidelines, packages/typescript/**/src/**/*.ts: Use Zod for schema validation and tool definition across the library.

♻️ Example direction
+const GeminiInlineAudioPartSchema = z.object({
+  inlineData: z.object({
+    mimeType: z.string().startsWith('audio/'),
+    data: z.string().min(1),
+  }),
+})
+
     const parts = response.candidates?.[0]?.content?.parts ?? []
-    const audioPart = parts.find((part: any) =>
-      part.inlineData?.mimeType?.startsWith('audio/'),
-    )
+    const audioPartResult = parts
+      .map((part) => GeminiInlineAudioPartSchema.safeParse(part))
+      .find((result) => result.success)
+
+    if (!audioPartResult?.success) {
+      throw new Error('No audio data in Gemini Lyria response')
+    }
 
-    if (!audioPart?.inlineData?.data) {
-      throw new Error('No audio data in Gemini Lyria response')
-    }
-
-    // audioPart was selected because mimeType.startsWith('audio/') was
-    // truthy, so the mime type is guaranteed to be a string here. Trust the
-    // value Gemini returned rather than inventing a non-standard
-    // `audio/mp3` fallback (IANA is `audio/mpeg`).
-    const contentType = audioPart.inlineData.mimeType
+    const { inlineData } = audioPartResult.data
+    const contentType = inlineData.mimeType
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 96 - 109,
The code extracts audioPart from response.candidates using loose any typing;
replace this with a Zod validation: define a small schema (e.g.,
InlineAudioPartSchema) that validates the shape { inlineData: { mimeType:
string, data: string } } and apply it to
response.candidates?.[0]?.content?.parts (or to the selected part) before using
audioPart.inlineData.mimeType/data; if validation fails, throw a clear error
like "Invalid Gemini Lyria audio part" so malformed provider/SDK responses fail
predictably and follow the repo's Zod validation conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-fal/src/adapters/speech.ts`:
- Around line 107-120: The computed format (in the block using contentTypeMime,
extractUrlExtension, and audioUrl) can be a MIME subtype like "mpeg" which is an
incorrect file extension; update the logic in that block (the format assignment
used when building the return object for response.requestId / this.generateId /
this.model) to normalize common MIME subtypes to proper extensions (e.g., "mpeg"
-> "mp3", "mp4" -> "m4a") and prefer a safeUrlExtension when one exists; ensure
contentType (contentTypeMime || `audio/${format}`) remains consistent after
normalization so consumers get a standard file extension.
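
A sketch of that normalization (the function name is assumed):

// Map MIME subtypes that are not valid file extensions onto standard ones.
function normalizeAudioFormat(subtype: string): string {
  switch (subtype.toLowerCase()) {
    case 'mpeg':
      return 'mp3'
    case 'mp4':
      return 'm4a'
    default:
      return subtype.toLowerCase()
  }
}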

In `@packages/typescript/ai-fal/src/utils/client.ts`:
- Around line 89-92: The switch handling audio extensions currently groups 'aac'
with 'm4a'/'mp4' and returns 'audio/mp4'; update the mapping so case 'm4a' and
case 'mp4' continue to return 'audio/mp4' but case 'aac' returns 'audio/aac'
instead (adjust the switch in the function that contains the case
'm4a'/'mp4'/'aac' to separate the 'aac' branch and its return value).
- Around line 59-64: The extractUrlExtension function fails for URLs with
fragments or trailing slashes; update extractUrlExtension to first strip URL
fragments (remove anything after '#') and trim trailing slashes before removing
query params and extracting the final segment so extension detection works for
inputs like 'https://x.com/a.mp3#t=5' and 'https://x.com/a.mp3/'; locate the
extractUrlExtension function and modify the preprocessing of url (remove
fragment, then remove query and trailing slashes) before splitting on '.' and
applying the existing /^[a-z0-9]{2,5}$/i test.
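
A combined sketch of both client.ts fixes above (the hardened extension extraction and the separate aac branch); exact helper names are assumptions:

function extractUrlExtension(url: string): string | undefined {
  // Strip fragment, then query, then trailing slashes before extracting.
  const cleaned = url.split('#')[0]!.split('?')[0]!.replace(/\/+$/, '')
  const last = cleaned.split('/').pop() ?? ''
  const dot = last.lastIndexOf('.')
  const ext = dot >= 0 ? last.slice(dot + 1) : ''
  return /^[a-z0-9]{2,5}$/i.test(ext) ? ext.toLowerCase() : undefined
}

function audioMimeForExtension(ext: string): string | undefined {
  switch (ext) {
    case 'm4a':
    case 'mp4':
      return 'audio/mp4'
    case 'aac':
      return 'audio/aac' // no longer lumped in with audio/mp4
    case 'mp3':
      return 'audio/mpeg'
    default:
      return undefined
  }
}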

In `@packages/typescript/ai-fal/tests/utils.test.ts`:
- Around line 1-16: Move the static imports above the vitest mock call to
satisfy ESLint import/first and import/order: place the imports for
configureFalClient, deriveAudioContentType, extractUrlExtension, generateId
(from ../src/utils), mapSizeToFalFormat (from
../src/image/image-provider-options) and the fal import from '@fal-ai/client'
before the vi.mock('@fal-ai/client', ...) invocation so the vi.mock call remains
but all import statements appear first.
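
The reordered top of the test file would look roughly like this (the mock factory shape is illustrative; Vitest hoists vi.mock above the imports, so behavior is unchanged):

import { fal } from '@fal-ai/client'
import { describe, expect, it, vi } from 'vitest'
import {
  configureFalClient,
  deriveAudioContentType,
  extractUrlExtension,
  generateId,
} from '../src/utils'
import { mapSizeToFalFormat } from '../src/image/image-provider-options'

vi.mock('@fal-ai/client', () => ({
  fal: { config: vi.fn() },
}))

// ...tests using fal, configureFalClient, etc. follow.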

In `@packages/typescript/ai-gemini/src/adapters/audio.ts`:
- Line 143: The returned config currently spreads apiKey first then ...config so
config.apiKey can override the explicit apiKey argument; change the object
spread order when constructing the GeminiAudioAdapter (the return that calls new
GeminiAudioAdapter({ apiKey, ...config }, model)) so the explicit apiKey takes
precedence by placing ...config before apiKey (i.e., merge config then set
apiKey last) when calling the GeminiAudioAdapter constructor.
- Around line 79-83: The adapter is currently ignoring
AudioGenerationOptions.duration when building the request; update the logic
around const { model, prompt, modelOptions } = options and the call to
this.client.models.generateContent to either apply or reject duration: validate
options.duration up-front, and if model indicates a fixed-length provider (e.g.,
Lyria 3 Clip) throw a clear error when duration !== 30, otherwise for models
that accept variable duration (e.g., Lyria 3 Pro) inject the validated duration
into the generation request via modelOptions or by appending it to prompt before
calling this.client.models.generateContent so the provider receives the intended
duration. Ensure any rejection uses a descriptive error mentioning
AudioGenerationOptions.duration and the model name.
- Around line 32-35: Remove the unsupported negativePrompt field from the audio
generation types and code: delete the negativePrompt property from the
interface/props in packages/typescript/ai-gemini/src/adapters/audio.ts and stop
passing negativePrompt into the `@google/genai` GenerateContentConfig used by
functions like the music generation adapter (look for any call that builds a
GenerateContentConfig or calls client.generateContent/sendGenerateRequest),
ensuring the payload only uses fields supported by GenerateContentConfig; also
remove any TypeScript references/usages of negativePrompt to keep typings
consistent.
- Around line 20-25: Make GeminiAudioProviderOptions model-generic and constrain
responseMimeType with the same per-model resolution pattern used elsewhere:
change GeminiAudioProviderOptions to GeminiAudioProviderOptions<TModel> and type
responseMimeType using ResolveProviderOptions<TModel> (or a TModel conditional)
so only Pro-capable models (e.g., those where ResolveProviderOptions<TModel>
indicates WAV is supported or TModel extends 'lyria-3-pro') allow 'audio/wav',
otherwise responseMimeType is never/undefined; apply the same
generic/conditional change to the equivalent audio option types around the 64–78
region so WAV is compile-time allowed only for Pro models (reference symbols:
GeminiAudioProviderOptions, responseMimeType, ResolveProviderOptions<TModel>).
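
Two of those audio.ts items lend themselves to a quick sketch (the model ID and the fixed 30-second Clip constraint come from this review; everything else is assumed):

// Duration guard: reject on fixed-length models, forward it otherwise
// (e.g. via modelOptions) for variable-duration models like Lyria 3 Pro.
function validateDuration(model: string, duration?: number): void {
  if (duration === undefined) return
  if (model === 'lyria-3-clip-preview' && duration !== 30) {
    throw new Error(
      `AudioGenerationOptions.duration must be 30 for ${model}; got ${duration}.`,
    )
  }
}

// Spread-order fix: merge config first so the explicit apiKey wins.
// return new GeminiAudioAdapter({ ...config, apiKey }, model)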

In `@packages/typescript/ai-gemini/src/adapters/tts.ts`:
- Around line 243-260: The current parsePcmMimeType incorrectly treats
containerized WAV as raw PCM because normalized.includes('codec=pcm') will match
values like "audio/wav;codec=pcm"; update parsePcmMimeType to only accept
codec=pcm when the subtype is not a WAV/container. Concretely: derive the media
subtype from normalized (after "audio/"), keep the existing bitDepthMatch and
explicit checks for 'audio/pcm' and 'audio/x-pcm', but replace the generic
normalized.includes('codec=pcm') with a guarded check that checks for
'codec=pcm' AND ensures the subtype is not 'wav', 'x-wav', 'wave' or any subtype
that contains 'wav' (for example by extracting the subtype via split('/') and
checking it does not include 'wav'). Apply this change inside parsePcmMimeType
using the existing variables normalized, bitDepthMatch and isPcm.
- Around line 64-66: The multiSpeakerVoiceConfig.speakerVoiceConfigs array
currently allows empty or 3+ entries even though only 1–2 speakers are
supported; add a validation guard in the adapter before any call that builds or
sends a Gemini TTS request (e.g., in the function that reads
multiSpeakerVoiceConfig and where the request payload is constructed — reference
symbols: multiSpeakerVoiceConfig and speakerVoiceConfigs and the TTS request
construction points around the existing handling at the earlier block and the
later block around lines 144–147) that checks speakerVoiceConfigs.length is 1 or
2 and returns/throws a clear error if not; apply the same guard at both
locations so invalid arrays are rejected client-side before calling Gemini.
- Around line 151-162: The code currently assigns speechConfig.voiceConfig =
modelOptions?.voiceConfig ?? { prebuiltVoiceConfig: { voiceName } } which allows
an empty modelOptions.voiceConfig (e.g. {}) to suppress the fallback voice;
update the logic in the code that sets speechConfig.voiceConfig (referencing
speechConfig, modelOptions, voiceConfig, prebuiltVoiceConfig, voiceName,
GeminiTTSVoice, GEMINI_TTS_VOICES) to detect when modelOptions.voiceConfig is
missing or does not include a valid prebuiltVoiceConfig.voiceName and in that
case set prebuiltVoiceConfig.voiceName to the computed voiceName (fallback to
'Kore'); ensure existing valid modelOptions.voiceConfig is preserved otherwise.
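
The codec=pcm guard from the first tts.ts item could look like this (the real parsePcmMimeType also extracts sampleRate and channels; this sketch only shows the boolean gate):

function isRawPcmMimeType(mimeType: string): boolean {
  const normalized = mimeType.toLowerCase()
  if (!normalized.startsWith('audio/')) return false
  const subtype = normalized.slice('audio/'.length).split(';')[0] ?? ''
  if (subtype === 'pcm' || subtype === 'x-pcm' || subtype.startsWith('l16'))
    return true
  // codec=pcm inside a WAV container is already playable; don't re-wrap it.
  return normalized.includes('codec=pcm') && !subtype.includes('wav')
}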

---

Outside diff comments:
In `@packages/typescript/ai-gemini/src/adapters/image.ts`:
- Around line 208-256: The function transformImagenResponse currently only
throws when entries.length > 0 && images.length === 0 which lets undefined/empty
generatedImages resolve to images: []; change the gate to throw whenever
images.length === 0 (regardless of entries) so callers must handle an empty
result; keep collecting filterReasons and include them in the thrown Error
message (use filterReasons.join('; ') when present) and preserve the existing
console.warn behavior for partial filters; update references in the message to
include the model and entries.length for context (function:
transformImagenResponse, vars: entries, images, filterReasons).

---

Nitpick comments:
In `@packages/typescript/ai-fal/src/adapters/audio.ts`:
- Around line 98-127: transformResponse currently only checks data.audio and
data.audio_url, so responses with data.audios (an array) will fail; update
transformResponse to also handle data.audios when it's an array by selecting the
first item (const first = (data.audios as Array<Record<string, unknown>>)[0])
and extracting first.url and first.content_type into audioUrl/contentType,
falling back to existing audio/audio_url logic; preserve the same return shape
(id: response.requestId || this.generateId(), model: this.model, audio: { url:
audioUrl, contentType: deriveAudioContentType(contentType, audioUrl) }) and keep
throwing the existing error if no URL is found.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts`:
- Around line 96-109: The code extracts audioPart from response.candidates using
loose any typing; replace this with a Zod validation: define a small schema
(e.g., InlineAudioPartSchema) that validates the shape { inlineData: { mimeType:
string, data: string } } and apply it to
response.candidates?.[0]?.content?.parts (or to the selected part) before using
audioPart.inlineData.mimeType/data; if validation fails, throw a clear error
like "Invalid Gemini Lyria audio part" so malformed provider/SDK responses fail
predictably and follow the repo's Zod validation conventions.

In `@packages/typescript/ai-gemini/tests/model-meta.test.ts`:
- Around line 592-600: Add an explicit assertion for the new 3.1 Flash TTS entry
in the "Gemini TTS model registry" test: inside the it block that checks
GEMINI_TTS_MODELS, add an expect toContain for the GEMINI_3_1_FLASH_TTS constant
(or the literal 'gemini-3.1-flash-tts-preview') alongside the existing checks
for GEMINI_2_5_PRO_TTS and GEMINI_2_5_FLASH_TTS to ensure the 3.1 Flash TTS name
is present and guarded against future regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f7664c1-c69f-413d-8fe2-ce9786e539f0

📥 Commits

Reviewing files that changed from the base of the PR and between 8af703f and 78c6694.

📒 Files selected for processing (22)
  • packages/typescript/ai-fal/src/adapters/audio.ts
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-fal/src/adapters/speech.ts
  • packages/typescript/ai-fal/src/adapters/transcription.ts
  • packages/typescript/ai-fal/src/image/image-provider-options.ts
  • packages/typescript/ai-fal/src/utils/client.ts
  • packages/typescript/ai-fal/src/utils/index.ts
  • packages/typescript/ai-fal/tests/audio-adapter.test.ts
  • packages/typescript/ai-fal/tests/image-adapter.test.ts
  • packages/typescript/ai-fal/tests/speech-adapter.test.ts
  • packages/typescript/ai-fal/tests/transcription-adapter.test.ts
  • packages/typescript/ai-fal/tests/utils.test.ts
  • packages/typescript/ai-fal/tests/video-adapter.test.ts
  • packages/typescript/ai-gemini/src/adapters/audio.ts
  • packages/typescript/ai-gemini/src/adapters/image.ts
  • packages/typescript/ai-gemini/src/adapters/tts.ts
  • packages/typescript/ai-gemini/src/image/image-provider-options.ts
  • packages/typescript/ai-gemini/src/model-meta.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-gemini/tests/image-adapter.test.ts
  • packages/typescript/ai-gemini/tests/model-meta.test.ts
  • packages/typescript/ai-gemini/tests/tts-adapter.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/typescript/ai-fal/tests/audio-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/typescript/ai-fal/src/utils/index.ts
  • packages/typescript/ai-fal/tests/image-adapter.test.ts
  • packages/typescript/ai-gemini/src/model-meta.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-gemini/tests/tts-adapter.test.ts
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-fal/tests/transcription-adapter.test.ts

Comment on lines +20 to +25
export interface GeminiAudioProviderOptions {
  /**
   * Request WAV output instead of the default MP3. Lyria 3 Pro only;
   * the Clip model always returns MP3 and will reject this field.
   */
  responseMimeType?: 'audio/wav'

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gemini Lyria 3 Clip output format and Lyria 3 Pro responseMimeType audio/wav support

💡 Result:

Gemini Lyria 3 Clip (lyria-3-clip-preview) outputs audio clips up to 30 seconds in audio/mp3 format via the Gemini API generateContent method using response_modalities=["AUDIO", "TEXT"]. Vertex AI docs specify supported MIME types: audio/mp3 at 44.1 kHz, 192 kbps. Code examples save as .mp3 with MIME type audio/mp3. Lyria 3 Pro (lyria-3-pro-preview) outputs full-length tracks up to ~3 minutes (184 seconds max) similarly in audio/mp3 format. One example mentions possible WAV, but official Vertex AI lists audio/mp3; a REST example shows audio/mpeg. No explicit support for requesting audio/wav via responseMimeType or other params for Lyria 3 models. Older Lyria 2 (lyria-002) uses audio/wav in Vertex AI predict endpoint. Gemini API responseMimeType primarily controls text outputs (e.g., application/json), not audio modality MIME types, which appear fixed per model.


🏁 Script executed:

# Find the audio adapter file
fd "audio.ts" packages/typescript/ai-gemini/src/adapters/

# Check the file content around the commented lines
cat -n packages/typescript/ai-gemini/src/adapters/audio.ts | head -100

Repository: TanStack/ai

Length of output: 3768


🏁 Script executed:

# Search for GeminiAudioModel definition
rg "type GeminiAudioModel|interface GeminiAudioModel" packages/typescript/ai-gemini/src/

# Search for how responseMimeType is used
rg "responseMimeType" packages/typescript/ai-gemini/src/

Repository: TanStack/ai

Length of output: 1194


🏁 Script executed:

# Check if there are any tests or documentation about this
fd "test\|spec\|\.md" packages/typescript/ai-gemini/src/ | head -20
rg "responseMimeType" packages/typescript/ai-gemini/ -A 3 -B 3

Repository: TanStack/ai

Length of output: 12555


🏁 Script executed:

# Find and check model-meta.ts
cat packages/typescript/ai-gemini/src/model-meta.ts | head -150

Repository: TanStack/ai

Length of output: 3505


🏁 Script executed:

# Search for GEMINI_AUDIO_MODELS definition
rg "GEMINI_AUDIO_MODELS" packages/typescript/ai-gemini/src/ -A 3

Repository: TanStack/ai

Length of output: 1750


🏁 Script executed:

# Check for patterns of model-specific provider options in the codebase
rg "extends.*=.*\{" packages/typescript/ai-gemini/src/adapters/ -A 2 | head -40

Repository: TanStack/ai

Length of output: 37


🏁 Script executed:

# Find LYRIA_3_PRO and LYRIA_3_CLIP definitions
rg "LYRIA_3_PRO|LYRIA_3_CLIP" packages/typescript/ai-gemini/src/model-meta.ts -A 10 | head -60

Repository: TanStack/ai

Length of output: 770


🏁 Script executed:

# Check other adapters to see if model-specific provider options exist anywhere
rg "TModel extends" packages/typescript/ai-gemini/src/adapters/ -B 2 -A 2

Repository: TanStack/ai

Length of output: 8209


🏁 Script executed:

# Check FAL package for the pattern mentioned in learnings
rg "keyof EndpointTypeMap" packages/typescript/ai-fal/src/ -B 3 -A 3 | head -50

Repository: TanStack/ai

Length of output: 4028


Make responseMimeType model-specific using conditional types.

The current type allows responseMimeType: 'audio/wav' for all audio models, but the JSDoc explicitly states it's Pro-only and Clip will reject it. The text adapter already uses ResolveProviderOptions<TModel> for per-model option resolution; apply the same pattern here to provide compile-time safety:

♻️ Proposed type tightening
-export interface GeminiAudioProviderOptions {
+export type GeminiAudioProviderOptions<
+  TModel extends GeminiAudioModel = GeminiAudioModel,
+> = {
   /**
-   * Request WAV output instead of the default MP3. Lyria 3 Pro only;
-   * the Clip model always returns MP3 and will reject this field.
+   * Seed for deterministic generation.
    */
-  responseMimeType?: 'audio/wav'
-
-  /**
-   * Seed for deterministic generation.
-   */
   seed?: number
-
-  /**
-   * Negative prompt — describe what to exclude from the output.
-   */
   negativePrompt?: string
-}
+} & (TModel extends 'lyria-3-pro-preview'
+  ? {
+      /**
+       * Request WAV output instead of the default MP3. Lyria 3 Pro only.
+       */
+      responseMimeType?: 'audio/wav'
+    }
+  : {
+      responseMimeType?: never
+    })
@@
 export class GeminiAudioAdapter<
   TModel extends GeminiAudioModel,
-> extends BaseAudioAdapter<TModel, GeminiAudioProviderOptions> {
+> extends BaseAudioAdapter<TModel, GeminiAudioProviderOptions<TModel>> {
@@
   async generateAudio(
-    options: AudioGenerationOptions<GeminiAudioProviderOptions>,
+    options: AudioGenerationOptions<GeminiAudioProviderOptions<TModel>>,
   ): Promise<AudioGenerationResult> {

Also applies to: 64–78

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 20 - 25,
Make GeminiAudioProviderOptions model-generic and constrain responseMimeType
with the same per-model resolution pattern used elsewhere: change
GeminiAudioProviderOptions to GeminiAudioProviderOptions<TModel> and type
responseMimeType using ResolveProviderOptions<TModel> (or a TModel conditional)
so only Pro-capable models (e.g., those where ResolveProviderOptions<TModel>
indicates WAV is supported or TModel extends 'lyria-3-pro') allow 'audio/wav',
otherwise responseMimeType is never/undefined; apply the same
generic/conditional change to the equivalent audio option types around the 64–78
region so WAV is compile-time allowed only for Pro models (reference symbols:
GeminiAudioProviderOptions, responseMimeType, ResolveProviderOptions<TModel>).
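
For illustration, a self-contained sketch of how the proposed conditional type would behave at a call site. The generate helper below is a hypothetical stand-in for the adapter entry point, not the library API:

type GeminiAudioModel = 'lyria-3-pro-preview' | 'lyria-3-clip-preview'

type GeminiAudioProviderOptions<TModel extends GeminiAudioModel> = {
  seed?: number
} & (TModel extends 'lyria-3-pro-preview'
  ? { responseMimeType?: 'audio/wav' }
  : { responseMimeType?: never })

declare function generate<TModel extends GeminiAudioModel>(
  model: TModel,
  options: GeminiAudioProviderOptions<TModel>,
): void

generate('lyria-3-pro-preview', { responseMimeType: 'audio/wav' }) // OK
generate('lyria-3-clip-preview', { seed: 42 }) // OK: responseMimeType omitted
generate('lyria-3-clip-preview', { responseMimeType: 'audio/wav' }) // compile error:
// Type '"audio/wav"' is not assignable to type 'undefined'.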

Comment on lines +79 to +83
const { model, prompt, modelOptions } = options

const response = await this.client.models.generateContent({
model,
contents: [{ role: 'user', parts: [{ text: prompt }] }],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gemini Lyria 3 Pro duration control via prompt and Lyria 3 Clip fixed 30 seconds

💡 Result:

Gemini Lyria 3 Pro supports duration control via prompts, allowing generation of full-length songs up to a couple of minutes (e.g., 3 minutes) by specifying length in the prompt like "2 minutes 30 seconds" or using timestamps and song structure tags (e.g., [Verse], [Chorus]). Lyria 3 Clip is fixed at 30 seconds and does not support variable duration.


🏁 Script executed:

find . -name "audio.ts" -path "*/ai-gemini/*" | head -5

Repository: TanStack/ai

Length of output: 108


🏁 Script executed:

fd "audio\.ts" packages/typescript/ai-gemini

Repository: TanStack/ai

Length of output: 106


🏁 Script executed:

rg -l "AudioGenerationOptions" --type ts packages/typescript/

Repository: TanStack/ai

Length of output: 254


🏁 Script executed:

cat -n packages/typescript/ai-gemini/src/adapters/audio.ts | sed -n '70,90p'

Repository: TanStack/ai

Length of output: 958


🏁 Script executed:

rg "type AudioGenerationOptions|interface AudioGenerationOptions" -A 10 packages/typescript/ai/src/types.ts

Repository: TanStack/ai

Length of output: 414


Honor or reject duration instead of silently ignoring it.

AudioGenerationOptions.duration is part of the public audio API, but this adapter drops it without validation or error. For Lyria 3 Clip (fixed at 30 seconds) and Lyria 3 Pro (duration via prompt), callers requesting a specific duration get surprising output. Either validate and apply the duration parameter, or explicitly reject unsupported values.

🐛 Proposed fix: handle provider duration semantics
-    const { model, prompt, modelOptions } = options
+    const { model, prompt, duration, modelOptions } = options
+
+    let contentsPrompt = prompt
+    if (duration != null) {
+      if (model === 'lyria-3-clip-preview' && duration !== 30) {
+        throw new Error(
+          'lyria-3-clip-preview always generates 30-second clips; use lyria-3-pro-preview for controllable duration',
+        )
+      }
+
+      if (model === 'lyria-3-pro-preview') {
+        contentsPrompt = `Create an approximately ${duration}-second track.\n\n${prompt}`
+      }
+    }
 
     const response = await this.client.models.generateContent({
       model,
-      contents: [{ role: 'user', parts: [{ text: prompt }] }],
+      contents: [{ role: 'user', parts: [{ text: contentsPrompt }] }],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 79 - 83,
The adapter is currently ignoring AudioGenerationOptions.duration when building
the request; update the logic around const { model, prompt, modelOptions } =
options and the call to this.client.models.generateContent to either apply or
reject duration: validate options.duration up-front, and if model indicates a
fixed-length provider (e.g., Lyria 3 Clip) throw a clear error when duration !==
30, otherwise for models that accept variable duration (e.g., Lyria 3 Pro)
inject the validated duration into the generation request via modelOptions or by
appending it to prompt before calling this.client.models.generateContent so the
provider receives the intended duration. Ensure any rejection uses a descriptive
error mentioning AudioGenerationOptions.duration and the model name.

…enforce

The defensive nullish-coalescing guards on response.choices and
img / img.imageUrl that the fix-loop added are impossible per the SDK type
signatures; eslint's no-unnecessary-condition correctly rejects them. Keep
only the typeof url !== 'string' check, which is a real runtime shape guard
(imageUrl.url is typed as string, but the provider may send a non-string in
rare degraded responses).

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (6)
examples/ts-react-chat/src/routes/generations.speech.tsx (1)

65-71: ⚠️ Potential issue | 🟠 Major

Wrap generateSpeechStreamFn in fetchServerSentEvents for the Server Fn streaming mode.

This branch still hands the raw SSE Response from generateSpeechStreamFn to fetcher, so onResult: toSpeechOutput is invoked with the Response instead of parsed { audio, contentType, ... } chunks, and audio playback will fail in Server Fn mode.

Proposed fix
     return {
-      fetcher: (input: { text: string; voice?: string }) =>
-        generateSpeechStreamFn({
-          data: { ...input, provider: config.id },
-        }),
+      connection: fetchServerSentEvents(
+        (input: { text: string; voice?: string }) =>
+          generateSpeechStreamFn({
+            data: { ...input, provider: config.id },
+          }),
+      ),
       onResult: toSpeechOutput,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 65 -
71, The fetcher currently passes the raw SSE Response from
generateSpeechStreamFn to onResult; wrap the call in fetchServerSentEvents so
the fetcher returns parsed SSE chunks instead of a Response. Specifically,
change the fetcher to call fetchServerSentEvents(generateSpeechStreamFn({ data:
{ ...input, provider: config.id } })) so that onResult: toSpeechOutput receives
parsed {audio, contentType, ...} chunks; ensure fetchServerSentEvents is
imported/available in the file.
examples/ts-react-chat/src/routes/generations.transcription.tsx (1)

41-50: ⚠️ Potential issue | 🟠 Major

Wrap transcribeStreamFn in fetchServerSentEvents for the Server Fn streaming mode.

transcribeStreamFn returns an SSE Response, but this branch still provides it as a plain fetcher. The hook will treat the raw Response as the final payload instead of consuming parsed stream chunks, so the Server Fn mode won't behave like the SSE mode.

Proposed fix
     return {
-      fetcher: (input: { audio: string | Blob; language?: string }) =>
-        transcribeStreamFn({
-          data: {
-            audio: input.audio as string,
-            language: input.language,
-            provider: config.id,
-          },
-        }),
+      connection: fetchServerSentEvents(
+        (input: { audio: string | Blob; language?: string }) =>
+          transcribeStreamFn({
+            data: {
+              audio: input.audio as string,
+              language: input.language,
+              provider: config.id,
+            },
+          }),
+      ),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines
41 - 50, The fetcher currently returns the raw SSE Response from
transcribeStreamFn, so wrap the call in fetchServerSentEvents to convert the SSE
Response into a streaming payload the hook can consume; update the fetcher (the
object returned by the function) to call fetchServerSentEvents(() =>
transcribeStreamFn({ data: { audio: input.audio as string, language:
input.language, provider: config.id } })) instead of returning
transcribeStreamFn directly, ensuring transcribeStreamFn and
fetchServerSentEvents are imported/available.
packages/typescript/ai-event-client/src/index.ts (1)

519-530: ⚠️ Potential issue | 🟠 Major

Tighten AudioRequestCompletedEvent.audio to mirror GeneratedAudio's exclusive source union.

As currently typed, audio accepts {} or { url, b64Json }, which contradicts the PR’s promise that GeneratedAudio requires exactly one of url / b64Json. Mirror the contract in the event payload.

🛡️ Proposed type tightening
+export type AudioEventSource =
+  | { url: string; b64Json?: never; contentType?: string; duration?: number }
+  | { b64Json: string; url?: never; contentType?: string; duration?: number }
+
 /** Emitted when an audio generation request completes. */
 export interface AudioRequestCompletedEvent extends BaseEventContext {
   requestId: string
   provider: string
   model: string
-  audio: {
-    url?: string
-    b64Json?: string
-    contentType?: string
-    duration?: number
-  }
+  audio: AudioEventSource
   duration: number
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-event-client/src/index.ts` around lines 519 - 530, The
AudioRequestCompletedEvent.audio field currently allows both empty objects and
combinations of url/b64Json; change it to exactly match the GeneratedAudio
contract so only one source is allowed. Replace the inline audio type in
AudioRequestCompletedEvent with the existing GeneratedAudio type (or an explicit
union mirroring it: one branch with url and url-only semantics and one branch
with b64Json and b64Json-only semantics), preserving optional
contentType/duration, so consumers cannot provide both or neither; update the
import/usage to reference GeneratedAudio instead of the loose inline object.
packages/typescript/ai-gemini/src/adapters/audio.ts (3)

145-151: ⚠️ Potential issue | 🟡 Minor

Explicit apiKey argument can still be overridden by config.apiKey at runtime.

Despite Omit<GeminiAudioConfig, 'apiKey'> hiding it from the type, a JS caller (or a caller using as any) can pass { apiKey: '...' } and silently override the explicit argument due to the { apiKey, ...config } spread order. Reverse it so the explicit argument wins.

🛡️ Proposed fix
-  return new GeminiAudioAdapter({ apiKey, ...config }, model)
+  return new GeminiAudioAdapter({ ...config, apiKey }, model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 145 - 151,
The createGeminiAudio factory currently spreads apiKey then config so a runtime
config with apiKey can override the explicit parameter; update the object
construction in createGeminiAudio so the explicit apiKey wins (spread config
first, then set apiKey) when creating the GeminiAudioAdapter, keeping the config
type as Omit<GeminiAudioConfig, 'apiKey'> but protecting against JS callers or
any casted inputs; reference function createGeminiAudio, type GeminiAudioConfig
and class GeminiAudioAdapter.
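
As a standalone illustration of the spread-order hazard (plain object literals, not the adapter code):

const apiKey = 'explicit-key'
// A JS caller (or an `as any` cast) can smuggle apiKey past the Omit:
const config = { apiKey: 'sneaky-key' } as Record<string, unknown>

// Later spreads win, so config silently overrides the explicit argument:
const broken = { apiKey, ...config } // { apiKey: 'sneaky-key' }

// Spreading config first lets the explicit argument win:
const fixed = { ...config, apiKey } // { apiKey: 'explicit-key' }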

32-36: ⚠️ Potential issue | 🟡 Minor

negativePrompt is still advertised in the public type but silently dropped at runtime.

The previous round removed the forwarding of negativePrompt into GenerateContentConfig (per the FIXME on lines 81-90), which is correct. However the field is still exposed on GeminiAudioProviderOptions, so callers can set it and get full type-safe autocomplete — then see absolutely no effect on output. Either remove it from the type or mark it @deprecated / document that it's ignored until the ai.live.music.connect path is implemented.

🐛 Proposed fix
   seed?: number
-
-  /**
-   * Negative prompt — describe what to exclude from the output.
-   */
-  negativePrompt?: string
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 32 - 36,
GeminiAudioProviderOptions currently exposes negativePrompt but it is dropped at
runtime (not forwarded into GenerateContentConfig); either remove negativePrompt
from the GeminiAudioProviderOptions type declaration in adapters/audio.ts or
mark it as deprecated with a clear JSDoc note that it is ignored until the
ai.live.music.connect path is implemented; update any exported types/usages to
match (remove references or add `@deprecated` comment on negativePrompt in
GeminiAudioProviderOptions) and run type checks to ensure no callers rely on the
now-removed field.

79-101: ⚠️ Potential issue | 🟡 Minor

options.duration is still silently ignored.

AudioGenerationOptions.duration is destructured away from options and never applied. For lyria-3-clip-preview (fixed 30s) and lyria-3-pro-preview (prompt-controlled duration, up to ~3 min), callers requesting a specific duration receive surprising output with no warning. Either inject duration into the prompt for Pro, or throw a descriptive error for Clip when duration != null && duration !== 30.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 79 - 101,
The code currently ignores AudioGenerationOptions.duration; update the handler
in adapters/audio.ts (around the generateContent call and where options are
destructured) to read duration from options and enforce model-specific behavior:
if model === 'lyria-3-clip-preview' and duration != null && duration !== 30
throw a descriptive error indicating clip preview is fixed to 30s; if model ===
'lyria-3-pro-preview' and duration != null inject the requested duration into
the prompt (e.g., append a short instruction like "Generate audio of X seconds")
before calling this.client.models.generateContent; keep the existing
seed/responseMimeType handling and ensure modelOptions and prompt variable names
are used to find where to change.
🧹 Nitpick comments (11)
packages/typescript/ai-openai/src/adapters/image.ts (2)

101-112: LGTM — flatMap produces mutually-exclusive url/b64Json shape.

Each GeneratedImage now satisfies exactly one branch of GeneratedMediaSource, and items with neither field are dropped instead of producing an empty {} that would violate the new union. Matches the types tightening in packages/typescript/ai/src/types.ts.

One minor point worth noting: when an item has neither url nor b64_json, it's silently dropped — consider a console.warn (or surfacing via the devtools image:request:completed event) so callers aren't left wondering why images.length < response.data.length.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-openai/src/adapters/image.ts` around lines 101 - 112,
The current mapping into images drops items with neither url nor b64_json
silently; update the flatMap block that constructs images (the const images:
Array<GeneratedImage> = (response.data ?? []).flatMap(...) in adapters/image.ts)
to log a warning when an item has neither field so callers can diagnose missing
media (e.g., console.warn with context or emit the devtools event
'image:request:completed' with a note), while preserving the existing branches
that return GeneratedImage when item.b64_json or item.url are present.

55-58: Consider passing config to super(...) for consistency.

Unlike OpenAITTSAdapter and OpenAITranscriptionAdapter — which now call super(model, config) — this adapter still discards config with super(model, {}). The parent's this.config ends up as {}. Functionally fine today because this adapter owns its own client and doesn't read this.config, but it drifts from the sibling adapters and hides config from any future base-class hook that inspects this.config.

♻️ Suggested alignment
   constructor(config: OpenAIImageConfig, model: TModel) {
-    super(model, {})
+    super(model, config)
     this.client = createOpenAIClient(config)
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-openai/src/adapters/image.ts` around lines 55 - 58,
The constructor in OpenAIImageAdapter currently calls super(model, {}) and
discards the provided OpenAIImageConfig; update the constructor to pass the
incoming config to the base class (i.e., call super(model, config)) so
this.config is initialized consistently with OpenAITTSAdapter and
OpenAITranscriptionAdapter, while leaving createOpenAIClient(config) and
this.client initialization unchanged.
packages/typescript/ai-gemini/src/model-meta.ts (1)

1016-1019: Consider exporting a GeminiAudioModels type for API consistency.

GEMINI_MODELS/GEMINI_IMAGE_MODELS each have a companion (typeof X)[number] alias (GeminiModels, GeminiImageModels). GEMINI_AUDIO_MODELS doesn't, which makes it awkward for downstream code (e.g., the new geminiAudio adapter) to narrow to known Lyria models.

♻️ Proposed addition
 export const GEMINI_AUDIO_MODELS = [
   LYRIA_3_PRO.name,
   LYRIA_3_CLIP.name,
 ] as const
+
+export type GeminiAudioModels = (typeof GEMINI_AUDIO_MODELS)[number]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/model-meta.ts` around lines 1016 - 1019,
Add a companion exported type alias for GEMINI_AUDIO_MODELS to match the API
pattern used by GEMINI_MODELS/GEMINI_IMAGE_MODELS; specifically, export a type
named GeminiAudioModels that is (typeof GEMINI_AUDIO_MODELS)[number] so
downstream code (e.g., geminiAudio adapter) can narrow values to the known Lyria
audio model names; update exports if necessary to ensure GeminiAudioModels is
exported from the module.
packages/typescript/ai-gemini/tests/model-meta.test.ts (1)

592-600: Assert the newly added Flash 3.1 TTS model in the registry test.

This PR prepends gemini-3.1-flash-tts-preview to GEMINI_TTS_MODELS, but the registry test doesn't cover it. Adding an explicit check prevents it from being accidentally dropped from the list.

💚 Proposed test addition
     expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-pro-preview-tts')
     expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-flash-preview-tts')
+    expect(GEMINI_TTS_MODELS).toContain('gemini-3.1-flash-tts-preview')
     expect(new Set(GEMINI_TTS_MODELS).size).toBe(GEMINI_TTS_MODELS.length)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/tests/model-meta.test.ts` around lines 592 -
600, The test in the "Gemini TTS model registry" suite is missing an assertion
for the newly added Flash 3.1 TTS model; update the it block that checks
GEMINI_TTS_MODELS to also expect the array toContain
'gemini-3.1-flash-tts-preview' (alongside the existing checks for
'gemini-2.5-pro-preview-tts' and 'gemini-2.5-flash-preview-tts') so the new
model cannot be accidentally removed from GEMINI_TTS_MODELS.
packages/typescript/ai-event-client/src/index.ts (1)

560-564: AudioUsageEvent.usage is typed as ImageUsage — introduce an audio-specific type.

ImageUsage is semantically image-specific (input/output tokens for image models). Using it for audio usage is confusing for consumers and will be wrong if audio usage grows modality-specific fields (e.g., audio seconds, character count). Define a dedicated AudioUsage (even if initially identical) so the public API is honest about the modality.

♻️ Proposed type
+export interface AudioUsage {
+  inputTokens?: number
+  outputTokens?: number
+  totalTokens?: number
+  /** Duration of generated audio in seconds, when available */
+  audioSeconds?: number
+  /** Input character count for TTS/music prompts, when available */
+  characters?: number
+}
+
 /** Emitted when audio usage metrics are available. */
 export interface AudioUsageEvent extends BaseEventContext {
   requestId: string
   model: string
-  usage: ImageUsage
+  usage: AudioUsage
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-event-client/src/index.ts` around lines 560 - 564, The
AudioUsageEvent currently reuses ImageUsage which is semantically incorrect; add
a new exported type/interface named AudioUsage (can initially mirror ImageUsage
fields) and update the AudioUsageEvent declaration to use AudioUsage instead of
ImageUsage; ensure the new AudioUsage type is exported alongside ImageUsage so
consumers can import it and update any references to ImageUsage where they
specifically pertain to audio usage (symbols: AudioUsageEvent, ImageUsage,
AudioUsage).
packages/typescript/ai/src/activities/generateAudio/adapter.ts (2)

38-40: ~types as a required interface property forces implementations to declare it.

Any non-BaseAudioAdapter implementation of AudioAdapter must add declare '~types': { providerOptions: TProviderOptions } (or assign it), otherwise TS structural typing will reject it. Marking it optional ('~types'?:) — or documenting the expectation on the interface — would make the contract friendlier for third-party adapters without changing inference at the call sites that use TAdapter['~types']['providerOptions'].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateAudio/adapter.ts` around lines
38 - 40, The interface currently requires the literal property '~types' which
forces every AudioAdapter implementation to explicitly declare it; update the
AudioAdapter definition to make '~types' optional (change '~types' to '~types'?)
so third‑party adapters don't need to declare it while preserving inference at
call sites that read TAdapter['~types']['providerOptions']; ensure
BaseAudioAdapter still provides the concrete '~types' shape and keep
TProviderOptions type unchanged so existing consumers continue to infer
providerOptions correctly.
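
A sketch of the optional phantom-property shape under discussion, with simplified member names (the real interface carries more):

interface AudioAdapter<TProviderOptions = Record<string, unknown>> {
  readonly name: string
  // Optional phantom carrier: third-party implementations may omit it entirely.
  '~types'?: { providerOptions: TProviderOptions }
  generateAudio(options: {
    prompt: string
    modelOptions?: TProviderOptions
  }): Promise<unknown>
}

// Call sites that mine the phantom type then need a NonNullable wrapper:
type ProviderOptionsOf<TAdapter extends AudioAdapter> = NonNullable<
  TAdapter['~types']
>['providerOptions']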

86-88: generateId yields surprisingly little entropy.

Math.random().toString(36) produces "0.xxxxxxxxxxxx" (~12 base-36 chars of payload starting at index 2); .substring(7) drops another 5 chars, leaving roughly 4–8 base-36 characters of randomness. With a per-ms timestamp prefix, two adapter calls firing in the same millisecond (e.g. parallel generateAudio in tests or streaming) have a non-trivial collision probability for correlation IDs emitted to devtools events.

Since IDs are for correlation, not security, crypto.randomUUID() (with a Node/browser-safe fallback) or .substring(2, 12) would be much safer and just as cheap.

♻️ Proposed fix
   protected generateId(): string {
-    return `${this.name}-${Date.now()}-${Math.random().toString(36).substring(7)}`
+    return `${this.name}-${Date.now()}-${Math.random().toString(36).slice(2, 12)}`
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateAudio/adapter.ts` around lines
86 - 88, The current generateId method creates low-entropy IDs and risks
collisions; update the generateId implementation in the class (method
generateId, referencing this.name) to use a higher-entropy source such as
crypto.randomUUID() when available, with a safe fallback for older environments
(e.g., use Math.random().toString(36).substring(2,12) or another longer
substring) and keep the same prefix format `${this.name}-<id>` so tests/devtools
correlation uses a much stronger unique value.
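
One possible shape for the higher-entropy variant, sketched standalone; crypto.randomUUID() exists in modern Node and browsers, and the fallback covers older runtimes:

function generateId(name: string): string {
  const entropy =
    typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function'
      ? crypto.randomUUID()
      : `${Date.now()}-${Math.random().toString(36).slice(2, 12)}`
  return `${name}-${entropy}`
}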
examples/ts-react-chat/src/routes/generations.audio.tsx (2)

2-12: Apply ESLint fixes for import ordering and type-only imports.

Static analysis flags four violations here: @tanstack/ai-react and @tanstack/ai type imports should come after the ../lib/audio-providers import, and the inline type specifiers on lines 10-11 should be promoted to a top-level type-only import.

♻️ Proposed fix
 import { useEffect, useMemo, useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
 import { generateAudioFn } from '../lib/server-fns'
-import {
-  AUDIO_PROVIDERS,
-  type AudioProviderConfig,
-  type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type {
+  AudioProviderConfig,
+  AudioProviderId,
+} from '../lib/audio-providers'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 2 - 12,
Reorder and convert the imports to satisfy ESLint: move the local imports
(generateAudioFn, AUDIO_PROVIDERS, AudioProviderConfig, AudioProviderId) before
type-only imports from external packages, and change inline `type` specifiers to
a top-level type-only import. Specifically, ensure imports for createFileRoute,
useGenerateAudio, fetchServerSentEvents remain, then import AUDIO_PROVIDERS,
generateAudioFn, AudioProviderConfig, AudioProviderId, and finally add a single
`import type { UseGenerateAudioReturn, AudioGenerationResult } from
'@tanstack/ai-react'` (or split by package if necessary) so that
UseGenerateAudioReturn and AudioGenerationResult are imported as top-level
type-only imports rather than inline `type` specifiers.

286-286: Guard the non-null assertion on AUDIO_PROVIDERS.find.

provider state is only ever set from the provider tabs, so this is currently safe, but the non-null assertion will turn any future typo or stale persisted value into a runtime crash on first render. A simple fallback to AUDIO_PROVIDERS[0] keeps the page alive.

🛡️ Proposed fix
-  const config = AUDIO_PROVIDERS.find((p) => p.id === provider)!
+  const config =
+    AUDIO_PROVIDERS.find((p) => p.id === provider) ?? AUDIO_PROVIDERS[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` at line 286, The
code uses a non-null assertion when locating the audio provider: change the line
that sets config (const config = AUDIO_PROVIDERS.find((p) => p.id ===
provider)!) to safely handle a missing match by falling back to a default (e.g.,
AUDIO_PROVIDERS[0]) and/or null-checking provider; ensure you use
AUDIO_PROVIDERS.find(...) || AUDIO_PROVIDERS[0] (or equivalent) so config is
never undefined and the page doesn't crash on first render if the provider value
is stale or mistyped.
packages/typescript/ai-fal/src/adapters/audio.ts (2)

65-74: BaseAudioAdapter receives {} and drops the fal config.

super(model, {}) stores an empty AudioAdapterConfig on the base class, throwing away config.apiKey / config.proxyUrl. configureFalClient uses them at construction time, so there's no functional bug today, but if any shared base-class logic starts reading this.config (retries, headers, timeouts) this adapter will silently behave as if unconfigured. Passing the (compatible) subset through — or at least config ?? {} — would be more future-proof.

♻️ Proposed tweak
-  constructor(model: TModel, config?: FalClientConfig) {
-    super(model, {})
-    configureFalClient(config)
-  }
+  constructor(model: TModel, config?: FalClientConfig) {
+    super(model, config ?? {})
+    configureFalClient(config)
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 65 - 74,
FalAudioAdapter's constructor calls super(model, {}) which discards the incoming
FalClientConfig and leaves the base AudioAdapterConfig empty; update the
constructor to pass the provided config (or config ?? {}) into the
BaseAudioAdapter so shared base-class logic can access apiKey/proxyUrl/timeouts
stored on this.config. Specifically, modify the FalAudioAdapter constructor (the
super(model, {}) call) to forward the compatible subset of FalClientConfig into
BaseAudioAdapter, while keeping the configureFalClient(config) call.

119-126: Consider propagating duration (and the inverse ms→s translation) into the result.

The adapter translates options.duration into model-specific fields (e.g., music_length_ms) on the way in, but the returned AudioGenerationResult.audio.duration is always unset. Callers that displayed the originally requested duration will lose that information after a round-trip, and for ElevenLabs the response may carry a music_length_ms value that could be surfaced as duration (seconds). Low-priority, but cheap to pass through when available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 119 - 126, The
adapter currently returns AudioGenerationResult without a duration; update the
return object in the function that builds the response (the block that returns
id: response.requestId || this.generateId(), model: this.model, audio: {...}) to
include audio.duration (in seconds) by propagating the original options.duration
if present, and if not, extract provider-specific duration fields from the
response (e.g., music_length_ms) converting milliseconds→seconds (divide by
1000). Use deriveAudioContentType as-is and ensure the duration is a number
(seconds) or undefined when not available.
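
A sketch of the suggested pass-through; music_length_ms is the one provider-specific field named above, and other fal models may report length differently:

function resolveDurationSeconds(
  requestedSeconds: number | undefined,
  response: { music_length_ms?: number },
): number | undefined {
  // Prefer the caller's originally requested duration when one was given.
  if (requestedSeconds != null) return requestedSeconds
  // Otherwise surface the provider-reported length, converted ms -> s.
  if (response.music_length_ms != null) return response.music_length_ms / 1000
  return undefined
}
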
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/ts-react-chat/src/routes/api.generate.speech.ts`:
- Around line 44-51: The code uses parsed.error.flatten(), which is deprecated;
replace that call with the top-level z.flattenError(parsed.error) and pass its
result into jsonError instead. Update the validation block around
SPEECH_BODY_SCHEMA.safeParse (the parsed variable) so that when parsed.success
is false you call z.flattenError(parsed.error) and include that flattened result
in the details field passed to jsonError (also update the same pattern in the
handlers for audio and transcribe).

In `@examples/ts-react-chat/src/routes/generations.audio.tsx`:
- Around line 23-50: The toAudioOutput(AudioGenerationResult) function currently
throws when audio has neither url nor b64Json; change it to not throw inside the
onResult callback used by useGenerateAudio — instead return null (per the hook
contract) to keep the previous result unchanged and invoke the provided error
handler (e.g., call onError / propagate an error via the hook's error pathway)
with a clear message referencing the problematic AudioGenerationResult; update
callers that rely on toAudioOutput (the onResult consumer in useGenerateAudio)
to expect null as a valid return and surface the failure through the hook's
error state rather than letting a synchronous throw escape.

In `@packages/typescript/ai/src/activities/generateTranscription/adapter.ts`:
- Line 77: The constructor argument order for
BaseTranscriptionAdapter/BaseImageAdapter/BaseTTSAdapter changed from (config,
model) to (model, config), so update every subclass that still calls super with
the old order to pass the model first and config second (e.g., change super({},
model) → super(model, {}) and super(config, model) → super(model, config)) in
the listed adapters (ai-openai/adapters/summarize.ts, video.ts, text.ts;
ai-openrouter/adapters/summarize.ts, text.ts; ai-fal/adapters/video.ts;
ai-ollama/adapters/text.ts; ai-groq/adapters/text.ts;
ai-grok/adapters/summarize.ts, text.ts; ai-gemini/adapters/text.ts;
ai-anthropic/adapters/summarize.ts, text.ts) and any other in-tree/out-of-tree
subclasses; also add a changeset entry for `@tanstack/ai` with a minor version
bump documenting this breaking constructor change and include a migration note
instructing external adapter authors to swap the constructor argument order to
(model, config).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b14bfa20-1214-4b31-bc2f-660a60babc8c

📥 Commits

Reviewing files that changed from the base of the PR and between 78c6694 and 8957b41.

📒 Files selected for processing (34)
  • examples/ts-react-chat/src/routes/api.generate.audio.ts
  • examples/ts-react-chat/src/routes/api.generate.speech.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • examples/ts-react-chat/src/routes/generations.audio.tsx
  • examples/ts-react-chat/src/routes/generations.speech.tsx
  • examples/ts-react-chat/src/routes/generations.transcription.tsx
  • examples/ts-react-chat/src/routes/index.tsx
  • packages/typescript/ai-client/src/generation-types.ts
  • packages/typescript/ai-event-client/src/index.ts
  • packages/typescript/ai-fal/src/adapters/audio.ts
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-fal/src/adapters/speech.ts
  • packages/typescript/ai-fal/src/adapters/transcription.ts
  • packages/typescript/ai-fal/tests/image-adapter.test.ts
  • packages/typescript/ai-gemini/src/adapters/audio.ts
  • packages/typescript/ai-gemini/src/adapters/image.ts
  • packages/typescript/ai-gemini/src/adapters/tts.ts
  • packages/typescript/ai-gemini/src/model-meta.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-gemini/tests/model-meta.test.ts
  • packages/typescript/ai-grok/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/transcription.ts
  • packages/typescript/ai-openai/src/adapters/tts.ts
  • packages/typescript/ai-openrouter/src/adapters/image.ts
  • packages/typescript/ai-openrouter/tests/image-adapter.test.ts
  • packages/typescript/ai/src/activities/generateAudio/adapter.ts
  • packages/typescript/ai/src/activities/generateAudio/index.ts
  • packages/typescript/ai/src/activities/generateImage/adapter.ts
  • packages/typescript/ai/src/activities/generateSpeech/adapter.ts
  • packages/typescript/ai/src/activities/generateSpeech/index.ts
  • packages/typescript/ai/src/activities/generateTranscription/adapter.ts
  • packages/typescript/ai/src/activities/generateTranscription/index.ts
  • packages/typescript/ai/tests/image-per-model-type-safety.test.ts
✅ Files skipped from review due to trivial changes (2)
  • examples/ts-react-chat/src/routes/index.tsx
  • packages/typescript/ai-openrouter/tests/image-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (12)
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-client/src/generation-types.ts
  • packages/typescript/ai-fal/tests/image-adapter.test.ts
  • examples/ts-react-chat/src/routes/api.generate.audio.ts
  • packages/typescript/ai-grok/src/adapters/image.ts
  • packages/typescript/ai-gemini/src/adapters/image.ts
  • packages/typescript/ai/src/activities/generateSpeech/index.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-openrouter/src/adapters/image.ts
  • packages/typescript/ai-fal/src/adapters/speech.ts
  • packages/typescript/ai-gemini/src/adapters/tts.ts
  • packages/typescript/ai/src/activities/generateAudio/index.ts

Comment on lines +44 to +51
const parsed = SPEECH_BODY_SCHEMA.safeParse(rawData)
if (!parsed.success) {
return jsonError(400, {
error: 'validation_failed',
message: 'Request data failed validation',
details: parsed.error.flatten(),
})
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find other occurrences of the deprecated ZodError.flatten() pattern across examples + packages.
rg -nP --type=ts -C1 '\.error\.flatten\s*\('

Repository: TanStack/ai

Length of output: 921


🏁 Script executed:

find . -name "package.json" -type f | head -10

Repository: TanStack/ai

Length of output: 432


🏁 Script executed:

cat package.json

Repository: TanStack/ai

Length of output: 3931


🏁 Script executed:

cat ./examples/ts-react-chat/package.json

Repository: TanStack/ai

Length of output: 2072


🏁 Script executed:

cat -n ./examples/ts-react-chat/src/routes/api.generate.speech.ts

Repository: TanStack/ai

Length of output: 2819


🌐 Web query:

Zod 4 ZodError flatten() deprecated migration guide

💡 Result:

In Zod 4, the .flatten() method on ZodError instances is deprecated. The official migration guide recommends replacing it with the top-level z.treeifyError(error) function. Key changes from the Zod 4 migration guide:

  • ZodError.format is deprecated; use z.treeifyError instead.
  • ZodError.flatten is deprecated; use z.treeifyError instead.
  • ZodError.formErrors is dropped (it was identical to .flatten).

However, the error-formatting documentation distinguishes two use cases:

  • For nested schemas: use z.treeifyError(error) to get a nested tree structure mirroring the schema (e.g., { errors: [], properties: { field: { errors: [] } } }).
  • For flat schemas (the most common case): use z.flattenError(error) to get a shallow object with formErrors: string[] (top-level) and fieldErrors: Record<string, string[]>.

Example migration. In v3:

  const result = schema.safeParse(data)
  if (!result.success) {
    const flattened = result.error.flatten() // { formErrors: [], fieldErrors: { field: [] } }
  }

In v4:

  import * as z from 'zod'
  const result = schema.safeParse(data)
  if (!result.success) {
    // Direct replacement for .flatten():
    const flattened = z.flattenError(result.error) // { formErrors: string[], fieldErrors: Record<string, string[]> }
    // Or, for a nested/tree structure:
    const tree = z.treeifyError(result.error)
  }

Note: z.flattenError provides the direct functional equivalent for flat error handling (common in forms), while z.treeifyError is the general replacement emphasized in the migration guide for deprecated instance methods. Both are top-level functions, not methods on the error instance. Official docs: https://zod.dev/v4/changelog (migration) and https://zod.dev/error-formatting (formatting).


Replace deprecated ZodError.flatten() with z.flattenError().

In Zod 4, .flatten() on ZodError instances is deprecated. For this flat schema, use the top-level z.flattenError(parsed.error) function instead. The same fix applies to api.generate.audio.ts and api.transcribe.ts.

♻️ Suggested fix
         const parsed = SPEECH_BODY_SCHEMA.safeParse(rawData)
         if (!parsed.success) {
           return jsonError(400, {
             error: 'validation_failed',
             message: 'Request data failed validation',
-            details: parsed.error.flatten(),
+            details: z.flattenError(parsed.error),
           })
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/api.generate.speech.ts` around lines 44 -
51, The code uses parsed.error.flatten(), which is deprecated; replace that call
with the top-level z.flattenError(parsed.error) and pass its result into
jsonError instead. Update the validation block around
SPEECH_BODY_SCHEMA.safeParse (the parsed variable) so that when parsed.success
is false you call z.flattenError(parsed.error) and include that flattened result
in the details field passed to jsonError (also update the same pattern in the
handlers for audio and transcribe).
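
Based on the migration notes quoted above, a small self-contained example of the replacement call (the schema here is a stand-in, not the example app's real one):

import * as z from 'zod'

const SPEECH_BODY_SCHEMA = z.object({ text: z.string().min(1) })

const parsed = SPEECH_BODY_SCHEMA.safeParse({ text: '' })
if (!parsed.success) {
  // Top-level function replaces the deprecated instance method:
  const details = z.flattenError(parsed.error)
  // details: { formErrors: string[], fieldErrors: { text?: string[] } }
}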

 protected config: TranscriptionAdapterConfig

-constructor(config: TranscriptionAdapterConfig = {}, model: TModel) {
+constructor(model: TModel, config: TranscriptionAdapterConfig = {}) {

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all subclasses of the three reordered base adapters and inspect their super() calls.
rg -nP --type=ts -C2 'extends\s+Base(Transcription|Image|TTS)Adapter\b'
echo '---'
rg -nP --type=ts -C1 '\bsuper\s*\('  -g 'packages/typescript/ai-*/src/adapters/**'

Repository: TanStack/ai

Length of output: 12187


Breaking change to public base-class constructors — requires fixing all in-tree subclasses.

BaseTranscriptionAdapter, BaseImageAdapter, and BaseTTSAdapter are exported from @tanstack/ai/adapters. The constructor argument reorder from (config, model) to (model, config) silently breaks any third-party adapter extending these base classes and calling the old argument order — this.model receives the config object at runtime when TProviderOptions/TModel are structurally compatible.

In-tree subclasses are inconsistently updated. The following adapters still call super() with the old argument order and must be fixed:

  • ai-openai/adapters/summarize.ts (line 42): change super({}, model) → super(model, {})
  • ai-openai/adapters/video.ts (line 58): change super(config, model) → super(model, config)
  • ai-openai/adapters/text.ts (line 120): change super({}, model) → super(model, {})
  • ai-openrouter/adapters/summarize.ts (line 52): change super({}, model) → super(model, {})
  • ai-openrouter/adapters/text.ts (line 109): change super({}, model) → super(model, {})
  • ai-fal/adapters/video.ts (line 73): change super({}, model) → super(model, {})
  • ai-ollama/adapters/text.ts (line 132): change super({}, model) → super(model, {})
  • ai-groq/adapters/text.ts (line 90): change super({}, model) → super(model, {})
  • ai-grok/adapters/summarize.ts (line 45): change super({}, model) → super(model, {})
  • ai-grok/adapters/text.ts (line 85): change super({}, model) → super(model, {})
  • ai-gemini/adapters/text.ts (line 114): change super({}, model) → super(model, {})
  • ai-anthropic/adapters/summarize.ts (line 52): change super({}, model) → super(model, {})
  • ai-anthropic/adapters/text.ts (line 128): change super({}, model) → super(model, {})

Ensure the changeset entry documents this breaking change for @tanstack/ai using a minor version bump per project convention, with a migration note for any out-of-tree subclasses.
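
For an out-of-tree adapter, the migration is a one-line swap at the super() call. A hypothetical sketch (the subclass, its config shape, and the elided generics are invented for illustration, not taken from any real adapter):

import { BaseTTSAdapter } from '@tanstack/ai/adapters'

// Hypothetical third-party adapter, shown only to illustrate the swap.
class MyTTSAdapter extends BaseTTSAdapter {
  constructor(apiKey: string) {
    // Old order (pre-reorder): super({ apiKey }, 'my-tts-model')
    super('my-tts-model', { apiKey }) // new order: (model, config)
  }
}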

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateTranscription/adapter.ts` at
line 77, The constructor argument order for
BaseTranscriptionAdapter/BaseImageAdapter/BaseTTSAdapter changed from (config,
model) to (model, config), so update every subclass that still calls super with
the old order to pass the model first and config second (e.g., change super({},
model) → super(model, {}) and super(config, model) → super(model, config)) in
the listed adapters (ai-openai/adapters/summarize.ts, video.ts, text.ts;
ai-openrouter/adapters/summarize.ts, text.ts; ai-fal/adapters/video.ts;
ai-ollama/adapters/text.ts; ai-groq/adapters/text.ts;
ai-grok/adapters/summarize.ts, text.ts; ai-gemini/adapters/text.ts;
ai-anthropic/adapters/summarize.ts, text.ts) and any other in-tree/out-of-tree
subclasses; also add a changeset entry for `@tanstack/ai` with a minor version
bump documenting this breaking constructor change and include a migration note
instructing external adapter authors to swap the constructor argument order to
(model, config).

…speech/transcription adapters

Resolves conflicts across five files from merging origin/main (PR TanStack#467 debug
logging rollout) into the 328-fal-audio-and-speech-support branch. Keeps the
CR fix behaviors from HEAD — devtools *:request:error event emission,
async/await (no .then chains), reordered base adapter constructor args,
single canonical numberOfImages key, broadened PCM detection that excludes
containerized wav subtypes, voice validation with partial voiceConfig
fallback, channels threaded into wrapPcmBase64AsWav, no dead
`mimeType || 'audio/wav'` fallback — while adopting main's logger.request()
/ logger.errors() plumbing around the SDK call.

Also threads the new logger surface through the branch's new adapters that
didn't exist on main: adds `logger: InternalLogger` to AudioGenerationOptions
(matching sibling types), `debug?: DebugOption` to AudioActivityOptions,
resolves it via resolveDebugOption in runGenerateAudio, and wires
logger.request / logger.errors into falAudio, falSpeech, falTranscription,
and geminiAudio so they match the pattern runGenerateImage / runGenerateVideo
already follow on main.
…on, voice validation, etc.

Applies the reviewer-flagged changes that weren't load-bearing for the merge:

- event-client: AudioRequestCompletedEvent.audio is now a mutually-exclusive
  {url; never b64Json} | {b64Json; never url} union so consumers can't read
  both fields simultaneously, mirroring the GeneratedAudio contract in core.
- fal utils: extractUrlExtension now strips URL fragments and trailing
  slashes, parses via the URL API so a TLD like `.com` isn't mistaken for
  an extension, and only inspects the final path segment.
- fal utils: deriveAudioContentType returns `audio/aac` for aac, separated
  from the `m4a`/`mp4` → `audio/mp4` case.
- fal speech: prefer URL-derived extension when deriving `format`, and
  normalize `mpeg` → `mp3` so the field is a usable file extension.
- gemini audio: drop `negativePrompt` (not accepted by GenerateContentConfig)
  and `responseMimeType` (Lyria Clip rejects it, Pro returns MP3 by default)
  from the public provider options surface, and document that the generic
  `duration` option is ignored by Lyria (Clip is fixed at 30s, Pro takes
  duration via the natural-language prompt).
- gemini tts: multiSpeakerVoiceConfig.speakerVoiceConfigs length is now
  validated (1 or 2 speakers), partial user-supplied voiceConfig correctly
  falls back to the standard voice/'Kore' default, parsePcmMimeType tightens
  detection to exclude subtypes containing "wav" so containerized
  `audio/wav;codec=pcm` is no longer re-wrapped, and createGeminiSpeech /
  createGeminiAudio factory functions now spread config before the explicit
  apiKey argument so caller config can't silently override the API key.
- ts-react-chat API routes: replace zod 4's removed `.flatten()` with
  `z.treeifyError()` for validation error details.
- ts-react-chat audio route: `toAudioOutput` returns `null` per the
  `onResult` hook contract instead of throwing synchronously — failures
  are still surfaced via the hook's error state.
- Updates the tests affected by the above behavior changes.
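
For reference, a minimal sketch of the Zod 4 change mentioned above: error.flatten() and error.format() moved off ZodError onto top-level helpers.

import { z } from 'zod'

const schema = z.object({ text: z.string().min(1) })
const parsed = schema.safeParse({ text: '' })

if (!parsed.success) {
  // Zod 4 replaces error.flatten() / error.format() with these helpers.
  const flat = z.flattenError(parsed.error) // { formErrors, fieldErrors }
  const tree = z.treeifyError(parsed.error) // nested { errors, properties }
  console.log(flat.fieldErrors, tree)
}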

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-fal/src/adapters/image.ts (1)

109-110: ⚠️ Potential issue | 🟡 Minor

Let single-image string payloads reach parseImage.

Line 109 filters data.image to objects only, so { image: "https://..." } will never reach parseImage even though it supports string payloads (line 129-130). The typeof check should be removed (keeping the truthiness check) so parseImage can validate the payload.

Proposed fix
-    else if ('image' in data && data.image && typeof data.image === 'object') {
+    else if ('image' in data && data.image) {
       images.push(this.parseImage(data.image))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/image.ts` around lines 109 - 110, The
guard that only allows object-typed images prevents string image payloads from
reaching parseImage; in the branch handling 'image' in data (the else if that
currently checks 'image' in data && data.image && typeof data.image ===
"object"), remove the typeof check so you call this.parseImage(data.image) for
any non-null data.image value (allowing strings like "https://..."); keep the
existing null/undefined check (data.image) and let parseImage perform
validation/conversion.
🧹 Nitpick comments (3)
examples/ts-react-chat/src/routes/api.generate.audio.ts (1)

10-15: Optional: add bounds to duration to reject obviously invalid values early.

z.number().optional() accepts negatives, NaN, 0, and absurdly large values, which will fail deeper inside the provider with less friendly messages. Since this is an example route, a simple positive-and-finite check would surface bad inputs at the validation boundary:

♻️ Proposed fix
-  duration: z.number().optional(),
+  duration: z.number().positive().finite().optional(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/api.generate.audio.ts` around lines 10 -
15, The duration field in AUDIO_BODY_SCHEMA currently uses z.number().optional()
which allows negatives, 0, NaN, and huge values; update the schema so duration
enforces positive, finite numbers (for example replace z.number().optional()
with z.number().positive().finite().optional(), and optionally add a sensible
.max(...) if you want an upper bound) so invalid durations are rejected at
validation (see AUDIO_BODY_SCHEMA and the duration property).
packages/typescript/ai-openrouter/src/adapters/image.ts (1)

170-170: Minor: consider placing apiKey after the spread to match the PR's stated "protect apiKey ordering" intent.

{ apiKey, ...config } is type-safe today because config is Omit<OpenRouterImageConfig, 'apiKey'>, but the ordering is the opposite of defensive — if the Omit is ever loosened or the call site is bypassed (e.g. via as any), a stray apiKey in config would silently override the caller-supplied one. Inverting to { ...config, apiKey } makes the guarantee runtime-level as well.

♻️ Proposed fix
-  return new OpenRouterImageAdapter({ apiKey, ...config }, model)
+  return new OpenRouterImageAdapter({ ...config, apiKey }, model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-openrouter/src/adapters/image.ts` at line 170, The
object spread currently passes apiKey before config in the
OpenRouterImageAdapter constructor call, which allows a stray apiKey inside
config to override the explicit apiKey at runtime; update the call in the file
where OpenRouterImageAdapter is constructed (the return new
OpenRouterImageAdapter(..., model) expression) to pass `{ ...config, apiKey }`
instead of `{ apiKey, ...config }`, referencing the OpenRouterImageConfig type
and the apiKey/config variables so the explicit apiKey always wins at runtime.
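
The underlying semantics are plain JavaScript: in an object literal, later properties win, so a trailing spread can overwrite an earlier explicit key. A standalone illustration:

const config = { apiKey: 'stray-key-from-config' }

const unsafe = { apiKey: 'caller-key', ...config }
// unsafe.apiKey === 'stray-key-from-config' (the spread clobbered it)

const safe = { ...config, apiKey: 'caller-key' }
// safe.apiKey === 'caller-key' (the explicit key always wins)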
packages/typescript/ai-fal/src/adapters/audio.ts (1)

111-140: Current implementation correctly handles documented fal response shapes; array handling is optional if needed for future models.

The code properly documents and handles the two known fal audio response formats: audio: { url, content_type } and audio_url: string. Test coverage confirms both shapes work correctly, and the error thrown when neither is found is safe and clear. While array-shaped responses (e.g., audio: [...]) would currently fall through and throw, there is no evidence that any fal audio models currently return arrays. Adding array support would only be necessary if future models adopt that response format.
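
As a rough sketch of the two documented shapes (the type and helper names here are invented for illustration; the adapter's real parsing differs):

type FalAudioResponse =
  | { audio: { url: string; content_type?: string } }
  | { audio_url: string }

function pickAudioUrl(data: FalAudioResponse): string {
  // 'in' narrowing selects the matching union member.
  if ('audio' in data) return data.audio.url
  return data.audio_url
}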

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 111 - 140,
Summary: transformResponse correctly handles documented fal shapes but should
explicitly document and handle array-shaped audio responses. Update the
transformResponse method to keep the current handling for audio: { url,
content_type } and audio_url, but add an explicit branch that detects
Array.isArray(data.audio) and either handle the first element (e.g., take
element[0].url) or throw a clearer error message stating "array-shaped audio
responses are not supported" depending on the chosen behavior; reference the
local variables audioUrl and contentType, and ensure the returned object still
uses this.generateId(), this.model, and deriveAudioContentType(contentType,
audioUrl). Also add a short inline comment explaining why arrays are treated
specially so future maintainers know the decision.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3593383a-5920-4b45-8f02-0b46581be904

📥 Commits

Reviewing files that changed from the base of the PR and between 8957b41 and 8d5fb93.

📒 Files selected for processing (31)
  • docs/config.json
  • examples/ts-react-chat/src/routes/api.generate.audio.ts
  • examples/ts-react-chat/src/routes/api.generate.speech.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • examples/ts-react-chat/src/routes/generations.audio.tsx
  • packages/typescript/ai-event-client/src/index.ts
  • packages/typescript/ai-fal/package.json
  • packages/typescript/ai-fal/src/adapters/audio.ts
  • packages/typescript/ai-fal/src/adapters/image.ts
  • packages/typescript/ai-fal/src/adapters/speech.ts
  • packages/typescript/ai-fal/src/adapters/transcription.ts
  • packages/typescript/ai-fal/src/utils/client.ts
  • packages/typescript/ai-fal/tests/speech-adapter.test.ts
  • packages/typescript/ai-fal/tests/utils.test.ts
  • packages/typescript/ai-gemini/src/adapters/audio.ts
  • packages/typescript/ai-gemini/src/adapters/image.ts
  • packages/typescript/ai-gemini/src/adapters/tts.ts
  • packages/typescript/ai-gemini/tests/audio-adapter.test.ts
  • packages/typescript/ai-gemini/tests/image-adapter.test.ts
  • packages/typescript/ai-gemini/tests/tts-adapter.test.ts
  • packages/typescript/ai-grok/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/transcription.ts
  • packages/typescript/ai-openai/src/adapters/tts.ts
  • packages/typescript/ai-openrouter/src/adapters/image.ts
  • packages/typescript/ai-openrouter/tests/image-adapter.test.ts
  • packages/typescript/ai/src/activities/generateAudio/index.ts
  • packages/typescript/ai/src/activities/generateSpeech/index.ts
  • packages/typescript/ai/src/activities/generateTranscription/index.ts
  • packages/typescript/ai/src/index.ts
  • packages/typescript/ai/src/types.ts
✅ Files skipped from review due to trivial changes (4)
  • docs/config.json
  • packages/typescript/ai-fal/package.json
  • packages/typescript/ai-openrouter/tests/image-adapter.test.ts
  • packages/typescript/ai-fal/tests/speech-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
  • packages/typescript/ai-openai/src/adapters/tts.ts
  • packages/typescript/ai/src/index.ts
  • packages/typescript/ai-openai/src/adapters/image.ts
  • packages/typescript/ai-grok/src/adapters/image.ts
  • packages/typescript/ai-openai/src/adapters/transcription.ts
  • packages/typescript/ai-fal/src/adapters/transcription.ts
  • packages/typescript/ai-gemini/tests/tts-adapter.test.ts
  • packages/typescript/ai-gemini/src/adapters/audio.ts

Comment on lines +1 to +12
import { useEffect, useMemo, useRef, useState } from 'react'
import { createFileRoute } from '@tanstack/react-router'
import { useGenerateAudio } from '@tanstack/ai-react'
import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'
import type { AudioGenerationResult } from '@tanstack/ai'
import { generateAudioFn } from '../lib/server-fns'
import {
  AUDIO_PROVIDERS,
  type AudioProviderConfig,
  type AudioProviderId,
} from '../lib/audio-providers'

⚠️ Potential issue | 🟡 Minor

Fix the lint-required import ordering and type-only imports.

Static analysis reports import/order and import/consistent-type-specifier-style errors here.

🧹 Proposed fix
 import { useEffect, useMemo, useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
 import { generateAudioFn } from '../lib/server-fns'
-import {
-  AUDIO_PROVIDERS,
-  type AudioProviderConfig,
-  type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type {
+  AudioProviderConfig,
+  AudioProviderId,
+} from '../lib/audio-providers'
🧰 Tools
🪛 ESLint

[error] 4-4: @tanstack/ai-react type import should occur after import of ../lib/audio-providers

(import/order)


[error] 6-6: @tanstack/ai type import should occur after import of ../lib/audio-providers

(import/order)


[error] 10-10: Prefer using a top-level type-only import instead of inline type specifiers.

(import/consistent-type-specifier-style)


[error] 11-11: Prefer using a top-level type-only import instead of inline type specifiers.

(import/consistent-type-specifier-style)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 1 - 12,
Reorder and convert imports to satisfy import/order and consistent type-only
style: group external libs first (react hooks, `@tanstack/`* packages), then local
imports; use "import type" for pure type imports like UseGenerateAudioReturn,
AudioGenerationResult, AudioProviderConfig, AudioProviderId; keep value imports
(useGenerateAudio, fetchServerSentEvents, createFileRoute, generateAudioFn,
AUDIO_PROVIDERS) as regular imports; ensure React named imports (useEffect,
useMemo, useRef, useState) remain together and before local project imports to
match the lint rules.

Comment on lines +137 to +150
  useEffect(() => {
    const current = result?.url
    // Only track blob: URLs — remote URLs returned directly by providers
    // are not ours to revoke.
    if (current && current.startsWith('blob:')) {
      if (lastBlobUrlRef.current && lastBlobUrlRef.current !== current) {
        URL.revokeObjectURL(lastBlobUrlRef.current)
      }
      lastBlobUrlRef.current = current
    } else if (!current && lastBlobUrlRef.current) {
      URL.revokeObjectURL(lastBlobUrlRef.current)
      lastBlobUrlRef.current = null
    }
  }, [result?.url])

⚠️ Potential issue | 🟠 Major

Revoke blob URLs when the next result is remote.

The current effect skips revocation when result.url changes from a blob: URL to an HTTP(S) URL, retaining the previous audio blob for the component lifetime.

🧯 Proposed fix
   useEffect(() => {
     const current = result?.url
-    // Only track blob: URLs — remote URLs returned directly by providers
-    // are not ours to revoke.
-    if (current && current.startsWith('blob:')) {
-      if (lastBlobUrlRef.current && lastBlobUrlRef.current !== current) {
-        URL.revokeObjectURL(lastBlobUrlRef.current)
-      }
+    const previous = lastBlobUrlRef.current
+
+    if (previous && previous !== current) {
+      URL.revokeObjectURL(previous)
+      lastBlobUrlRef.current = null
+    }
+
+    // Only track blob: URLs — remote URLs returned directly by providers
+    // are not ours to revoke.
+    if (current?.startsWith('blob:')) {
       lastBlobUrlRef.current = current
-    } else if (!current && lastBlobUrlRef.current) {
-      URL.revokeObjectURL(lastBlobUrlRef.current)
-      lastBlobUrlRef.current = null
     }
   }, [result?.url])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 137 -
150, The effect only revokes when result.url becomes falsy, so when result.url
changes from a blob: URL to a remote HTTP(S) URL the previous blob is never
revoked; update the useEffect handling of result?.url (the block that references
lastBlobUrlRef and URL.revokeObjectURL) to also revoke and clear
lastBlobUrlRef.current when result.url exists but does not start with 'blob:'
(i.e., add a branch checking current && !current.startsWith('blob:') &&
lastBlobUrlRef.current, or simplify the else-if to revoke whenever
lastBlobUrlRef.current exists), ensuring lastBlobUrlRef.current is set to null
after revoke.

Comment on lines +120 to +137
    // Strip parameters like `; charset=...` from contentType, and only use
    // the URL extension as a fallback when it looks like a real extension.
    const contentTypeMime = contentType?.split(';')[0]?.trim()
    const safeUrlExtension = extractUrlExtension(audioUrl)
    // Prefer URL-derived extension when available (more canonical for file
    // consumers), otherwise derive from the content-type mime subtype, then
    // fall back to `wav`. Normalize `mpeg` → `mp3` so the format field is a
    // usable file extension rather than the IANA subtype.
    const rawFormat =
      safeUrlExtension || contentTypeMime?.split('/')[1] || 'wav'
    const format = rawFormat === 'mpeg' ? 'mp3' : rawFormat

    return {
      id: response.requestId || this.generateId(),
      model: this.model,
      audio: base64,
      format,
      contentType: contentTypeMime || `audio/${format}`,

⚠️ Potential issue | 🟡 Minor

Map extension fallbacks to MIME types.

When fal returns only audio_url, contentType can become values like audio/mp3 or audio/m4a. Keep format as the file extension, but map common extensions to their MIME types for stricter consumers.

🎧 Proposed fix
     const rawFormat =
       safeUrlExtension || contentTypeMime?.split('/')[1] || 'wav'
     const format = rawFormat === 'mpeg' ? 'mp3' : rawFormat
+    const contentTypeByFormat: Record<string, string> = {
+      mp3: 'audio/mpeg',
+      m4a: 'audio/mp4',
+      mp4: 'audio/mp4',
+      wav: 'audio/wav',
+    }
 
     return {
       id: response.requestId || this.generateId(),
       model: this.model,
       audio: base64,
       format,
-      contentType: contentTypeMime || `audio/${format}`,
+      contentType:
+        contentTypeMime || contentTypeByFormat[format] || `audio/${format}`,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-fal/src/adapters/speech.ts` around lines 120 - 137,
The returned contentType may be too loose when only audio_url is present—keep
format as the file extension (derived by extractUrlExtension and normalized like
'mpeg'→'mp3') but map common extensions to proper MIME types for strict
consumers; update the logic around contentTypeMime/format in the function that
builds the return object (references: extractUrlExtension, rawFormat, format,
generateId) so that if contentTypeMime is missing you look up a small map (e.g.,
mp3→audio/mpeg, m4a→audio/mp4, wav→audio/wav, ogg→audio/ogg, webm→audio/webm,
etc.) and set contentType to that mapped MIME, otherwise fall back to
`audio/${format}`.

Comment on lines +238 to +265
    const entries = response.generatedImages ?? []
    const images: Array<GeneratedImage> = []
    const filterReasons: Array<string> = []

    for (const item of entries) {
      const b64Json = item.image?.imageBytes
      if (b64Json) {
        images.push({ b64Json, revisedPrompt: item.enhancedPrompt })
        continue
      }
      // Imagen can drop individual entries with a raiFilteredReason when
      // Responsible-AI filters fire. Preserve the reason so callers can
      // surface it instead of silently getting back fewer images.
      const reason = (item as { raiFilteredReason?: string }).raiFilteredReason
      if (reason) {
        filterReasons.push(reason)
      }
    }

    // Every entry was filtered — no usable images to return. Throw rather
    // than resolve to an empty array so the caller is forced to handle the
    // failure mode explicitly.
    if (entries.length > 0 && images.length === 0) {
      const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
      throw new Error(
        `Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`,
      )
    }

⚠️ Potential issue | 🟠 Major

Also reject completely empty Imagen responses.

Line 260 only throws when entries.length > 0, so a malformed/empty generatedImages response still resolves with images: []. That recreates the silent-success failure mode fixed for Gemini-native responses.

🐛 Proposed fix
-    if (entries.length > 0 && images.length === 0) {
+    if (images.length === 0) {
+      if (entries.length === 0) {
+        throw new Error(`Imagen ${model} returned no images.`)
+      }
+
       const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
       throw new Error(
         `Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`,
       )
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/image.ts` around lines 238 - 265,
The current check only throws when entries.length > 0, allowing a completely
empty or malformed Imagen response (generatedImages undefined/null/empty) to
return images: []; update the logic in the image adapter handling (the block
that builds entries, images, filterReasons and the throw for Imagen) so that
whenever images.length === 0 you throw the same Error (not just when
entries.length > 0); adjust the thrown message to still include the joined
filterReasons when present and reference the model variable as before (i.e.,
ensure the throw happens if no usable images were produced regardless of
generatedImages length).

Comment on lines +288 to +321
  const bitDepthMatch = /^audio\/l(\d+)/.exec(normalized)
  const isPcm =
    bitDepthMatch !== null ||
    normalized.startsWith('audio/pcm') ||
    normalized.startsWith('audio/x-pcm') ||
    normalized.includes('codec=pcm')
  if (!isPcm) return undefined

  const rateMatch = /rate=(\d+)/.exec(normalized)
  const channelsMatch = /channels=(\d+)/.exec(normalized)
  // Default to 16-bit when the mime type doesn't specify — matches Gemini's
  // audio/L16;codec=pcm;rate=24000 response.
  const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16
  return {
    sampleRate: rateMatch ? Number(rateMatch[1]) : 24000,
    channels: channelsMatch ? Number(channelsMatch[1]) : 1,
    bitsPerSample,
  }
}

function wrapPcmBase64AsWav(
  pcmBase64: string,
  sampleRate: number,
  channels = 1,
  bitsPerSample = 16,
): string {
  // The WAV writer below emits a 16-bit PCM fmt chunk. If the source claims a
  // different bit depth we'd be lying about the payload, so bail out loudly
  // rather than producing a corrupt file.
  if (bitsPerSample !== 16) {
    throw new Error(
      `Unsupported PCM bit depth ${bitsPerSample}: only 16-bit PCM can be wrapped as WAV.`,
    )
  }

⚠️ Potential issue | 🟡 Minor

Non‑16‑bit PCM mime types now hard‑fail the request.

parsePcmMimeType accepts any audio/L<N> (e.g. audio/L24, audio/L32) and returns bitsPerSample = N, but wrapPcmBase64AsWav throws Unsupported PCM bit depth … for anything other than 16. That error propagates out of generateSpeech, so if Gemini (or a future model) ever returns non‑16‑bit PCM the entire call errors instead of returning the raw bytes. Today Lyria/TTS return L16 so this is latent, but it's a trivially avoidable footgun.

Prefer one of:

  • Restrict parsePcmMimeType to 16‑bit PCM so unsupported depths fall through to the generic mimeType.split('/') path and are returned as‑is, or
  • Catch the unsupported‑depth case at the call site and return the un‑wrapped base64 with the original contentType.
🛡️ Proposed fix (option 1 — tightens the detector)
-  // Default to 16-bit when the mime type doesn't specify — matches Gemini's
-  // audio/L16;codec=pcm;rate=24000 response.
-  const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16
+  // Default to 16-bit when the mime type doesn't specify — matches Gemini's
+  // audio/L16;codec=pcm;rate=24000 response. Only 16-bit PCM can be wrapped
+  // as WAV by the helper below, so bail out for other bit depths and let the
+  // caller fall through to returning the raw payload with its original mime.
+  const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16
+  if (bitsPerSample !== 16) return undefined
   return {
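
For comparison, option 2 keeps the detector as-is and guards at the call site. A rough sketch with assumed local variable names (base64Payload, contentType, and the surrounding flow are illustrative, not the adapter's actual code):

// Hypothetical call-site guard: only wrap when the depth matches the
// 16-bit fmt chunk the WAV writer emits; otherwise pass through untouched.
const pcm = parsePcmMimeType(contentType)
let audio = base64Payload
let outContentType = contentType
if (pcm && pcm.bitsPerSample === 16) {
  audio = wrapPcmBase64AsWav(audio, pcm.sampleRate, pcm.channels)
  outContentType = 'audio/wav'
}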
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/adapters/tts.ts` around lines 288 - 321,
parsePcmMimeType currently returns bitsPerSample for any audio/L<N> but
wrapPcmBase64AsWav throws for non‑16 bits, causing generateSpeech to fail for
L24/L32; modify the caller (where generateSpeech invokes wrapPcmBase64AsWav) to
detect bitsPerSample !== 16 and, instead of calling wrapPcmBase64AsWav, return
the original base64 payload and original contentType (i.e., leave it unwrapped)
so unsupported PCM depths don't error the whole request; keep parsePcmMimeType
as-is and only add this conditional fallback around the wrapPcmBase64AsWav
invocation.

Comment on lines +35 to +42
  it('geminiAudio reads the API key from the environment', () => {
    process.env.GOOGLE_API_KEY = 'env-key'
    try {
      const adapter = geminiAudio('lyria-3-clip-preview')
      expect(adapter.model).toBe('lyria-3-clip-preview')
    } finally {
      delete process.env.GOOGLE_API_KEY
    }

⚠️ Potential issue | 🟡 Minor

Restore the previous GOOGLE_API_KEY value.

This test deletes the env var even when it existed before the test, which can affect later tests in the same worker.

🧪 Proposed fix
   it('geminiAudio reads the API key from the environment', () => {
+    const previousGoogleApiKey = process.env.GOOGLE_API_KEY
     process.env.GOOGLE_API_KEY = 'env-key'
     try {
       const adapter = geminiAudio('lyria-3-clip-preview')
       expect(adapter.model).toBe('lyria-3-clip-preview')
     } finally {
-      delete process.env.GOOGLE_API_KEY
+      if (previousGoogleApiKey === undefined) {
+        delete process.env.GOOGLE_API_KEY
+      } else {
+        process.env.GOOGLE_API_KEY = previousGoogleApiKey
+      }
     }
   })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/tests/audio-adapter.test.ts` around lines 35 -
42, The test "geminiAudio reads the API key from the environment" mutates
process.env.GOOGLE_API_KEY without restoring its prior value; fix by capturing
the original value (e.g. const originalKey = process.env.GOOGLE_API_KEY) before
setting it, then in the finally block restore it (if originalKey is undefined
delete process.env.GOOGLE_API_KEY else set process.env.GOOGLE_API_KEY =
originalKey). Apply this change around the geminiAudio test to avoid affecting
other tests in the worker.

Comment on lines +182 to +193
  } catch (error) {
    const elapsedMs = Date.now() - startTime
    const err = error as Error
    aiEventClient.emit('audio:request:error', {
      requestId,
      provider: adapter.name,
      model,
      error: { message: err.message, name: err.name },
      duration: elapsedMs,
      modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
      timestamp: Date.now(),
    })

⚠️ Potential issue | 🟠 Major

Normalize non-Error throws before emitting error events.

error as Error does not guard at runtime; a null/undefined throw from an adapter would throw again at err.message and replace the original failure.

🐛 Proposed fix
     const elapsedMs = Date.now() - startTime
-    const err = error as Error
+    const err =
+      error instanceof Error
+        ? { message: error.message, name: error.name }
+        : { message: String(error) }
     aiEventClient.emit('audio:request:error', {
       requestId,
       provider: adapter.name,
       model,
-      error: { message: err.message, name: err.name },
+      error: err,
       duration: elapsedMs,
       modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
       timestamp: Date.now(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines
182 - 193, The catch block should normalize non-Error throws before emitting via
aiEventClient.emit to avoid runtime crashes when adapters throw null/undefined;
replace the current naive cast (error as Error) with a normalizedError computed
from the caught value (use the Error instance if error instanceof Error,
otherwise create a new Error(String(error)) and set a fallback name or include
the original value in a raw/cause field), then use that normalizedError's
message and name when emitting the 'audio:request:error' event (still include
requestId, provider: adapter.name, model, duration: elapsedMs, and modelOptions:
rest.modelOptions).

Comment on lines 196 to +207
  } catch (error) {
    const duration = Date.now() - startTime
    const err = error as Error
    aiEventClient.emit('speech:request:error', {
      requestId,
      provider: adapter.name,
      model,
      error: { message: err.message, name: err.name },
      duration,
      modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
      timestamp: Date.now(),
    })

⚠️ Potential issue | 🟠 Major

Normalize non-Error throws before emitting error events.

error as Error is only a type assertion; if an adapter/SDK throws null or undefined, err.message throws inside the catch block and masks the original failure.

🐛 Proposed fix
-    const err = error as Error
+    const err =
+      error instanceof Error
+        ? { message: error.message, name: error.name }
+        : { message: String(error) }
     aiEventClient.emit('speech:request:error', {
       requestId,
       provider: adapter.name,
       model,
-      error: { message: err.message, name: err.name },
+      error: err,
       duration,
       modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
       timestamp: Date.now(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/generateSpeech/index.ts` around lines
196 - 207, The catch block that emits the 'speech:request:error' event currently
does a blind type assertion (error as Error) which will throw if a non-Error
(null/undefined/primitive) was thrown; update the handler in the generateSpeech
catch to normalize the thrown value into a safeError object before calling
aiEventClient.emit — e.g., derive message and name using a helper/local logic
that handles Error instances, non-Error objects, null/undefined, and primitives
(falling back to String(error) and a default name like 'NonErrorThrow'), and
then pass that safeError (with message and name) along with existing fields
(requestId, provider: adapter.name, model, modelOptions: rest.modelOptions,
duration, timestamp) so the emit never throws when formatting the error.

…ities

- debug-logging.md: list generateAudio/generateTranscription in Non-chat
  activities section; clarify that the `provider` category now applies to
  streaming generateAudio/generateSpeech/generateTranscription calls too.
- audio-generation.md, text-to-speech.md, transcription.md: add a single
  contextual callout at the moment a builder is most likely to need it
  (immediately before the Options table / next to Error Handling), pointing
  to the debug-logging guide.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/media/audio-generation.md (1)

110-122: Consider clarifying the mutually-exclusive nature of url and b64Json.

While the interface shows both fields as optional, the implementation guarantees exactly one will be present (per the PR objectives). Consider adding a note in the prose to make this constraint explicit for users.

📋 Suggested addition after line 122
 }

+Note: url and b64Json are mutually exclusive — exactly one will be present depending on the provider.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @docs/media/audio-generation.md around lines 110 - 122, Clarify in the docs
that the AudioGenerationResult interface guarantees exactly one of audio.url or
audio.b64Json will be returned; update the prose near the AudioGenerationResult
definition to state that url and b64Json are mutually exclusive and that exactly
one will be present depending on the provider (reference AudioGenerationResult
and the audio.url/audio.b64Json fields).
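
The shape the prose note would describe is the mutually-exclusive source union from the PR summary — roughly the following sketch (not the exact exported declaration):

type GeneratedMediaSource =
  | { url: string; b64Json?: never }
  | { b64Json: string; url?: never }

// The optional-never members make an object carrying both url and b64Json
// unassignable, so exactly one source survives the type check.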



ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: baff8e9a-6958-479e-916c-021d661c8ff1

📥 Commits

Reviewing files that changed from the base of the PR and between 8d5fb936682339e0c145af9ea072e0fad204cd8d and f1bdc01c6187477f3ebfbda7abd32ab9dd648882.

📒 Files selected for processing (4)
  • docs/advanced/debug-logging.md
  • docs/media/audio-generation.md
  • docs/media/text-to-speech.md
  • docs/media/transcription.md

✅ Files skipped from review due to trivial changes (2)
  • docs/advanced/debug-logging.md
  • docs/media/text-to-speech.md

🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/media/transcription.md



| Option | Type | Description |
|--------|------|-------------|
| `adapter` | `AudioAdapter` | The adapter created via `falAudio()` (required) |

⚠️ Potential issue | 🟡 Minor

Clarify adapter description to include all audio adapters.

The description mentions only falAudio(), but this options table applies to all audio adapters including geminiAudio().

📝 Proposed fix
-| `adapter` | `AudioAdapter` | The adapter created via `falAudio()` (required) |
+| `adapter` | `AudioAdapter` | The audio adapter (e.g., `falAudio()`, `geminiAudio()`) (required) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/media/audio-generation.md` at line 102, Update the table row for the
`adapter` (`AudioAdapter`) to clarify it applies to all audio adapters, not just
`falAudio()`—mention `geminiAudio()` and other supported adapters (e.g., "The
adapter created via falAudio(), geminiAudio(), or other AudioAdapter factories")
so readers understand any AudioAdapter implementation is valid; update the
description text where `adapter` and `AudioAdapter` appear to reflect this
broader applicability.

AlemTuzlak and others added 3 commits April 22, 2026 14:22
…eration skill

Agents hitting the new generateAudio/generateSpeech/generateTranscription
activities will run into:

- Gemini Lyria doesn't accept responseMimeType or negativePrompt via
  GenerateContentConfig — shape the prompt instead.
- Lyria 3 Clip is fixed 30s; Lyria 3 Pro reads duration from natural-language
  in the prompt, not the duration option. fal audio maps duration per-model.
- Gemini TTS multiSpeakerVoiceConfig is validated to 1 or 2 speakers.
- debug: DebugOption is threaded through every generate*() activity — reach
  for it instead of writing logging middleware.

Adds four Common Mistake entries, sources the debug-logging doc, and
cross-references the ai-core/debug-logging sub-skill.
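
A hedged usage sketch of the Lyria guidance above (the model id, option names, and prompt wording are assumptions, not taken verbatim from the docs):

import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini'

// Lyria 3 Pro reads the length from the prompt text itself, so describe
// the duration in natural language rather than via a duration option.
const result = await generateAudio({
  adapter: geminiAudio('lyria-3-pro-preview'), // assumed model id
  prompt: 'A calm 60-second ambient piano piece with soft rain',
})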
fal-client auto-uploads Blob/File inputs via fal.storage.upload but
passes strings through unchanged, so data URLs reached fal's API and
got rejected with 422 "Unsupported data URL". Decode data URL strings
to a Blob in buildInput so the auto-upload path handles them; plain
http(s) URLs still pass through.
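
A minimal sketch of that decode step (the helper name is invented here; the actual fix lives in buildInput):

// Convert a base64 data URL string into a Blob so fal-client's
// fal.storage.upload auto-upload path picks it up.
function dataUrlToBlob(dataUrl: string): Blob {
  const [header = '', base64 = ''] = dataUrl.split(',')
  const mime =
    /^data:([^;]+);base64$/.exec(header)?.[1] ?? 'application/octet-stream'
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0))
  return new Blob([bytes], { type: mime })
}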

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@AlemTuzlak AlemTuzlak merged commit 54523f5 into TanStack:main Apr 23, 2026
8 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 23, 2026


Development

Successfully merging this pull request may close these issues.

Feature Request: Fal audio, speech, and music generation support
