feat: audio media support — fal audio/speech/STT adapters, Gemini Lyria + 3.1 Flash TTS, streaming generateAudio + hooks (#463)
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the …
📝 Walkthrough

Adds a new audio generation activity and types, adapter primitives and provider adapters (Fal, Gemini Lyria), streaming (SSE) support, framework hooks/creators (React/Solid/Vue/Svelte), example UI + server routes, docs/changesets, devtools events, tests, and tightened media types to a mutually-exclusive url/b64Json union.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Browser UI
    participant Hook as useGenerateAudio
    participant Server as /api/generate/audio (SSE)
    participant Adapter as Audio Adapter (Fal/Gemini)
    participant Provider as External Provider API
    Client->>Hook: generate({ prompt, duration, provider })
    Hook->>Server: POST /api/generate/audio (body)
    Server->>Adapter: buildAudioAdapter(provider)\nadapter.generateAudio({ prompt, duration, stream: true })
    Adapter->>Provider: request (model, prompt, duration, modelOptions)
    Provider-->>Adapter: stream / response (b64Json or url, contentType)
    Adapter-->>Server: AsyncIterable<StreamChunk>
    Server-->>Hook: SSE chunks
    Hook->>Client: update state (result, isLoading, status)
```
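The server step in the diagram (turning the adapter's `AsyncIterable<StreamChunk>` into SSE chunks) can be sketched as follows. This is a minimal illustration, not the library's actual implementation, and the `StreamChunk` shape here is an assumption:

```typescript
// Illustrative StreamChunk shape (the real type lives in @tanstack/ai).
type StreamChunk = { type: string; data?: unknown }

// Convert a stream of chunks into Server-Sent Events frames: each event is a
// "data:" line followed by a blank line, with a final sentinel frame.
async function* toSSEFrames(
  chunks: AsyncIterable<StreamChunk>,
): AsyncGenerator<string> {
  for await (const chunk of chunks) {
    yield `data: ${JSON.stringify(chunk)}\n\n`
  }
  yield 'data: [DONE]\n\n'
}
```

A route handler would pipe these frames into a `text/event-stream` response body, which is what lets the client hook update incrementally.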
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Adds falSpeech, falTranscription, and falAudio adapters to @tanstack/ai-fal, completing fal's media coverage alongside image and video. Introduces a new generateAudio activity in @tanstack/ai for music and sound-effect generation, with matching devtools events and types. Closes TanStack#328 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Lyria + 3.1 Flash TTS

Extends @tanstack/ai-elevenlabs (which already covers realtime voice) with Speech, Music, Sound Effects, and Transcription adapters, each tree-shakeable under its own import. Adds Gemini Lyria 3 Pro / Clip music generation via a new generateAudio adapter, plus the new Gemini 3.1 Flash TTS Preview model with multi-speaker dialogue support. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new Audio Generation page, expands the fal adapter reference with sections for text-to-speech, transcription, and audio/music, and adds fal sections to the Text-to-Speech and Transcription guides. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expand the ts-react-chat example with provider tabs for OpenAI, ElevenLabs, Gemini, and Fal on the TTS and transcription pages, plus a new /generations/audio page covering ElevenLabs Music, ElevenLabs SFX, Gemini Lyria, and Fal audio generation. Add a Gemini TTS unit test and wire an audio-gen feature into the E2E harness (adapter factory, API route, UI, fixture, and Playwright spec). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 94ada28 to 068ca0d
View your CI Pipeline Execution ↗ for commit 662e447
☁️ Nx Cloud last updated this comment at …
Reorder the Audio Generation page so the direct Gemini (Lyria) and ElevenLabs (music/sfx) adapters appear before fal.ai, and update the environment variables + result-shape notes to cover all three providers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
…el selector

Expose an Audio tile on the welcome grid, offer one-click sample prompts for every audio provider, and let the Fal provider pick between current text-to-music models (default MiniMax v2.6). Threads a model override through the audio API and server fn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshots uncommitted fixes to the speech/music/transcription adapters and their tests on the split-out branch (see TanStack#485). Kept separate from TanStack#463 so the fal + Gemini audio work can ship first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the new ElevenLabs TTS / Music / SFX / Transcription REST adapters out of this PR into their own issue (TanStack#485) and branch (`elevenlabs-audio-adapters`) so the fal + Gemini audio work can ship independently. The follow-up PR will rebuild these adapters on top of the official `@elevenlabs/elevenlabs-js` SDK rather than hand-rolled fetch calls. Removed from this branch: - `packages/typescript/ai-elevenlabs/src/{adapters,utils,model-meta.ts}` and their tests (realtime voice code untouched) - ElevenLabs sections in `docs/media/audio-generation.md` - ElevenLabs entries in `examples/ts-react-chat` audio-providers catalog, server adapter factories, zod schemas, and default provider wiring - `@tanstack/ai-elevenlabs` bump from the audio changeset Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ai-fal: replace `btoa(String.fromCharCode(...bytes))` with a chunked helper; the spread form throws RangeError on any realistic TTS clip (V8 arg limit ~65k). - ai-gemini: honor `TTSOptions.voice` as a fallback for the prebuilt voice name, move `systemInstruction` inside `config` per the @google/genai contract, and wrap raw `audio/L16;codec=pcm` output in a RIFF/WAV container so the result is actually playable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
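The chunked base64 helper described in the ai-fal fix above can be sketched like this. A hypothetical version (the shipped helper may differ in name and chunk size): `String.fromCharCode(...bytes)` passes one call argument per byte, and V8 rejects calls beyond roughly 65k arguments, so encoding in fixed-size chunks keeps every call under the limit.

```typescript
// Encode arbitrary-length bytes to base64 without spreading the whole buffer.
function bytesToBase64(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = ''
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() is a view, not a copy; each spread stays under the arg limit.
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize))
  }
  return btoa(binary)
}
```

With the naive spread form, any realistic TTS clip (hundreds of kilobytes) throws a RangeError; the chunked form handles buffers of any size.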
Log a warning instead of silently swapping to the default when a client sends a model id outside the provider's allowlist, so stale clients or typo'd config ids are debuggable. Also correct the AudioProviderConfig JSDoc to describe the models[] ordering as a non-binding UI convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
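The warn-and-fall-back behavior described above can be sketched as below. The function name and shape are illustrative, not the example app's actual code:

```typescript
// Resolve a client-supplied model id against a provider allowlist.
// Unknown ids still fall back to the default, but loudly, so stale clients
// or typo'd config ids show up in server logs instead of failing silently.
function resolveModelId(
  requested: string | undefined,
  allowlist: ReadonlyArray<string>,
  fallback: string,
): string {
  if (requested === undefined) return fallback
  if (!allowlist.includes(requested)) {
    console.warn(`Unknown model id "${requested}"; falling back to "${fallback}"`)
    return fallback
  }
  return requested
}
```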
Replaces the unreleased generateAudio activity with two distinct activities so music and sound-effects each have their own types, adapter kinds, provider factories, and devtools events. This lets providers advertise only the capabilities they support (Gemini Lyria is music-only; fal has distinct music and SFX catalogs) and leaves room for kind-specific options without a breaking change. - Core: generateMusic/generateSoundEffects activities and MusicAdapter/ SoundEffectsAdapter interfaces + bases; GeneratedAudio shared between MusicGenerationResult and SoundEffectsGenerationResult - Events: music:request:* and soundEffects:request:* replace audio:* - fal: falMusic + falSoundEffects factories sharing internal request/response helpers; FalMusic/FalSoundEffectsProviderOptions in model-meta - Gemini: geminiMusic/createGeminiMusic/GeminiMusicAdapter (Lyria is music-only so no SFX counterpart) - ts-react-chat: /generations/music and /generations/sound-effects routes backed by a shared AudioGenerationForm; split server fns and API routes - E2E: music-gen + sound-effects-gen features, parameterized MediaAudioGenUI, split fixtures and specs (both feature support sets are empty since aimock 1.14 cannot mock Gemini's Lyria AUDIO modality) - Docs: music-generation.md + sound-effects-generation.md; fal adapter docs split; changesets rewritten in place Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes 1010e9b. The split into generateMusic + generateSoundEffects doesn't hold up against fal's audio catalog: dozens of models span audio-to-audio, voice-change/clone, enhancement, separation, isolation, merge, and understanding, and individual models (e.g. stable-audio-25) generate music AND sound effects. A single broader generateAudio activity fits that reality. Keeps the aimock Gemini-Lyria gap: audio-gen feature-support stays empty because aimock 1.14 has no AUDIO-modality mock for generateContent — the E2E is green by skipping rather than by hitting a mock that doesn't exist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GeneratedAudio

Model GeneratedImage and GeneratedAudio on a shared mutually-exclusive GeneratedMediaSource union so the type rejects empty objects and objects that set both fields. Update the openai, gemini, grok, openrouter, and fal image adapters to construct results by branching on which field is present; openrouter and fal no longer synthesize a data URI on url when returning base64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
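The mutually-exclusive union from this commit can be sketched as follows, assuming the names from the commit message (the shipped types may differ in detail). The `?: never` members are what make both `{}` and `{ url, b64Json }` type errors:

```typescript
// Exactly one of url / b64Json must be set.
type GeneratedMediaSource =
  | { url: string; b64Json?: never }
  | { b64Json: string; url?: never }

type GeneratedAudio = GeneratedMediaSource & { contentType?: string }

// Adapters construct results by branching on which field the provider returned,
// instead of synthesizing a data URI into `url`.
function toGeneratedAudio(
  raw: { url?: string; b64Json?: string },
  contentType?: string,
): GeneratedAudio {
  if (raw.url !== undefined) return { url: raw.url, contentType }
  if (raw.b64Json !== undefined) return { b64Json: raw.b64Json, contentType }
  throw new Error('provider returned neither url nor b64Json')
}
```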
The audio-gen feature set was empty because aimock cannot currently mock audio generation, so the Playwright spec ran against zero providers. Remove the dead scaffolding; the wiring can return once aimock audio support lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the parity gap with the other media activities — audio generation now has the same client-hook UX (connection + fetcher transports) as image, speech, video, transcription, and summarize. Adds streaming to generateAudio so it can ride the SSE transport, a matching AudioGenerateInput type in ai-client, framework hooks in ai-react / ai-solid / ai-vue / ai-svelte, unit tests, an updated ts-react-chat example, and docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fal audio models use different input field names for length: ElevenLabs Music takes `music_length_ms` in milliseconds, Stable Audio 2.5 takes `seconds_total`, and most others accept `duration`. The adapter was passing a generic `duration` unconditionally, so the slider in the example was silently ignored for ElevenLabs and Stable Audio. Also: align the Gemini Lyria adapter with the API's MP3 default (only send responseMimeType when the caller asks for WAV), expand the example to include Lyria 3 Pro and a dedicated Fal SFX provider, and rename the example's "Direct" mode to "Hooks" to better reflect what it demos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
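The per-model duration mapping described above can be sketched like this. The field names come from the commit message; the model-id matching is illustrative and not the adapter's actual logic:

```typescript
// Map a duration in seconds onto the input field each fal model expects.
function falDurationInput(
  model: string,
  durationSeconds: number,
): Record<string, number> {
  if (model.includes('elevenlabs/music')) {
    // ElevenLabs Music expects milliseconds.
    return { music_length_ms: durationSeconds * 1000 }
  }
  if (model.includes('stable-audio-25')) {
    return { seconds_total: durationSeconds }
  }
  // Most other fal audio models accept a plain `duration` in seconds.
  return { duration: durationSeconds }
}
```

Passing a generic `duration` unconditionally is exactly the bug being fixed: models that use a different field name silently ignore it.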
Align the audio model constant and its re-export with the `generateAudio` activity naming used across providers, and drop the unused duplicate `GeminiLyriaModel` type — `GeminiAudioModel` is the single canonical type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
packages/typescript/ai-fal/package.json (1)
15-20: 🛠️ Refactor suggestion | 🟠 Major

Expose the new Fal adapters through `/adapters` subpath exports.

This package now includes audio/speech/transcription adapters, but the exports map still only exposes the root entrypoint. Add adapter subpath exports so consumers can import tree-shakeable implementations directly.

♻️ Proposed exports update

```diff
 "exports": {
   ".": {
     "types": "./dist/esm/index.d.ts",
     "import": "./dist/esm/index.js"
+  },
+  "./adapters/*": {
+    "types": "./dist/esm/adapters/*.d.ts",
+    "import": "./dist/esm/adapters/*.js"
   }
 },
```

As per coding guidelines, `packages/typescript/ai-*/package.json`: “Export tree-shakeable adapters from /adapters subpath exports in provider adapter packages”.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/package.json` around lines 15 - 20, Update the package "exports" map to expose the adapters subpath so consumers can import specific Fal adapter modules directly; add entries under "exports" such as "./adapters": { "types": "./dist/esm/adapters/index.d.ts", "import": "./dist/esm/adapters/index.js" } and a pattern export "./adapters/*": { "types": "./dist/esm/adapters/*.d.ts", "import": "./dist/esm/adapters/*.js" } (adjust filenames to match the built outputs) so tree-shakeable implementations in the adapters folder are exported alongside the root "." entry.

docs/media/generation-hooks.md (1)
20-34: ⚠️ Potential issue | 🟡 Minor

Finish wiring `useGenerateAudio` into the rest of the guide.

Line 29 adds the hook to the overview, but the server-function list, framework variants table, and Next Steps still omit audio generation.

📝 Proposed docs update

```diff
-The same pattern applies to all generation types -- swap `generateImage` for `generateSpeech`, `generateTranscription`, `summarize`, or `generateVideo`. See the individual media guides for server-side details.
+The same pattern applies to all generation types -- swap `generateImage` for `generateAudio`, `generateSpeech`, `generateTranscription`, `summarize`, or `generateVideo`. See the individual media guides for server-side details.
```

```diff
 | Generation Type | React (`@tanstack/ai-react`) | Vue (`@tanstack/ai-vue`) | Svelte (`@tanstack/ai-svelte`) |
 |----------------|------------------------------|--------------------------|-------------------------------|
 | Image | `useGenerateImage` | `useGenerateImage` | `createGenerateImage` |
+| Audio | `useGenerateAudio` | `useGenerateAudio` | `createGenerateAudio` |
 | Speech | `useGenerateSpeech` | `useGenerateSpeech` | `createGenerateSpeech` |
```

```diff
 - [Image Generation](./image-generation) -- Provider-specific options, sizes, and model availability
+- [Audio Generation](./audio-generation) -- Music, sound effects, model options, and generated audio playback
 - [Text-to-Speech](./text-to-speech) -- Voice options, audio formats, and streaming audio
```

Also applies to: 62-62, 425-450

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/media/generation-hooks.md` around lines 20 - 34, The docs mention useGenerateAudio in the overview but omitted audio elsewhere—update the server-function list, the framework variants table, and the "Next Steps" section to include audio generation; specifically add entries referencing useGenerateAudio, AudioGenerateInput, and AudioGenerationResult (or TTSResult if applicable) where other hooks (useGenerateImage, useTranscription, etc.) are listed so audio appears in the server function examples, each framework variant table row, and the actionable next-step links or examples; ensure naming matches the overview table (useGenerateAudio, AudioGenerateInput, AudioGenerationResult) and mirror the formatting used for other media hooks.

docs/media/transcription.md (1)
79-108: ⚠️ Potential issue | 🟡 Minor

Update the shared transcription tables for fal.ai.

Line 81 documents URL/File/Blob/ArrayBuffer support, but Line 108 still lists only `File | string`, and the environment section still only mentions OpenAI credentials.

📝 Proposed docs update

```diff
-| `audio` | `File \| string` | Audio data (File object or base64 string) - required |
+| `audio` | `string \| File \| Blob \| ArrayBuffer` | Audio data — URL, data URL/base64 string, File, Blob, or ArrayBuffer, depending on provider |
```

```diff
-The transcription adapter uses:
+Transcription adapters use:
 - `OPENAI_API_KEY`: Your OpenAI API key
+- `FAL_KEY`: Your fal.ai API key, when using `falTranscription()`
```

Also applies to: 515-520

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/media/transcription.md` around lines 79 - 108, Update the fal.ai transcription docs to match the examples: in the "Common Options" table change the `audio` Option type for generateTranscription / falTranscription to include URL, File, Blob, and ArrayBuffer (e.g., `File | string | Blob | ArrayBuffer | URL`) and ensure any mention of "base64 string" reflects accepted types; also update the environment/credentials section to add fal.ai-specific credential instructions (add the fal.ai API key variable and usage note) and apply the same table/credentials fixes to the duplicate block around lines 515-520.

examples/ts-react-chat/src/routes/api.transcribe.ts (1)
10-24: ⚠️ Potential issue | 🟠 Major

Validate the request body before building the adapter.

Line 11 only casts untrusted JSON; malformed requests or unsupported `provider` values can throw and surface as 500s. Return a 400 for invalid payloads and adapter selection failures.

🛡️ Proposed guard

```diff
   POST: async ({ request }) => {
-    const body = await request.json()
-    const { audio, language, provider } = body.data as {
-      audio: string
-      language?: string
-      provider?: TranscriptionProviderId
-    }
-
-    const adapter = buildTranscriptionAdapter(provider ?? 'openai')
+    const body = await request.json().catch(() => null)
+    const data = body?.data
+
+    if (!data || typeof data.audio !== 'string' || data.audio.length === 0) {
+      return new Response('Missing required audio payload', { status: 400 })
+    }
+
+    const {
+      audio,
+      language,
+      provider,
+    } = data as {
+      audio: string
+      language?: string
+      provider?: TranscriptionProviderId
+    }
+
+    let adapter
+    try {
+      adapter = buildTranscriptionAdapter(provider ?? 'openai')
+    } catch {
+      return new Response('Unsupported transcription provider', {
+        status: 400,
+      })
+    }

     const stream = generateTranscription({
       adapter,
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/api.transcribe.ts` around lines 10 - 24, Validate the incoming JSON and its data before calling buildTranscriptionAdapter: check that request.json() yields an object with a data property where data.audio is a non-empty string and optional data.language is a string; validate data.provider against the allowed TranscriptionProviderId values (or use a whitelist) and return a 400 response for missing/invalid payloads. Only after validation, call buildTranscriptionAdapter(provider) inside a try/catch and return a 400 if adapter selection fails (unknown provider) instead of letting it throw a 500, then proceed to call generateTranscription with the validated audio/language and adapter.

docs/media/text-to-speech.md (1)
32-63: ⚠️ Potential issue | 🟡 Minor

Fix TTS adapter factory names to match exports.

Line 34 imports `openaiTTS`, which is not exported from `@tanstack/ai-openai`; the correct export is `openaiSpeech`. Line 37 then uses `openaiSpeech()` without importing it. Additionally, line 62 uses `geminiTTS()`, which does not exist; the exported function is `geminiSpeech`. Update the imports and function calls to match the actual exported names.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/media/text-to-speech.md` around lines 32 - 63, Update the TTS adapter imports and calls to use the actual exported factory names: replace the incorrect openaiTTS import/use with openaiSpeech (ensure import uses openaiSpeech and the adapter creation calls openaiSpeech()), and replace the geminiTTS call with geminiSpeech (ensure import uses geminiSpeech and adapter creation calls geminiSpeech()). Verify references to generateSpeech remain unchanged and that adapter variables use the corrected factory names openaiSpeech and geminiSpeech.
🟡 Minor comments (22)
.changeset/generated-media-union.md-20-20 (1)
20-20: ⚠️ Potential issue | 🟡 Minor

Fix spacing around inline code in the release note.

Several inline code spans are glued to neighboring words, which will render/read poorly.

📝 Proposed wording cleanup

```diff
-Existing read patterns like `img.url || \`data:image/png;base64,${img.b64Json}\``continue to work unchanged. The only runtime-visible change is that the`@tanstack/ai-openrouter`and`@tanstack/ai-fal`image adapters no longer populate`url`with a synthesized`data:image/png;base64,...`URI when the provider returns base64 — they return`{ b64Json }`only. Consumers that want a data URI should build it from`b64Json` at render time.
+Existing read patterns like `img.url || \`data:image/png;base64,${img.b64Json}\`` continue to work unchanged. The only runtime-visible change is that the `@tanstack/ai-openrouter` and `@tanstack/ai-fal` image adapters no longer populate `url` with a synthesized `data:image/png;base64,...` URI when the provider returns base64 — they return `{ b64Json }` only. Consumers that want a data URI should build it from `b64Json` at render time.
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.changeset/generated-media-union.md at line 20, Fix the spacing around inline code spans in the release note: ensure there is a space before and after inline code tokens so they don't run into surrounding words, and add spaces around `@tanstack/ai-openrouter`, `@tanstack/ai-fal`, `url`, `data:image/png;base64,...`, and `{ b64Json }` wherever they are glued to adjacent words so all inline code spans are separated by whitespace for correct rendering.

.changeset/generated-media-union.md-3-4 (1)
3-4: ⚠️ Potential issue | 🟡 Minor

Bump adapter packages with runtime-visible media shape changes as `minor`.

The note says `@tanstack/ai-openrouter` and `@tanstack/ai-fal` no longer populate synthesized `url` values for base64 image results. That is a shape/behavior change for those packages, so these should be `minor` rather than `patch`.

📝 Proposed changeset bump

```diff
-'@tanstack/ai-openrouter': patch
-'@tanstack/ai-fal': patch
+'@tanstack/ai-openrouter': minor
+'@tanstack/ai-fal': minor
```

Based on learnings, “In the TanStack/ai repository (pre-1.0), follow the repo’s versioning convention: breaking changes and breaking/shape changes documented in Changesets must use a `minor` version bump (not `major`).”

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.changeset/generated-media-union.md around lines 3 - 4, The changeset incorrectly marks '@tanstack/ai-openrouter' and '@tanstack/ai-fal' as patch bumps even though they introduce a runtime-visible media shape/behavior change (synthesized url removal); update the changeset entry in .changeset/generated-media-union.md to change both package bumps from 'patch' to 'minor' so the release reflects the shape change for these adapters.

docs/media/text-to-speech.md-69-108 (1)
69-108: ⚠️ Potential issue | 🟡 Minor

Add fal.ai to the environment variables and explicit API keys sections.

The fal.ai code examples (lines 69-108) lack setup instructions. The "Environment Variables" section (lines 491-494) only mentions OpenAI and Gemini, and the "Explicit API Keys" section (lines 501-509) has no example for `falSpeech`. Users need to know they can either set the `FAL_KEY` environment variable or pass an explicit API key to the adapter.

Add `FAL_KEY` to the environment variables list, and include a `falSpeech` example in the explicit API keys section:

Environment variables section update

```diff
 The TTS adapters use the same environment variables as other adapters:
 - **OpenAI**: `OPENAI_API_KEY`
 - **Gemini**: `GOOGLE_API_KEY` or `GEMINI_API_KEY`
+- **fal.ai**: `FAL_KEY`
```

Also add a `falSpeech` example to the explicit API keys section alongside the OpenAI and Gemini examples.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/media/text-to-speech.md` around lines 69 - 108, Update docs to document fal.ai API key usage: add FAL_KEY to the "Environment Variables" list and show how to supply an explicit key to the falSpeech adapter. Specifically, in the env vars section add a bullet for FAL_KEY (used by falSpeech), and in the "Explicit API Keys" section add an example that demonstrates passing { apiKey: process.env.FAL_KEY } (or a literal) into falSpeech(...) when calling generateSpeech({ adapter: falSpeech(...), ... }). Reference symbols: FAL_KEY, falSpeech, generateSpeech, adapter, modelOptions.

packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-4-9 (1)
4-9: ⚠️ Potential issue | 🟡 Minor

Keep the all-media lists in sync with `generateAudio`.

The frontmatter now includes audio, but the overview and streaming-support list still omit it.

📝 Proposed doc sync

```diff
-All media activities (image, speech, transcription, video) follow the same
+All media activities (image, audio, speech, transcription, video) follow the same
 server/client architecture: a `generate*()` function on the server, an SSE
 transport via `toServerSentEventsResponse()`, and a framework hook on the
 client.
@@
-activities (`generateImage`, `generateSpeech`, `generateTranscription`,
-`generateVideo`, `summarize`) support `stream: true`, but custom
+activities (`generateImage`, `generateAudio`, `generateSpeech`,
+`generateTranscription`, `generateVideo`, `summarize`) support `stream: true`, but custom
 `useGeneration` setups may not.
```

Also applies to: 28-31, 487-494

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines 4 - 9, The documentation frontmatter added audio but the overview and streaming-support lists in SKILL.md were not updated; update the all-media lists and streaming-support sections to include audio and the corresponding adapter/hook names (generateAudio, geminiAudio/falAudio, and useGenerateAudio) wherever image/video/speech/transcription are enumerated (including the sections currently listing generateImage/generateSpeech/generateTranscription and the streaming-support list), so the lists consistently mention audio and its adapters/hooks throughout the document.

packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-191-195 (1)
191-195: ⚠️ Potential issue | 🟡 Minor

Broaden the `generateAudio()` description beyond non-speech audio.

`falAudio` also covers transformations/enhancement/separation and other audio workflows, so describing it as only music/SFX and “non-speech audio content” is too narrow.

📝 Suggested wording

```diff
-### 2. Audio Generation (Music, Sound Effects)
+### 2. Audio Generation and Transformation

-Distinct from TTS — `generateAudio()` produces non-speech audio content.
+Distinct from TTS — `generateAudio()` produces or transforms audio content,
+including music, sound effects, audio-to-audio, enhancement, separation, and
+other provider-specific audio workflows.

 Supported adapters: `geminiAudio` (Lyria 3 Pro / Lyria 3 Clip) and `falAudio`
 (MiniMax Music, DiffRhythm, Stable Audio, ElevenLabs SFX, etc.).
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines 191 - 195, Update the SKILL.md text for generateAudio() so it no longer limits the function to only “non-speech” music/SFX; expand the description to state generateAudio() supports music, sound effects, transformations, enhancement, separation and other audio workflows; mention both adapters geminiAudio and falAudio and note that falAudio covers generation plus transformation/enhancement/separation/other audio processing workflows rather than only MiniMax Music/DiffRhythm/SFX so readers understand the broader capabilities of generateAudio() and falAudio.

packages/typescript/ai/skills/ai-core/media-generation/SKILL.md-211-222 (1)
211-222: ⚠️ Potential issue | 🟡 Minor

Show a base64 fallback in the audio hook example.

The docs say providers may return `url` or `b64Json`, but the playback example only works for URL results.

📝 Suggested playback example

```diff
-// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
-// Play: <audio src={result.audio.url} controls />
+const audioSrc =
+  result?.audio.url ??
+  (result?.audio.b64Json
+    ? `data:${result.audio.contentType ?? 'audio/mpeg'};base64,${result.audio.b64Json}`
+    : undefined)
+
+// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
+// Play: {audioSrc && <audio src={audioSrc} controls />}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/skills/ai-core/media-generation/SKILL.md` around lines 211 - 222, Update the client example for useGenerateAudio/fetchServerSentEvents to handle provider responses that return base64 instead of a URL by adding a fallback that checks result.audio.b64Json and converts it to a data URL (e.g., "data:audio/<format>;base64,<b64>") for the audio src when result.audio.url is absent; reference the useGenerateAudio hook, fetchServerSentEvents connection, and the result.audio properties (url and b64Json) so readers know where to implement the fallback.

examples/ts-react-chat/src/routes/api.generate.speech.ts-10-18 (1)
10-18: ⚠️ Potential issue | 🟡 Minor

Validate `body.data` before destructuring it.

A malformed or direct request without `{ data: { text } }` will throw during destructuring and return a 500. Return a 400 before building the adapter.

🛡️ Proposed guard

```diff
   const body = await request.json()
-  const { text, voice, format, provider } = body.data as {
+  const data = body?.data
+
+  if (!data || typeof data.text !== 'string' || !data.text.trim()) {
+    return new Response(JSON.stringify({ error: 'text is required' }), {
+      status: 400,
+      headers: { 'content-type': 'application/json' },
+    })
+  }
+
+  const { text, voice, format, provider } = data as {
     text: string
     voice?: string
     format?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav' | 'pcm'
     provider?: SpeechProviderId
   }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/api.generate.speech.ts` around lines 10 - 18, The handler currently destructures body.data without checking it, causing a crash on malformed requests; before destructuring and before calling buildSpeechAdapter, validate that body and body.data exist and that body.data.text is a non-empty string, and if not respond with a 400 error (bad request) and a clear message; update the code around the request.json() usage and the destructuring of { text, voice, format, provider } to perform this guard and only call buildSpeechAdapter(provider ?? 'openai') when validation passes.

examples/ts-react-chat/src/routes/api.generate.audio.ts-10-18 (1)
10-18: ⚠️ Potential issue | 🟡 Minor

Guard malformed audio-generation requests.

`body.data` is assumed to exist, and `duration` is assumed to be a finite number. Invalid requests should fail with 400 instead of throwing or reaching the provider.

🛡️ Proposed guard

```diff
   const body = await request.json()
-  const { prompt, duration, provider, model } = body.data as {
+  const data = body?.data
+
+  if (
+    !data ||
+    typeof data.prompt !== 'string' ||
+    !data.prompt.trim() ||
+    (data.duration !== undefined &&
+      (typeof data.duration !== 'number' || !Number.isFinite(data.duration)))
+  ) {
+    return new Response(JSON.stringify({ error: 'valid prompt is required' }), {
+      status: 400,
+      headers: { 'content-type': 'application/json' },
+    })
+  }
+
+  const { prompt, duration, provider, model } = data as {
     prompt: string
     duration?: number
     provider?: AudioProviderId
     model?: string
   }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/api.generate.audio.ts` around lines 10 - 18, Guard the incoming request before calling buildAudioAdapter: validate that body and body.data exist and that data.prompt is a non-empty string; if data.duration is provided ensure Number.isFinite(data.duration) and it is within acceptable bounds; validate optional data.provider and data.model types; if any check fails return a 400 response immediately (do not call buildAudioAdapter). Perform these checks around where request.json() is read and before the line that calls buildAudioAdapter(provider ?? 'gemini-lyria', model), and include clear error messages in the 400 responses.

.changeset/audio-example-pages.md-4-15 (1)
4-15: ⚠️ Potential issue | 🟡 Minor

Remove the E2E coverage claim from the changeset.
The PR notes say audio-gen E2E scaffolding was removed and E2E coverage is still pending, but this changeset says E2E coverage/specs were added.
📝 Proposed wording fix
```diff
-chore: add ts-react-chat example pages and E2E coverage for audio providers
+chore: add ts-react-chat example pages for audio providers
@@
-- Added a new `audio-gen` feature to the E2E harness with a Gemini Lyria adapter factory, route, UI, fixture, and spec
+- E2E audio generation coverage remains pending aimock AUDIO-modality support
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.changeset/audio-example-pages.md around lines 4 - 15, update the changeset to remove the incorrect claim of added E2E coverage: edit the .changeset/audio-example-pages.md entry to delete or reword the bullets that state E2E tests/specs were added (references to "audio-gen", "E2E harness", "route, UI, fixture, and spec", and "E2E coverage") and instead note that E2E scaffolding was removed and E2E coverage is pending; ensure the remaining bullets about example pages and unit tests (Gemini TTS adapter items) remain accurate.

docs/media/audio-generation.md (1)
30-36: ⚠️ Potential issue | 🟡 Minor

Fix the Gemini example to use `b64Json`.

The Gemini example logs `result.audio.url`, but this page later says Gemini returns base64 in `result.audio.b64Json`. That contradiction will send users down the wrong path.

Proposed fix
```diff
 const result = await generateAudio({
   adapter: geminiAudio('lyria-3-pro-preview'),
   prompt: 'Uplifting indie pop with layered vocals and jangly guitars',
 })
-console.log(result.audio.url) // URL to the generated audio file
+console.log(result.audio.b64Json) // Base64-encoded generated audio bytes
 console.log(result.audio.contentType) // e.g. "audio/mpeg"
```

Also applies to: 121-126
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/media/audio-generation.md` around lines 30 - 36, the Gemini audio example is incorrect: Gemini returns base64 in `result.audio.b64Json`, not a URL. Update the example that calls `generateAudio` with `adapter: geminiAudio('lyria-3-pro-preview')` to read and log `result.audio.b64Json` (and/or show how to decode/save it) instead of `result.audio.url` and `result.audio.contentType`; also make the same replacement for the later duplicate example that currently logs `result.audio.url`. Target the `generateAudio` call and any uses of `result.audio.url` in this file and replace them with `result.audio.b64Json` handling.

docs/media/audio-generation.md (1)
107-118: ⚠️ Potential issue | 🟡 Minor

Document that `url` and `b64Json` are mutually exclusive.

The PR changes `GeneratedAudio` to require exactly one source, but this shape shows both as independently optional. Use a union so the docs match the API contract.

Proposed fix
```diff
 interface AudioGenerationResult {
   id: string
   model: string
-  audio: {
-    url?: string
-    b64Json?: string
-    contentType?: string
-    duration?: number
-  }
+  audio:
+    | {
+        url: string
+        b64Json?: never
+        contentType?: string
+        duration?: number
+      }
+    | {
+        url?: never
+        b64Json: string
+        contentType?: string
+        duration?: number
+      }
   usage?: { inputTokens?: number; outputTokens?: number; totalTokens?: number }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/media/audio-generation.md` around lines 107 - 118, the `AudioGenerationResult` interface currently allows both `audio.url` and `audio.b64Json` to be optional independently; update the `audio` property to be a discriminated union so exactly one source is required to match the `GeneratedAudio` API contract — for example replace the current audio object with a union of two shapes (one with `url: string` and optional contentType/duration, and the other with `b64Json: string` and optional contentType/duration) so callers cannot provide both or neither; ensure the interface name `AudioGenerationResult` and its `audio` property are the ones changed to reflect this mutual exclusivity.

packages/typescript/ai-gemini/tests/audio-adapter.test.ts (1)
35-42: ⚠️ Potential issue | 🟡 Minor

Restore the previous `GOOGLE_API_KEY` value.

This test deletes the env var unconditionally, which can clobber a pre-existing value for later tests in the same worker.

Proposed fix
```diff
 it('geminiAudio reads the API key from the environment', () => {
+  const previousGoogleApiKey = process.env.GOOGLE_API_KEY
   process.env.GOOGLE_API_KEY = 'env-key'
   try {
     const adapter = geminiAudio('lyria-3-clip-preview')
     expect(adapter.model).toBe('lyria-3-clip-preview')
   } finally {
-    delete process.env.GOOGLE_API_KEY
+    if (previousGoogleApiKey === undefined) {
+      delete process.env.GOOGLE_API_KEY
+    } else {
+      process.env.GOOGLE_API_KEY = previousGoogleApiKey
+    }
   }
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/tests/audio-adapter.test.ts` around lines 35 - 42, the test for `geminiAudio` that manipulates `process.env.GOOGLE_API_KEY` should save the existing value before setting it and restore it in the `finally` block instead of unconditionally deleting it; modify the test around `geminiAudio('lyria-3-clip-preview')` to capture `const prev = process.env.GOOGLE_API_KEY`, set `process.env.GOOGLE_API_KEY = 'env-key'`, and in `finally` restore `process.env.GOOGLE_API_KEY = prev` (or delete only if `prev` is undefined) so other tests are not affected.

examples/ts-react-chat/src/routes/generations.transcription.tsx (1)
1-11: ⚠️ Potential issue | 🟡 Minor

Fix the import-order/type-import lint failures.

Static analysis reports `import/order` and `import/consistent-type-specifier-style` errors here; split the inline type specifiers into top-level type-only imports in the configured order.

Proposed fix
```diff
 import { useMemo, useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useTranscription } from '@tanstack/ai-react'
-import type { UseTranscriptionReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
 import { transcribeFn, transcribeStreamFn } from '../lib/server-fns'
-import {
-  TRANSCRIPTION_PROVIDERS,
-  type TranscriptionProviderConfig,
-  type TranscriptionProviderId,
-} from '../lib/audio-providers'
+import { TRANSCRIPTION_PROVIDERS } from '../lib/audio-providers'
+import type { UseTranscriptionReturn } from '@tanstack/ai-react'
+import type {
+  TranscriptionProviderConfig,
+  TranscriptionProviderId,
+} from '../lib/audio-providers'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines 1 - 11, split the inline type specifiers into top-level type-only imports and reorder imports to satisfy import/order: change "import { useTranscription } from '@tanstack/ai-react' import type { UseTranscriptionReturn } from '@tanstack/ai-react'" (and any other inline "type" usages such as TranscriptionProviderConfig and TranscriptionProviderId) so that all runtime imports come first and all "import type" statements appear after them as separate lines; ensure you replace inline type references in the existing import lists with the runtime-only imports (e.g., keep useTranscription, fetchServerSentEvents, transcribeFn, transcribeStreamFn, TRANSCRIPTION_PROVIDERS as runtime imports) and add top-level "import type { UseTranscriptionReturn } from '@tanstack/ai-react'" and "import type { TranscriptionProviderConfig, TranscriptionProviderId } from '../lib/audio-providers'" to satisfy import/consistent-type-specifier-style and import/order.

examples/ts-react-chat/src/routes/generations.speech.tsx (1)
1-11: ⚠️ Potential issue | 🟡 Minor

Fix the import-order/type-import lint failures.

Static analysis reports `import/order` and `import/consistent-type-specifier-style` errors here; move the `@tanstack/ai-react` type import into a top-level type-only import in the configured order.

Proposed fix
```diff
 import { useMemo, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useGenerateSpeech } from '@tanstack/ai-react'
-import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
 import { generateSpeechFn, generateSpeechStreamFn } from '../lib/server-fns'
-import {
-  SPEECH_PROVIDERS,
-  type SpeechProviderConfig,
-  type SpeechProviderId,
-} from '../lib/audio-providers'
+import { SPEECH_PROVIDERS } from '../lib/audio-providers'
+import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'
+import type {
+  SpeechProviderConfig,
+  SpeechProviderId,
+} from '../lib/audio-providers'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 1 - 11, the imports violate import/order and type-only style: move the type import for `UseGenerateSpeechReturn` into a top-level type-only import and reorder imports to match configured order (third-party packages first, then local files); specifically, change the import of `UseGenerateSpeechReturn` to a separate "import type { UseGenerateSpeechReturn } from '@tanstack/ai-react'" placed adjacent to the other `@tanstack` imports and keep `useGenerateSpeech` as a value import from '@tanstack/ai-react' so functions like `useGenerateSpeech` and `fetchServerSentEvents` remain value imports while `UseGenerateSpeechReturn` is a type-only import.

examples/ts-react-chat/src/routes/generations.audio.tsx (1)
1-12: ⚠️ Potential issue | 🟡 Minor

Fix the lint-blocking import order and type-only import style.

The static analysis errors here will fail lint. Split inline type specifiers into top-level type imports and place type imports after value imports.

🧹 Proposed lint fix
```diff
 import { useMemo, useState } from 'react'
-import { createFileRoute } from '@tanstack/react-router'
-import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
+import { useGenerateAudio } from '@tanstack/ai-react'
+import { createFileRoute } from '@tanstack/react-router'
 import { generateAudioFn } from '../lib/server-fns'
-import {
-  AUDIO_PROVIDERS,
-  type AudioProviderConfig,
-  type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type {
+  AudioProviderConfig,
+  AudioProviderId,
+} from '../lib/audio-providers'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 1 - 12, imports mix value and type specifiers and are misordered, causing lint failures; fix by separating type-only imports into top-level "import type" statements placed after the value imports and reorder imports to follow project convention (external packages first, then local modules). Specifically, keep value imports like useMemo, useState, createFileRoute, useGenerateAudio, fetchServerSentEvents, generateAudioFn, and AUDIO_PROVIDERS as regular imports, then add top-level "import type" lines for UseGenerateAudioReturn, AudioGenerationResult, AudioProviderConfig, and AudioProviderId (referencing those exact symbols) placed after the value imports so the linter accepts the style and ordering.

packages/typescript/ai-fal/tests/speech-adapter.test.ts (1)
20-26: ⚠️ Potential issue | 🟡 Minor

Move the `fetch` stub into `beforeEach` and restore it in `afterEach`.

The global stub on line 26 persists across tests without explicit cleanup, risking test isolation issues. Use `beforeEach` to stub and `afterEach` with `vi.unstubAllGlobals()` to ensure cleanup between tests, matching the pattern used elsewhere in the codebase.

Proposed fix
```diff
-import { beforeEach, describe, expect, it, vi } from 'vitest'
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'

 const mockFetchResponse = {
   arrayBuffer: () =>
     Promise.resolve(new Uint8Array([72, 101, 108, 108, 111]).buffer),
 }

-const mockFetch = vi.fn().mockResolvedValue(mockFetchResponse)
-vi.stubGlobal('fetch', mockFetch)

 beforeEach(() => {
   vi.clearAllMocks()
+  const mockFetch = vi.fn().mockResolvedValue(mockFetchResponse)
+  vi.stubGlobal('fetch', mockFetch)
 })
+
+afterEach(() => {
+  vi.unstubAllGlobals()
+})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/tests/speech-adapter.test.ts` around lines 20 - 26, the global fetch stub defined as `mockFetch` and installed with `vi.stubGlobal('fetch', mockFetch)` should be moved into a `beforeEach` block and cleaned up in an `afterEach` block; update tests in speech-adapter.test.ts to create the `mockFetch` and call `vi.stubGlobal('fetch', mockFetch)` inside `beforeEach` and call `vi.unstubAllGlobals()` (or restore the stub) in `afterEach` to ensure isolation between tests and match the project's test patterns.

examples/ts-react-chat/src/routes/generations.audio.tsx (1)
33-44: ⚠️ Potential issue | 🟡 Minor

Revoke Blob URLs created for base64 audio results.

`URL.createObjectURL(blob)` retains the audio buffer in memory until revoked. Repeated generations leak large audio blobs since previous blob URLs are never cleaned up when the result changes, is reset, or the component unmounts.

Add a cleanup effect to revoke object URLs:

Proposed solution
```diff
-import { useMemo, useState } from 'react'
+import { useEffect, useMemo, useState } from 'react'
@@
 function AudioGenerationUI({
@@
 }) {
+  useEffect(() => {
+    if (!result?.url.startsWith('blob:')) return
+
+    return () => {
+      URL.revokeObjectURL(result.url)
+    }
+  }, [result?.url])
+
   const handleGenerate = () => {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 33 - 44, the code creates blob URLs via `URL.createObjectURL(blob)` for base64 audio (when `audio.b64Json`) but never revokes them, leaking memory; update the component that uses this generation to track the produced url (the value returned alongside `contentType`) and revoke previous blob URLs by calling `URL.revokeObjectURL(oldUrl)` before assigning a new one and again on component unmount (use a React `useEffect` cleanup) so all Blob URLs produced by `URL.createObjectURL` are properly released; ensure any reset path also revokes the current url.

examples/ts-react-chat/src/lib/server-audio-adapters.ts (1)
75-86: ⚠️ Potential issue | 🟡 Minor

Minor: override silently rejected when `models` list is absent.

When `config.models` is `undefined`, `config.models?.some(...)` evaluates to `undefined` (falsy), so every override gets the "rejected" warning and falls back. If providers without an explicit `models` array are intended to disallow overrides at all, consider making that explicit in the warning (or only warning when a `models` list exists and the override isn't in it) to avoid noisy logs for a legitimate config shape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/lib/server-audio-adapters.ts` around lines 75 - 86, the `resolveModel` function currently warns on every override when `config.models` is undefined; change its logic so that if `config.models` is undefined it simply returns `config.model` (no warning), and only perform the `.some(...)` check and emit the `console.warn` when `config.models` exists; update `resolveModel` to check for `config.models` presence first, then validate `modelOverride` against `config.models` and warn only when the list exists and the override isn't found.

packages/typescript/ai-fal/src/adapters/transcription.ts (1)
63-78: ⚠️ Potential issue | 🟡 Minor

`options.audio` as base64 string won't work here.
`TranscriptionOptions.audio` is typed as `string | File | Blob | ArrayBuffer`, where a bare `string` is commonly a base64 payload (matching other providers like OpenAI). Here it is passed straight through as `audio_url`, so fal will try to fetch it as a URL. If users pass base64, this will fail with an opaque fal error.
falTranscriptionthataudiomust be a URL string (or Blob/File/ArrayBuffer).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/transcription.ts` around lines 63 - 78, `buildInput` currently passes a bare string from `TranscriptionOptions.audio` straight to `FalModelInput.audio_url` which treats it as a URL and breaks for base64/data-URI payloads; update `buildInput` to detect when `options.audio` is a string that is a data URI or base64 payload and convert it into a Blob (decode base64, create Uint8Array, then `new Blob([...], { type: inferredMime })`) before assigning `audio_url`, keeping the existing handling for File/Blob/ArrayBuffer; alternatively, if you prefer not to convert, update the `falTranscription` docs/type to clearly state that string must be a URL and validate/throw a helpful error in `buildInput` when a non-URL string is provided.

packages/typescript/ai-gemini/src/adapters/tts.ts (1)
203-213: ⚠️ Potential issue | 🟡 Minor

`parsePcmMimeType` extracts `channels` but it's never forwarded to the WAV wrapper.

`wrapPcmBase64AsWav` falls back to `channels = 1`, so a stereo `audio/L16;rate=...;channels=2` response would produce a WAV header claiming mono — the file would play at double speed / corrupted. Gemini TTS is mono today, but since the parser already supports a channels hint, forward it to avoid a latent bug.

🐛 Proposed fix
```diff
   const pcm = parsePcmMimeType(mimeType)
   if (pcm) {
-    const wavBase64 = wrapPcmBase64AsWav(audioBase64, pcm.sampleRate)
+    const wavBase64 = wrapPcmBase64AsWav(
+      audioBase64,
+      pcm.sampleRate,
+      pcm.channels,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/tts.ts` around lines 203 - 213, the pcm channels parsed by `parsePcmMimeType` are not passed into `wrapPcmBase64AsWav`, so WAV headers default to mono and stereo L16 responses will be corrupted; modify the branch that handles pcm (in the function producing the returned object) to pass `pcm.channels` (or a validated default) into `wrapPcmBase64AsWav` instead of relying on its internal fallback, e.g., call `wrapPcmBase64AsWav(audioBase64, pcm.sampleRate, pcm.channels)` and ensure `wrapPcmBase64AsWav` accepts and uses that channels parameter.

packages/typescript/ai/src/activities/generateAudio/index.ts (1)
133-154: ⚠️ Potential issue | 🟡 Minor

`duration` means two different things across `started` vs `completed` events.

On `audio:request:started`, `duration` is the requested audio length (seconds, from `rest.duration`). On `audio:request:completed`, `duration` is the elapsed request time in milliseconds (`elapsedMs`). Reusing the same key for audio-length vs wall-clock elapsed time is a footgun for devtools consumers (and breaks symmetry with the request-scoped fields).

Consider renaming the completed-event field — e.g., `elapsedMs` or `latencyMs` — to disambiguate. This is a new event surface, so it's the cheapest time to fix.

♻️ Suggested rename
```diff
 aiEventClient.emit('audio:request:completed', {
   requestId,
   provider: adapter.name,
   model,
   audio: result.audio,
-  duration: elapsedMs,
+  elapsedMs,
   modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
   timestamp: Date.now(),
 })
```

(Update the matching event type in `@tanstack/ai-event-client` accordingly.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines 133 - 154, the completed event reuses the key `duration` for elapsed wall-clock time which conflicts with the requested audio length sent in the started event; change the completed payload to use a distinct field name (e.g., `elapsedMs` or `latencyMs`) instead of `duration` in the `aiEventClient.emit` call after `adapter.generateAudio` (reference: requestId, adapter.name, model, result.audio, elapsedMs, rest.duration), and update the corresponding event type in `@tanstack/ai-event-client` so consumers expect the new field name.

packages/typescript/ai-gemini/src/adapters/audio.ts (1)
96-115: ⚠️ Potential issue | 🟡 Minor

Use the standard `audio/mpeg` MIME and note the dead fallback.

Two small issues in the audio extraction:

- `'audio/mp3'` is not a registered IANA media type — the standard is `'audio/mpeg'`. Some stricter clients (and `MediaSource.isTypeSupported`) only accept the standard form.
- The `?? 'audio/mp3'` fallback is unreachable: `audioPart` is the element found via `part.inlineData?.mimeType?.startsWith('audio/')`, so `mimeType` is guaranteed to be a string here.

♻️ Proposed fix
```diff
-  const contentType = audioPart.inlineData.mimeType ?? 'audio/mp3'
+  const contentType = audioPart.inlineData.mimeType ?? 'audio/mpeg'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 96 - 115, The MIME fallback and selection should use the standard "audio/mpeg" and avoid the unreachable "?? 'audio/mp3'": in the audio extraction code (look for audioPart and its inlineData.mimeType and contentType), set contentType to the actual mime string from audioPart.inlineData.mimeType and if you want a defensive default use 'audio/mpeg' (e.g. contentType = audioPart.inlineData.mimeType || 'audio/mpeg'), and replace the old 'audio/mp3' token and remove the unreachable nullish-coalescing fallback.
🧹 Nitpick comments (6)
packages/typescript/ai-fal/src/utils/client.ts (1)
73-78: Avoid byte-by-byte string concatenation for large audio payloads.

The browser fallback can become allocation-heavy for multi-MB audio buffers. Build chunks and join once before `btoa`.

♻️ Proposed browser fallback
```diff
   const view = new Uint8Array(bytes)
-  let binary = ''
-  for (let i = 0; i < view.byteLength; i += 1) {
-    binary += String.fromCharCode(view[i]!)
+  const chunkSize = 0x8000
+  const chunks: Array<string> = []
+  for (let i = 0; i < view.byteLength; i += chunkSize) {
+    let chunk = ''
+    const end = Math.min(i + chunkSize, view.byteLength)
+    for (let j = i; j < end; j += 1) {
+      chunk += String.fromCharCode(view[j]!)
+    }
+    chunks.push(chunk)
   }
-  return btoa(binary)
+  return btoa(chunks.join(''))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/utils/client.ts` around lines 73 - 78, the current browser fallback converts a Uint8Array to base64 via byte-by-byte string concatenation (variables `view`, `binary` and final `btoa` call), which is allocation-heavy for large audio; replace the loop with a chunked approach: iterate the view in slices (e.g., 16–32KB), push `String.fromCharCode` applied to each slice into an array, join the array once to form the binary string, then call `btoa` on the joined string — update the code around view/binary/btoa to use this chunked buffer-join pattern to reduce allocations.

packages/typescript/ai-vue/tests/use-generation.test.ts (1)
440-492: Add error and reset parity coverage for `useGenerateAudio`.

The new suite covers initial/fetcher/connection paths, but unlike the adjacent media hooks it does not exercise errors or reset/stop exposure.
🧪 Suggested additional cases
```diff
   it('should generate audio using connection', async () => {
     const chunks = createGenerationChunks(mockAudioResult)
     const adapter = createMockConnectionAdapter({ chunks })
@@
     expect(result.result.value).toEqual(mockAudioResult)
     expect(result.status.value).toBe('success')
   })
+
+  it('should handle errors', async () => {
+    const onError = vi.fn()
+
+    const { result } = renderHook(() =>
+      useGenerateAudio({
+        fetcher: async () => {
+          throw new Error('Audio generation failed')
+        },
+        onError,
+      }),
+    )
+
+    await result.generate({ prompt: 'test' })
+    await flushPromises()
+    await nextTick()
+
+    expect(result.status.value).toBe('error')
+    expect(result.error.value?.message).toBe('Audio generation failed')
+    expect(onError).toHaveBeenCalled()
+  })
+
+  it('should expose stop and reset', () => {
+    const adapter = createMockConnectionAdapter()
+    const { result } = renderHook(() =>
+      useGenerateAudio({ connection: adapter }),
+    )
+
+    expect(typeof result.stop).toBe('function')
+    expect(typeof result.reset).toBe('function')
+  })
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-vue/tests/use-generation.test.ts` around lines 440 - 492, add tests to cover error and reset/stop parity for `useGenerateAudio`: add a test that supplies a fetcher which throws (or rejects) and assert `result.value` becomes null and `status.value` becomes 'error' after awaiting `generate`; add a test that simulates a connection that emits an error chunk (use `createMockConnectionAdapter` with chunks containing an error event) and assert the same error state; add a test that calls `result.reset()` after a successful generation and assert `result.value` is cleared and `status.value` returns to 'idle'; and add a test that calls `result.stop()` during a streaming generation and assert generation is aborted and `status.value` becomes 'idle' or 'stopped' as the hook defines. Reference `useGenerateAudio`, `generate`, `reset`, `stop`, `createMockConnectionAdapter` and `createGenerationChunks` to locate where to add these cases.

packages/typescript/ai-gemini/src/model-meta.ts (1)
511-549: Preserve model-specific provider option typing for Lyria models.

Both Lyria entries use bare `ModelMeta`, so their provider option type falls back to `unknown` even though the audio adapter accepts typed `modelOptions` like `responseMimeType` and `seed`. Please wire these models into the Gemini audio provider-options/type map rather than leaving them untyped. Based on learnings, "Provide type safety per model by using model-specific provider options in adapter packages."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/model-meta.ts` around lines 511 - 549, `LYRIA_3_PRO` and `LYRIA_3_CLIP` are declared as plain `ModelMeta` which causes their provider option type to be `unknown`; update these constants to use the same model-specific provider-options typing used for other Gemini audio models and register them in the Gemini audio provider-options/type map so the audio adapter recognizes typed options like `responseMimeType` and `seed`. Locate `LYRIA_3_PRO` and `LYRIA_3_CLIP` in model-meta.ts and replace their bare `ModelMeta` assertion with the concrete typed meta (the same pattern used by existing Gemini audio models), then add entries mapping 'lyria-3-pro-preview' and 'lyria-3-clip-preview' to the Gemini audio provider-options/type map so callers of the audio adapter receive the correct typed `modelOptions`.

docs/adapters/fal.md (1)
244-244: Consider pinning the version caveat less tightly.

Calling out `@fal-ai/client@1.9.1` by exact version will go stale quickly as fal's SDK adds types for newer models. Consider phrasing as "current `@fal-ai/client` type map" so the doc doesn't need to be edited on every SDK bump.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/adapters/fal.md` at line 244, update the note that currently references the exact package version string '@fal-ai/client@1.9.1' to a looser, time-proof phrasing (e.g., "the current `@fal-ai/client` type map" or "the installed `@fal-ai/client` type map") so it won't go stale; keep the rest of the sentence about `modelOptions` not autocompleting and that the fal adapter accepts any model ID as a string (references: the literal '@fal-ai/client@1.9.1', the term 'modelOptions', and 'fal adapter').

packages/typescript/ai/src/activities/generateAudio/index.ts (1)
79-81: Minor: `createId` is duplicated across media activities.

The same helper exists in `generateTranscription`, `generateSpeech`, and now `generateAudio`. Consider lifting it into a shared util (e.g. alongside `streamGenerationResult`) to avoid drift if the format ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines 79 - 81, the `createId` helper in generateAudio (function `createId`) is duplicated across media activities (generateTranscription, generateSpeech, generateAudio); extract it into a shared util (e.g., next to `streamGenerationResult`) and replace the local `createId` definitions with an import. Move the implementation `` `${prefix}-${Date.now()}-${Math.random().toString(36).slice(2,9)}` `` into the new util, export a named `createId` function, update imports in the affected files to use the shared `createId`, and remove the duplicate functions to keep a single canonical generator.

packages/typescript/ai-fal/src/adapters/audio.ts (1)
99-113: `audio_url` branch drops `contentType` — consider inferring it.

When the provider returns `{ audio_url: string }`, `contentType` stays `undefined` even though the URL extension or a `Content-Type` could be used. Not a blocker, but downstream consumers that key off `contentType` (players, downloads) will degrade on these models.

♻️ Optional: infer contentType from URL extension
```diff
   } else if (typeof data.audio_url === 'string') {
     audioUrl = data.audio_url
+    const ext = audioUrl.split('?')[0]?.split('.').pop()?.toLowerCase()
+    if (ext === 'mp3') contentType = 'audio/mpeg'
+    else if (ext === 'wav') contentType = 'audio/wav'
+    else if (ext === 'ogg') contentType = 'audio/ogg'
+    else if (ext === 'flac') contentType = 'audio/flac'
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 99 - 113, When handling the `{ audio_url: string }` branch in adapters/audio.ts (the code that sets audioUrl and contentType), add logic to infer and set contentType if it remains undefined by mapping the audioUrl file extension to a MIME type (e.g., .mp3 -> audio/mpeg, .wav -> audio/wav, .ogg -> audio/ogg, .m4a/.mp4 -> audio/mp4) after assigning audioUrl; update the code that currently assigns audioUrl in the else-if branch to also attempt this extension-to-MIME mapping and set the contentType variable so downstream consumers using contentType (the variables audioUrl and contentType) will have a value when only audio_url is returned.
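The optional inference sketched in the diff above can be factored into a small pure helper. This is an illustrative sketch only — the `inferAudioContentType` name, the extension table, and the query-string handling are assumptions, not part of the fal adapter:

```typescript
// Hypothetical helper (not part of @tanstack/ai-fal): map a fal audio_url's
// file extension to a MIME type when the provider response omits contentType.
const EXTENSION_TO_MIME: Record<string, string> = {
  mp3: 'audio/mpeg',
  wav: 'audio/wav',
  ogg: 'audio/ogg',
  flac: 'audio/flac',
}

function inferAudioContentType(audioUrl: string): string | undefined {
  // Strip any query string before inspecting the extension, since fal CDN
  // URLs often carry signed-token parameters.
  const ext = audioUrl.split('?')[0]?.split('.').pop()?.toLowerCase()
  return ext ? EXTENSION_TO_MIME[ext] : undefined
}
```

Returning `undefined` for unknown extensions (rather than guessing a default) keeps the behavior identical to today's code path for consumers that already tolerate a missing `contentType`.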
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b758fb0a-8a6b-44c5-a066-5bb44b32389a
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (72)
- .changeset/audio-activity.md
- .changeset/audio-example-pages.md
- .changeset/audio-generation-hook.md
- .changeset/fal-audio-speech-transcription.md
- .changeset/gemini-audio.md
- .changeset/generated-media-union.md
- docs/adapters/fal.md
- docs/config.json
- docs/media/audio-generation.md
- docs/media/generation-hooks.md
- docs/media/text-to-speech.md
- docs/media/transcription.md
- examples/ts-react-chat/package.json
- examples/ts-react-chat/src/components/Header.tsx
- examples/ts-react-chat/src/lib/audio-providers.ts
- examples/ts-react-chat/src/lib/server-audio-adapters.ts
- examples/ts-react-chat/src/lib/server-fns.ts
- examples/ts-react-chat/src/routeTree.gen.ts
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- examples/ts-react-chat/src/routes/generations.audio.tsx
- examples/ts-react-chat/src/routes/generations.speech.tsx
- examples/ts-react-chat/src/routes/generations.transcription.tsx
- examples/ts-react-chat/src/routes/index.tsx
- packages/typescript/ai-client/src/generation-types.ts
- packages/typescript/ai-client/src/index.ts
- packages/typescript/ai-event-client/src/index.ts
- packages/typescript/ai-fal/package.json
- packages/typescript/ai-fal/src/adapters/audio.ts
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-fal/src/adapters/speech.ts
- packages/typescript/ai-fal/src/adapters/transcription.ts
- packages/typescript/ai-fal/src/index.ts
- packages/typescript/ai-fal/src/model-meta.ts
- packages/typescript/ai-fal/src/utils/client.ts
- packages/typescript/ai-fal/src/utils/index.ts
- packages/typescript/ai-fal/tests/audio-adapter.test.ts
- packages/typescript/ai-fal/tests/image-adapter.test.ts
- packages/typescript/ai-fal/tests/speech-adapter.test.ts
- packages/typescript/ai-fal/tests/transcription-adapter.test.ts
- packages/typescript/ai-gemini/src/adapters/audio.ts
- packages/typescript/ai-gemini/src/adapters/image.ts
- packages/typescript/ai-gemini/src/adapters/tts.ts
- packages/typescript/ai-gemini/src/index.ts
- packages/typescript/ai-gemini/src/model-meta.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-gemini/tests/tts-adapter.test.ts
- packages/typescript/ai-grok/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/image.ts
- packages/typescript/ai-openrouter/src/adapters/image.ts
- packages/typescript/ai-openrouter/tests/image-adapter.test.ts
- packages/typescript/ai-react/src/index.ts
- packages/typescript/ai-react/src/use-generate-audio.ts
- packages/typescript/ai-react/tests/use-generation.test.ts
- packages/typescript/ai-solid/src/index.ts
- packages/typescript/ai-solid/src/use-generate-audio.ts
- packages/typescript/ai-solid/tests/use-generation.test.ts
- packages/typescript/ai-svelte/src/create-generate-audio.svelte.ts
- packages/typescript/ai-svelte/src/index.ts
- packages/typescript/ai-vue/src/index.ts
- packages/typescript/ai-vue/src/use-generate-audio.ts
- packages/typescript/ai-vue/tests/use-generation.test.ts
- packages/typescript/ai/skills/ai-core/media-generation/SKILL.md
- packages/typescript/ai/src/activities/generateAudio/adapter.ts
- packages/typescript/ai/src/activities/generateAudio/index.ts
- packages/typescript/ai/src/activities/generateSpeech/index.ts
- packages/typescript/ai/src/activities/generateTranscription/index.ts
- packages/typescript/ai/src/activities/index.ts
- packages/typescript/ai/src/index.ts
- packages/typescript/ai/src/types.ts
- terminalOutput
💤 Files with no reviewable changes (1)
- terminalOutput
return {
  fetcher: (input: { text: string; voice?: string }) =>
    generateSpeechStreamFn({
      data: { ...input, provider: config.id },
    }),
  onResult: toSpeechOutput,
}
Use SSE connection mode for the streaming server function.
generateSpeechStreamFn returns an SSE Response, but this branch passes it as a fetcher result. That makes onResult: toSpeechOutput receive the wrong shape instead of parsed stream chunks, breaking the “Server Fn” mode.
Proposed fix
return {
- fetcher: (input: { text: string; voice?: string }) =>
- generateSpeechStreamFn({
- data: { ...input, provider: config.id },
- }),
+ connection: fetchServerSentEvents(
+ (input: { text: string; voice?: string }) =>
+ generateSpeechStreamFn({
+ data: { ...input, provider: config.id },
+ }),
+ ),
  onResult: toSpeechOutput,
 }
🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 65 -
71, The fetcher currently returns the SSE Response directly which causes
onResult (toSpeechOutput) to receive the wrong shape; change the fetcher to
return an object with the SSE response and the SSE connection mode so the client
treats it as a streaming Server Fn. Concretely, update the fetcher to return {
response: generateSpeechStreamFn({ data: { ...input, provider: config.id } }),
connection: 'sse' } so generateSpeechStreamFn (the SSE Response) is passed with
connection: 'sse' and onResult: toSpeechOutput will receive parsed stream
chunks.
return {
  fetcher: (input: { audio: string | Blob; language?: string }) =>
    transcribeStreamFn({
      data: {
        audio: input.audio as string,
        language: input.language,
        provider: config.id,
      },
    }),
}
Use SSE connection mode for the streaming server function.
transcribeStreamFn returns an SSE Response, but this branch passes it as a fetcher result. The “Server Fn” mode will receive the raw response shape instead of parsed transcription stream chunks.
Proposed fix
return {
- fetcher: (input: { audio: string | Blob; language?: string }) =>
- transcribeStreamFn({
- data: {
- audio: input.audio as string,
- language: input.language,
- provider: config.id,
- },
- }),
+ connection: fetchServerSentEvents(
+ (input: { audio: string | Blob; language?: string }) =>
+ transcribeStreamFn({
+ data: {
+ audio: input.audio as string,
+ language: input.language,
+ provider: config.id,
+ },
+ }),
+ ),
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines
41 - 50, The fetcher currently returns transcribeStreamFn(...) which produces a
raw Response; change the call so it requests SSE connection mode (so the fetcher
returns the streaming/evented output, not the raw response). Update the fetcher
invocation of transcribeStreamFn to pass the SSE mode (e.g., add a connection:
"sse" or call the transcribeStreamFn SSE-specific entry point) while keeping the
same data fields (audio, language, provider: config.id) so the client consumes
transcription chunks instead of the raw Response.
export interface AudioGenerateInput {
  /** Text description of the desired audio */
  prompt: string
  /** Desired duration in seconds */
  duration?: number
  /** Model-specific options */
  modelOptions?: Record<string, any>
}
Allow non-text audio workflows in AudioGenerateInput.
Line 215 makes prompt mandatory and there’s no source-audio field, so the new hooks can’t type-safely call audio-to-audio, enhancement, separation, or merge models without dummy prompts or provider-specific tunneling through modelOptions.
🔧 Proposed input shape
+type AudioSourceInput = string | File | Blob | ArrayBuffer
+
/**
* Input for audio generation (music, sound effects).
*/
-export interface AudioGenerateInput {
- /** Text description of the desired audio */
- prompt: string
+export type AudioGenerateInput =
+ | {
+ /** Text description of the desired audio */
+ prompt: string
+ /** Optional source audio for audio-to-audio capable models */
+ audio?: AudioSourceInput
+ /** Desired duration in seconds */
+ duration?: number
+ /** Model-specific options */
+ modelOptions?: Record<string, any>
+ }
+ | {
+ /** Optional prompt for models that accept both source audio and text guidance */
+ prompt?: string
+ /** Source audio for audio-to-audio, enhancement, separation, or merge models */
+ audio: AudioSourceInput
+ /** Desired duration in seconds */
+ duration?: number
+ /** Model-specific options */
+ modelOptions?: Record<string, any>
+ }
- /** Desired duration in seconds */
- duration?: number
- /** Model-specific options */
- modelOptions?: Record<string, any>
-}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export interface AudioGenerateInput {
  /** Text description of the desired audio */
  prompt: string
  /** Desired duration in seconds */
  duration?: number
  /** Model-specific options */
  modelOptions?: Record<string, any>
}
type AudioSourceInput = string | File | Blob | ArrayBuffer
/**
 * Input for audio generation (music, sound effects).
 */
export type AudioGenerateInput =
  | {
      /** Text description of the desired audio */
      prompt: string
      /** Optional source audio for audio-to-audio capable models */
      audio?: AudioSourceInput
      /** Desired duration in seconds */
      duration?: number
      /** Model-specific options */
      modelOptions?: Record<string, any>
    }
  | {
      /** Optional prompt for models that accept both source audio and text guidance */
      prompt?: string
      /** Source audio for audio-to-audio, enhancement, separation, or merge models */
      audio: AudioSourceInput
      /** Desired duration in seconds */
      duration?: number
      /** Model-specific options */
      modelOptions?: Record<string, any>
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-client/src/generation-types.ts` around lines 213 -
220, AudioGenerateInput currently requires prompt and has no source-audio field
which prevents type-safe audio-to-audio/enhancement/separation/merge workflows;
update the AudioGenerateInput interface to support non-text workflows by making
prompt optional (or changing to a discriminated union) and adding a source audio
property (e.g., sourceAudio: string | Uint8Array | {url: string}) plus an
inputType or mode discriminator (e.g., inputType: 'text' | 'audio') so callers
and providers can clearly distinguish text vs audio inputs while keeping
modelOptions for provider-specific overrides.
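The committable suggestion above covers the type-level change; the runtime side can be a small narrowing guard. This is a hedged sketch under the assumption that the union lands roughly as proposed — `AudioSourceInput`, `AudioGenerateInput`, and `isAudioToAudio` here are illustrative names, not the shipped API:

```typescript
// Illustrative version of the proposed union (trimmed to the fields that
// matter for narrowing).
type AudioSourceInput = string | Blob | ArrayBuffer

type AudioGenerateInput =
  | { prompt: string; audio?: AudioSourceInput; duration?: number }
  | { prompt?: string; audio: AudioSourceInput; duration?: number }

// Runtime guard an adapter could use to route text-to-audio versus
// audio-to-audio requests without dummy prompts.
function isAudioToAudio(
  input: AudioGenerateInput,
): input is Extract<AudioGenerateInput, { audio: AudioSourceInput }> {
  return 'audio' in input && input.audio !== undefined
}
```

With this shape, a plain `{ prompt }` call stays type-safe for text-to-audio models, while enhancement/separation/merge models can require `audio` at the type level.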
…ech-support # Conflicts: # terminalOutput
…ame, PCM channels, voice validation, image error surfacing
…s, response-shape validation, size params, proxy+apiKey, content types
…gnature mismatch, null guards
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-gemini/src/adapters/image.ts (1)
208-256: ⚠️ Potential issue | 🟡 Minor — Empty Imagen response still resolves to `images: []`, inconsistent with the new Gemini transformer.
The gate at Line 234 is `entries.length > 0 && images.length === 0`, so when `response.generatedImages` is `undefined` or `[]`, `transformImagenResponse` silently returns `{ images: [] }`. That reintroduces the exact failure mode the sibling `transformGeminiResponse` was just fixed to prevent (Line 177 throws on `images.length === 0` regardless of parts), and it defeats the comment "forced to handle the failure mode explicitly" right above the check. Callers now get different contracts between Imagen and Gemini paths for the same "no images produced" condition.
🛠️ Proposed fix: throw on any empty result, include filter reasons when present
-// Every entry was filtered — no usable images to return. Throw rather
-// than resolve to an empty array so the caller is forced to handle the
-// failure mode explicitly.
-if (entries.length > 0 && images.length === 0) {
-  const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
-  throw new Error(
-    `Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`,
-  )
-}
+// No usable images to return — throw rather than resolve to an empty
+// array so callers must handle the failure mode explicitly, matching
+// the Gemini transformer's contract.
+if (images.length === 0) {
+  const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
+  const detail =
+    entries.length === 0
+      ? ' (no generated image entries were returned).'
+      : `: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`
+  throw new Error(`Imagen ${model} returned no images${detail}`)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/image.ts` around lines 208 - 256, The function transformImagenResponse currently only throws when entries.length > 0 && images.length === 0 which lets undefined/empty generatedImages resolve to images: []; change the gate to throw whenever images.length === 0 (regardless of entries) so callers must handle an empty result; keep collecting filterReasons and include them in the thrown Error message (use filterReasons.join('; ') when present) and preserve the existing console.warn behavior for partial filters; update references in the message to include the model and entries.length for context (function: transformImagenResponse, vars: entries, images, filterReasons).
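The proposed diff above can be distilled into a standalone guard. A minimal sketch, assuming the variable names from the review (`entries`, `images`, `filterReasons`) — the function name is hypothetical:

```typescript
// Throw on any empty Imagen result, regardless of whether entries were
// present, so Imagen and Gemini paths share one "no images" contract.
function assertImagesReturned(
  model: string,
  entries: Array<unknown>,
  images: Array<unknown>,
  filterReasons: Array<string>,
): void {
  if (images.length > 0) return
  const joined =
    filterReasons.length > 0 ? ` (${filterReasons.join('; ')})` : ''
  const detail =
    entries.length === 0
      ? ' (no generated image entries were returned).'
      : `: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined}.`
  throw new Error(`Imagen ${model} returned no images${detail}`)
}
```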
🧹 Nitpick comments (3)
packages/typescript/ai-gemini/tests/model-meta.test.ts (1)
592-600: Consider also asserting the new `gemini-3.1-flash-tts-preview` entry.
Per the PR objectives and `GEMINI_TTS_MODELS` in `src/model-meta.ts` (lines 1006-1010), the registry now includes `GEMINI_3_1_FLASH_TTS` alongside the 2.5 Pro/Flash entries. The uniqueness assertion covers accidental duplication, but an explicit `toContain` for the 3.1 Flash TTS name would make this regression test catch a future name regression on that entry too.
🧪 Optional addition
 expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-pro-preview-tts')
 expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-flash-preview-tts')
+expect(GEMINI_TTS_MODELS).toContain('gemini-3.1-flash-tts-preview')
 expect(new Set(GEMINI_TTS_MODELS).size).toBe(GEMINI_TTS_MODELS.length)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/tests/model-meta.test.ts` around lines 592 - 600, Add an explicit assertion for the new 3.1 Flash TTS entry in the "Gemini TTS model registry" test: inside the it block that checks GEMINI_TTS_MODELS, add an expect toContain for the GEMINI_3_1_FLASH_TTS constant (or the literal 'gemini-3.1-flash-tts-preview') alongside the existing checks for GEMINI_2_5_PRO_TTS and GEMINI_2_5_FLASH_TTS to ensure the 3.1 Flash TTS name is present and guarded against future regressions.
packages/typescript/ai-fal/src/adapters/audio.ts (1)
98-127: Consider handling the plural `audios` response shape some fal music models return.
Several fal music endpoints (e.g., MusicGen variants) return `{ audios: [{ url, content_type }, …] }` rather than `audio`/`audio_url`. Currently such responses will throw 'Audio URL not found in fal audio generation response'. Picking `audios[0]` would extend coverage without changing the single-result contract.
♻️ Proposed addition
 if (data.audio && typeof data.audio === 'object' && 'url' in data.audio) {
   const audioObj = data.audio as { url: string; content_type?: string }
   audioUrl = audioObj.url
   contentType = audioObj.content_type
 } else if (typeof data.audio_url === 'string') {
   audioUrl = data.audio_url
+} else if (Array.isArray(data.audios) && data.audios.length > 0) {
+  const first = data.audios[0] as { url?: string; content_type?: string }
+  audioUrl = first.url
+  contentType = first.content_type
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 98 - 127, transformResponse currently only checks data.audio and data.audio_url, so responses with data.audios (an array) will fail; update transformResponse to also handle data.audios when it's an array by selecting the first item (const first = (data.audios as Array<Record<string, unknown>>)[0]) and extracting first.url and first.content_type into audioUrl/contentType, falling back to existing audio/audio_url logic; preserve the same return shape (id: response.requestId || this.generateId(), model: this.model, audio: { url: audioUrl, contentType: deriveAudioContentType(contentType, audioUrl) }) and keep throwing the existing error if no URL is found.
packages/typescript/ai-gemini/src/adapters/audio.ts (1)
96-109: Validate the response shape instead of parsing through `any`.
This parses provider output manually with `part: any`; use a small Zod schema for the inline audio part so malformed SDK/provider responses fail predictably and align with library validation conventions. As per coding guidelines, `packages/typescript/**/src/**/*.ts`: Use Zod for schema validation and tool definition across the library.
♻️ Example direction
+const GeminiInlineAudioPartSchema = z.object({
+  inlineData: z.object({
+    mimeType: z.string().startsWith('audio/'),
+    data: z.string().min(1),
+  }),
+})
+
 const parts = response.candidates?.[0]?.content?.parts ?? []
-const audioPart = parts.find((part: any) =>
-  part.inlineData?.mimeType?.startsWith('audio/'),
-)
+const audioPartResult = parts
+  .map((part) => GeminiInlineAudioPartSchema.safeParse(part))
+  .find((result) => result.success)
+
+if (!audioPartResult?.success) {
+  throw new Error('No audio data in Gemini Lyria response')
+}
-if (!audioPart?.inlineData?.data) {
-  throw new Error('No audio data in Gemini Lyria response')
-}
-
-// audioPart was selected because mimeType.startsWith('audio/') was
-// truthy, so the mime type is guaranteed to be a string here. Trust the
-// value Gemini returned rather than inventing a non-standard
-// `audio/mp3` fallback (IANA is `audio/mpeg`).
-const contentType = audioPart.inlineData.mimeType
+const { inlineData } = audioPartResult.data
+const contentType = inlineData.mimeType
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 96 - 109, The code extracts audioPart from response.candidates using loose any typing; replace this with a Zod validation: define a small schema (e.g., InlineAudioPartSchema) that validates the shape { inlineData: { mimeType: string, data: string } } and apply it to response.candidates?.[0]?.content?.parts (or to the selected part) before using audioPart.inlineData.mimeType/data; if validation fails, throw a clear error like "Invalid Gemini Lyria audio part" so malformed provider/SDK responses fail predictably and follow the repo's Zod validation conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-fal/src/adapters/speech.ts`:
- Around line 107-120: The computed format (in the block using contentTypeMime,
extractUrlExtension, and audioUrl) can be a MIME subtype like "mpeg" which is an
incorrect file extension; update the logic in that block (the format assignment
used when building the return object for response.requestId / this.generateId /
this.model) to normalize common MIME subtypes to proper extensions (e.g., "mpeg"
-> "mp3", "mp4" -> "m4a") and prefer a safeUrlExtension when one exists; ensure
contentType (contentTypeMime || `audio/${format}`) remains consistent after
normalization so consumers get a standard file extension.
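The normalization the prompt above describes can be sketched as a small helper. This is illustrative only — `normalizeAudioFormat` is a hypothetical name, and the mapping simply reflects the review's examples (MIME subtype "mpeg" is not a valid file extension; ".mp3" and ".m4a" are the conventional extensions):

```typescript
// Normalize a MIME subtype or URL-derived extension into a standard
// audio file extension. Prefers a safe URL extension when one exists.
function normalizeAudioFormat(
  contentTypeMime?: string,
  urlExtension?: string,
): string {
  if (urlExtension) return urlExtension
  const subtype = contentTypeMime?.split('/')[1]?.split(';')[0]
  switch (subtype) {
    case 'mpeg':
      return 'mp3' // audio/mpeg files conventionally use .mp3
    case 'mp4':
      return 'm4a' // audio-only MP4 containers conventionally use .m4a
    case undefined:
      return 'mp3' // sensible default when nothing is known
    default:
      return subtype
  }
}
```

After normalization, `contentType` can stay consistent by recomputing it from the normalized extension rather than echoing the raw subtype.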
In `@packages/typescript/ai-fal/src/utils/client.ts`:
- Around line 89-92: The switch handling audio extensions currently groups 'aac'
with 'm4a'/'mp4' and returns 'audio/mp4'; update the mapping so case 'm4a' and
case 'mp4' continue to return 'audio/mp4' but case 'aac' returns 'audio/aac'
instead (adjust the switch in the function that contains the case
'm4a'/'mp4'/'aac' to separate the 'aac' branch and its return value).
- Around line 59-64: The extractUrlExtension function fails for URLs with
fragments or trailing slashes; update extractUrlExtension to first strip URL
fragments (remove anything after '#') and trim trailing slashes before removing
query params and extracting the final segment so extension detection works for
inputs like 'https://x.com/a.mp3#t=5' and 'https://x.com/a.mp3/'; locate the
extractUrlExtension function and modify the preprocessing of url (remove
fragment, then remove query and trailing slashes) before splitting on '.' and
applying the existing /^[a-z0-9]{2,5}$/i test.
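The two `client.ts` fixes above can be sketched together. Function names mirror the review (`extractUrlExtension`, `deriveAudioContentType`), but the bodies are illustrative, not the adapter's actual code — the key points are stripping fragments/queries/trailing slashes before extension detection, and giving `aac` its own MIME type instead of grouping it with MP4 containers:

```typescript
// Extract a plausible file extension from a URL, tolerating fragments,
// query strings, and trailing slashes.
function extractUrlExtension(url: string): string | undefined {
  const path = url.split('#')[0].split('?')[0].replace(/\/+$/, '')
  const lastSegment = path.split('/').pop() ?? ''
  const dotIndex = lastSegment.lastIndexOf('.')
  if (dotIndex === -1) return undefined
  const ext = lastSegment.slice(dotIndex + 1)
  return /^[a-z0-9]{2,5}$/i.test(ext) ? ext.toLowerCase() : undefined
}

// Map an extension to a MIME type; `aac` gets its registered type.
function deriveAudioContentType(ext: string | undefined): string {
  switch (ext) {
    case 'mp3':
      return 'audio/mpeg'
    case 'wav':
      return 'audio/wav'
    case 'm4a':
    case 'mp4':
      return 'audio/mp4' // MP4 audio containers
    case 'aac':
      return 'audio/aac' // raw AAC streams, per the review's fix
    case 'ogg':
      return 'audio/ogg'
    default:
      return 'application/octet-stream'
  }
}
```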
In `@packages/typescript/ai-fal/tests/utils.test.ts`:
- Around line 1-16: Move the static imports above the vitest mock call to
satisfy ESLint import/first and import/order: place the imports for
configureFalClient, deriveAudioContentType, extractUrlExtension, generateId
(from ../src/utils), mapSizeToFalFormat (from
../src/image/image-provider-options) and the fal import from '@fal-ai/client'
before the vi.mock('@fal-ai/client', ...) invocation so the vi.mock call remains
but all import statements appear first.
In `@packages/typescript/ai-gemini/src/adapters/audio.ts`:
- Line 143: The returned config currently spreads apiKey first then ...config so
config.apiKey can override the explicit apiKey argument; change the object
spread order when constructing the GeminiAudioAdapter (the return that calls new
GeminiAudioAdapter({ apiKey, ...config }, model)) so the explicit apiKey takes
precedence by placing ...config before apiKey (i.e., merge config then set
apiKey last) when calling the GeminiAudioAdapter constructor.
- Around line 79-83: The adapter is currently ignoring
AudioGenerationOptions.duration when building the request; update the logic
around const { model, prompt, modelOptions } = options and the call to
this.client.models.generateContent to either apply or reject duration: validate
options.duration up-front, and if model indicates a fixed-length provider (e.g.,
Lyria 3 Clip) throw a clear error when duration !== 30, otherwise for models
that accept variable duration (e.g., Lyria 3 Pro) inject the validated duration
into the generation request via modelOptions or by appending it to prompt before
calling this.client.models.generateContent so the provider receives the intended
duration. Ensure any rejection uses a descriptive error mentioning
AudioGenerationOptions.duration and the model name.
- Around line 32-35: Remove the unsupported negativePrompt field from the audio
generation types and code: delete the negativePrompt property from the
interface/props in packages/typescript/ai-gemini/src/adapters/audio.ts and stop
passing negativePrompt into the `@google/genai` GenerateContentConfig used by
functions like the music generation adapter (look for any call that builds a
GenerateContentConfig or calls client.generateContent/sendGenerateRequest),
ensuring the payload only uses fields supported by GenerateContentConfig; also
remove any TypeScript references/usages of negativePrompt to keep typings
consistent.
- Around line 20-25: Make GeminiAudioProviderOptions model-generic and constrain
responseMimeType with the same per-model resolution pattern used elsewhere:
change GeminiAudioProviderOptions to GeminiAudioProviderOptions<TModel> and type
responseMimeType using ResolveProviderOptions<TModel> (or a TModel conditional)
so only Pro-capable models (e.g., those where ResolveProviderOptions<TModel>
indicates WAV is supported or TModel extends 'lyria-3-pro') allow 'audio/wav',
otherwise responseMimeType is never/undefined; apply the same
generic/conditional change to the equivalent audio option types around the 64–78
region so WAV is compile-time allowed only for Pro models (reference symbols:
GeminiAudioProviderOptions, responseMimeType, ResolveProviderOptions<TModel>).
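The duration-validation behavior described for the Gemini audio adapter can be sketched as an up-front check. This is a hedged sketch: the model names follow the review's examples, and `FIXED_DURATION_SECONDS`/`resolveDuration` are illustrative names, not the adapter's API:

```typescript
// Fixed-length models reject any non-default duration; variable-length
// models forward the validated value to the request.
const FIXED_DURATION_SECONDS: Record<string, number> = {
  'lyria-3-clip-preview': 30, // Clip always produces 30s clips
}

function resolveDuration(
  model: string,
  duration?: number,
): number | undefined {
  if (duration === undefined) return undefined
  const fixed = FIXED_DURATION_SECONDS[model]
  if (fixed !== undefined && duration !== fixed) {
    throw new Error(
      `AudioGenerationOptions.duration=${duration} is not supported by ${model}; ` +
        `this model always produces ${fixed}s clips.`,
    )
  }
  return duration
}
```

Failing fast here is preferable to silently dropping the field, which is the bug the comment reports.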
In `@packages/typescript/ai-gemini/src/adapters/tts.ts`:
- Around line 243-260: The current parsePcmMimeType incorrectly treats
containerized WAV as raw PCM because normalized.includes('codec=pcm') will match
values like "audio/wav;codec=pcm"; update parsePcmMimeType to only accept
codec=pcm when the subtype is not a WAV/container. Concretely: derive the media
subtype from normalized (after "audio/"), keep the existing bitDepthMatch and
explicit checks for 'audio/pcm' and 'audio/x-pcm', but replace the generic
normalized.includes('codec=pcm') with a guarded check that checks for
'codec=pcm' AND ensures the subtype is not 'wav', 'x-wav', 'wave' or any subtype
that contains 'wav' (for example by extracting the subtype via split('/') and
checking it does not include 'wav'). Apply this change inside parsePcmMimeType
using the existing variables normalized, bitDepthMatch and isPcm.
- Around line 64-66: The multiSpeakerVoiceConfig.speakerVoiceConfigs array
currently allows empty or 3+ entries even though only 1–2 speakers are
supported; add a validation guard in the adapter before any call that builds or
sends a Gemini TTS request (e.g., in the function that reads
multiSpeakerVoiceConfig and where the request payload is constructed — reference
symbols: multiSpeakerVoiceConfig and speakerVoiceConfigs and the TTS request
construction points around the existing handling at the earlier block and the
later block around lines 144–147) that checks speakerVoiceConfigs.length is 1 or
2 and returns/throws a clear error if not; apply the same guard at both
locations so invalid arrays are rejected client-side before calling Gemini.
- Around line 151-162: The code currently assigns speechConfig.voiceConfig =
modelOptions?.voiceConfig ?? { prebuiltVoiceConfig: { voiceName } } which allows
an empty modelOptions.voiceConfig (e.g. {}) to suppress the fallback voice;
update the logic in the code that sets speechConfig.voiceConfig (referencing
speechConfig, modelOptions, voiceConfig, prebuiltVoiceConfig, voiceName,
GeminiTTSVoice, GEMINI_TTS_VOICES) to detect when modelOptions.voiceConfig is
missing or does not include a valid prebuiltVoiceConfig.voiceName and in that
case set prebuiltVoiceConfig.voiceName to the computed voiceName (fallback to
'Kore'); ensure existing valid modelOptions.voiceConfig is preserved otherwise.
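The three TTS guards described above (PCM MIME parsing, speaker-count validation, and voice-config fallback) can be sketched as small standalone functions. All names are illustrative; only the behavior described in the review is assumed ('Kore' as the fallback voice, 1–2 speakers supported, and WAV containers never counting as raw PCM):

```typescript
// 1. codec=pcm inside a WAV container is still WAV, not raw PCM.
function isRawPcmMimeType(mimeType: string): boolean {
  const normalized = mimeType.trim().toLowerCase()
  const subtype = normalized.split('/')[1]?.split(';')[0] ?? ''
  if (subtype === 'pcm' || subtype === 'x-pcm') return true
  return normalized.includes('codec=pcm') && !subtype.includes('wav')
}

// 2. Gemini multi-speaker TTS supports only one or two speakers.
function assertSpeakerCount(speakerVoiceConfigs: Array<unknown>): void {
  const count = speakerVoiceConfigs.length
  if (count < 1 || count > 2) {
    throw new Error(
      `multiSpeakerVoiceConfig.speakerVoiceConfigs must contain 1 or 2 entries, got ${count}`,
    )
  }
}

// 3. An empty modelOptions.voiceConfig must not suppress the fallback voice.
interface VoiceConfig {
  prebuiltVoiceConfig?: { voiceName?: string }
}

function resolveVoiceConfig(
  provided: VoiceConfig | undefined,
  fallbackVoiceName = 'Kore',
): VoiceConfig {
  if (provided?.prebuiltVoiceConfig?.voiceName) return provided
  return { prebuiltVoiceConfig: { voiceName: fallbackVoiceName } }
}
```

Running these checks client-side surfaces configuration errors before a request reaches Gemini.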
---
Outside diff comments:
In `@packages/typescript/ai-gemini/src/adapters/image.ts`:
- Around line 208-256: The function transformImagenResponse currently only
throws when entries.length > 0 && images.length === 0 which lets undefined/empty
generatedImages resolve to images: []; change the gate to throw whenever
images.length === 0 (regardless of entries) so callers must handle an empty
result; keep collecting filterReasons and include them in the thrown Error
message (use filterReasons.join('; ') when present) and preserve the existing
console.warn behavior for partial filters; update references in the message to
include the model and entries.length for context (function:
transformImagenResponse, vars: entries, images, filterReasons).
---
Nitpick comments:
In `@packages/typescript/ai-fal/src/adapters/audio.ts`:
- Around line 98-127: transformResponse currently only checks data.audio and
data.audio_url, so responses with data.audios (an array) will fail; update
transformResponse to also handle data.audios when it's an array by selecting the
first item (const first = (data.audios as Array<Record<string, unknown>>)[0])
and extracting first.url and first.content_type into audioUrl/contentType,
falling back to existing audio/audio_url logic; preserve the same return shape
(id: response.requestId || this.generateId(), model: this.model, audio: { url:
audioUrl, contentType: deriveAudioContentType(contentType, audioUrl) }) and keep
throwing the existing error if no URL is found.
In `@packages/typescript/ai-gemini/src/adapters/audio.ts`:
- Around line 96-109: The code extracts audioPart from response.candidates using
loose any typing; replace this with a Zod validation: define a small schema
(e.g., InlineAudioPartSchema) that validates the shape { inlineData: { mimeType:
string, data: string } } and apply it to
response.candidates?.[0]?.content?.parts (or to the selected part) before using
audioPart.inlineData.mimeType/data; if validation fails, throw a clear error
like "Invalid Gemini Lyria audio part" so malformed provider/SDK responses fail
predictably and follow the repo's Zod validation conventions.
In `@packages/typescript/ai-gemini/tests/model-meta.test.ts`:
- Around line 592-600: Add an explicit assertion for the new 3.1 Flash TTS entry
in the "Gemini TTS model registry" test: inside the it block that checks
GEMINI_TTS_MODELS, add an expect toContain for the GEMINI_3_1_FLASH_TTS constant
(or the literal 'gemini-3.1-flash-tts-preview') alongside the existing checks
for GEMINI_2_5_PRO_TTS and GEMINI_2_5_FLASH_TTS to ensure the 3.1 Flash TTS name
is present and guarded against future regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0f7664c1-c69f-413d-8fe2-ce9786e539f0
📒 Files selected for processing (22)
- packages/typescript/ai-fal/src/adapters/audio.ts
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-fal/src/adapters/speech.ts
- packages/typescript/ai-fal/src/adapters/transcription.ts
- packages/typescript/ai-fal/src/image/image-provider-options.ts
- packages/typescript/ai-fal/src/utils/client.ts
- packages/typescript/ai-fal/src/utils/index.ts
- packages/typescript/ai-fal/tests/audio-adapter.test.ts
- packages/typescript/ai-fal/tests/image-adapter.test.ts
- packages/typescript/ai-fal/tests/speech-adapter.test.ts
- packages/typescript/ai-fal/tests/transcription-adapter.test.ts
- packages/typescript/ai-fal/tests/utils.test.ts
- packages/typescript/ai-fal/tests/video-adapter.test.ts
- packages/typescript/ai-gemini/src/adapters/audio.ts
- packages/typescript/ai-gemini/src/adapters/image.ts
- packages/typescript/ai-gemini/src/adapters/tts.ts
- packages/typescript/ai-gemini/src/image/image-provider-options.ts
- packages/typescript/ai-gemini/src/model-meta.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-gemini/tests/image-adapter.test.ts
- packages/typescript/ai-gemini/tests/model-meta.test.ts
- packages/typescript/ai-gemini/tests/tts-adapter.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/typescript/ai-fal/tests/audio-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (7)
- packages/typescript/ai-fal/src/utils/index.ts
- packages/typescript/ai-fal/tests/image-adapter.test.ts
- packages/typescript/ai-gemini/src/model-meta.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-gemini/tests/tts-adapter.test.ts
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-fal/tests/transcription-adapter.test.ts
export interface GeminiAudioProviderOptions {
  /**
   * Request WAV output instead of the default MP3. Lyria 3 Pro only;
   * the Clip model always returns MP3 and will reject this field.
   */
  responseMimeType?: 'audio/wav'
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🌐 Web query:
Gemini Lyria 3 Clip output format and Lyria 3 Pro responseMimeType audio/wav support
💡 Result:
Gemini Lyria 3 Clip (lyria-3-clip-preview) outputs audio clips up to 30 seconds in audio/mp3 format via the Gemini API generateContent method using response_modalities=["AUDIO", "TEXT"]. Vertex AI docs specify supported MIME types: audio/mp3 at 44.1 kHz, 192 kbps. Code examples save as .mp3 with MIME type audio/mp3. Lyria 3 Pro (lyria-3-pro-preview) outputs full-length tracks up to ~3 minutes (184 seconds max) similarly in audio/mp3 format. One example mentions possible WAV, but official Vertex AI lists audio/mp3; a REST example shows audio/mpeg. No explicit support for requesting audio/wav via responseMimeType or other params for Lyria 3 models. Older Lyria 2 (lyria-002) uses audio/wav in Vertex AI predict endpoint. Gemini API responseMimeType primarily controls text outputs (e.g., application/json), not audio modality MIME types, which appear fixed per model.
Citations:
- 1: https://gemilab.net/en/articles/gemini-api/lyria-3-pro-full-length-music-generation-guide
- 2: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/lyria/lyria-3
- 3: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/music/generate-music
- 4: https://shoepapp.com/?_=%2Fvertex-ai%2Fgenerative-ai%2Fdocs%2Fmodels%2Flyria%2Flyria-3%23ePsMxRqwbkfW1dkKxBRL7HShH9pP%2BC1Gl9XFSQ%3D%3D
- 5: https://cloud.google.com/vertex-ai/generative-ai/docs/music/generate-music
- 6: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
🏁 Script executed:
# Find the audio adapter file
fd "audio.ts" packages/typescript/ai-gemini/src/adapters/
# Check the file content around the commented lines
cat -n packages/typescript/ai-gemini/src/adapters/audio.ts | head -100
Repository: TanStack/ai
Length of output: 3768
🏁 Script executed:
# Search for GeminiAudioModel definition
rg "type GeminiAudioModel|interface GeminiAudioModel" packages/typescript/ai-gemini/src/
# Search for how responseMimeType is used
rg "responseMimeType" packages/typescript/ai-gemini/src/
Repository: TanStack/ai
Length of output: 1194
🏁 Script executed:
# Check if there are any tests or documentation about this
fd "test\|spec\|\.md" packages/typescript/ai-gemini/src/ | head -20
rg "responseMimeType" packages/typescript/ai-gemini/ -A 3 -B 3
Repository: TanStack/ai
Length of output: 12555
🏁 Script executed:
# Find and check model-meta.ts
cat packages/typescript/ai-gemini/src/model-meta.ts | head -150
Repository: TanStack/ai
Length of output: 3505
🏁 Script executed:
# Search for GEMINI_AUDIO_MODELS definition
rg "GEMINI_AUDIO_MODELS" packages/typescript/ai-gemini/src/ -A 3
Repository: TanStack/ai
Length of output: 1750
🏁 Script executed:
# Check for patterns of model-specific provider options in the codebase
rg "extends.*=.*\{" packages/typescript/ai-gemini/src/adapters/ -A 2 | head -40
Repository: TanStack/ai
Length of output: 37
🏁 Script executed:
# Find LYRIA_3_PRO and LYRIA_3_CLIP definitions
rg "LYRIA_3_PRO|LYRIA_3_CLIP" packages/typescript/ai-gemini/src/model-meta.ts -A 10 | head -60
Repository: TanStack/ai
Length of output: 770
🏁 Script executed:
# Check other adapters to see if model-specific provider options exist anywhere
rg "TModel extends" packages/typescript/ai-gemini/src/adapters/ -B 2 -A 2
Repository: TanStack/ai
Length of output: 8209
🏁 Script executed:
# Check FAL package for the pattern mentioned in learnings
rg "keyof EndpointTypeMap" packages/typescript/ai-fal/src/ -B 3 -A 3 | head -50
Repository: TanStack/ai
Length of output: 4028
Make responseMimeType model-specific using conditional types.
The current type allows responseMimeType: 'audio/wav' for all audio models, but the JSDoc explicitly states it's Pro-only and Clip will reject it. The text adapter already uses ResolveProviderOptions<TModel> for per-model option resolution; apply the same pattern here to provide compile-time safety:
♻️ Proposed type tightening
-export interface GeminiAudioProviderOptions {
+export type GeminiAudioProviderOptions<
+ TModel extends GeminiAudioModel = GeminiAudioModel,
+> = {
/**
- * Request WAV output instead of the default MP3. Lyria 3 Pro only;
- * the Clip model always returns MP3 and will reject this field.
+ * Seed for deterministic generation.
*/
- responseMimeType?: 'audio/wav'
-
- /**
- * Seed for deterministic generation.
- */
seed?: number
-
- /**
- * Negative prompt — describe what to exclude from the output.
- */
negativePrompt?: string
-}
+} & (TModel extends 'lyria-3-pro-preview'
+ ? {
+ /**
+ * Request WAV output instead of the default MP3. Lyria 3 Pro only.
+ */
+ responseMimeType?: 'audio/wav'
+ }
+ : {
+ responseMimeType?: never
+ })
@@
export class GeminiAudioAdapter<
TModel extends GeminiAudioModel,
-> extends BaseAudioAdapter<TModel, GeminiAudioProviderOptions> {
+> extends BaseAudioAdapter<TModel, GeminiAudioProviderOptions<TModel>> {
@@
async generateAudio(
- options: AudioGenerationOptions<GeminiAudioProviderOptions>,
+ options: AudioGenerationOptions<GeminiAudioProviderOptions<TModel>>,
 ): Promise<AudioGenerationResult> {
Also applies to: 64–78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 20 - 25,
Make GeminiAudioProviderOptions model-generic and constrain responseMimeType
with the same per-model resolution pattern used elsewhere: change
GeminiAudioProviderOptions to GeminiAudioProviderOptions<TModel> and type
responseMimeType using ResolveProviderOptions<TModel> (or a TModel conditional)
so only Pro-capable models (e.g., those where ResolveProviderOptions<TModel>
indicates WAV is supported or TModel extends 'lyria-3-pro') allow 'audio/wav',
otherwise responseMimeType is never/undefined; apply the same
generic/conditional change to the equivalent audio option types around the 64–78
region so WAV is compile-time allowed only for Pro models (reference symbols:
GeminiAudioProviderOptions, responseMimeType, ResolveProviderOptions<TModel>).
const { model, prompt, modelOptions } = options

const response = await this.client.models.generateContent({
  model,
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
🧩 Analysis chain
🌐 Web query:
Gemini Lyria 3 Pro duration control via prompt and Lyria 3 Clip fixed 30 seconds
💡 Result:
Gemini Lyria 3 Pro supports duration control via prompts, allowing generation of full-length songs up to a couple of minutes (e.g., 3 minutes) by specifying length in the prompt like "2 minutes 30 seconds" or using timestamps and song structure tags (e.g., [Verse], [Chorus]). Lyria 3 Clip is fixed at 30 seconds and does not support variable duration.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/music-generation
- 2: https://gemilab.net/en/articles/gemini-advanced/gemini-lyria-3-pro-music-generation-guide
- 3: https://blog.google/innovation-and-ai/technology/ai/lyria-3-pro/
- 4: https://me.pcmag.com/en/ai/36164/with-lyria-3-pro-gemini-can-now-generate-3-minute-songs-from-prompts
- 5: https://www.cnet.com/tech/services-and-software/lyria-3-google-ai-music-generator/
- 6: https://openai.luzhipeng.com/google/lyria-3-clip-preview
🏁 Script executed:
find . -name "audio.ts" -path "*/ai-gemini/*" | head -5
Repository: TanStack/ai
Length of output: 108
🏁 Script executed:
fd "audio\.ts" packages/typescript/ai-gemini
Repository: TanStack/ai
Length of output: 106
🏁 Script executed:
rg -l "AudioGenerationOptions" --type ts packages/typescript/
Repository: TanStack/ai
Length of output: 254
🏁 Script executed:
cat -n packages/typescript/ai-gemini/src/adapters/audio.ts | sed -n '70,90p'
Repository: TanStack/ai
Length of output: 958
🏁 Script executed:
rg "type AudioGenerationOptions|interface AudioGenerationOptions" -A 10 packages/typescript/ai/src/types.ts
Repository: TanStack/ai
Length of output: 414
Honor or reject duration instead of silently ignoring it.
AudioGenerationOptions.duration is part of the public audio API, but this adapter drops it without validation or error. For Lyria 3 Clip (fixed at 30 seconds) and Lyria 3 Pro (duration via prompt), callers requesting a specific duration get surprising output. Either validate and apply the duration parameter, or explicitly reject unsupported values.
🐛 Proposed fix: handle provider duration semantics
- const { model, prompt, modelOptions } = options
+ const { model, prompt, duration, modelOptions } = options
+
+ let contentsPrompt = prompt
+ if (duration != null) {
+ if (model === 'lyria-3-clip-preview' && duration !== 30) {
+ throw new Error(
+ 'lyria-3-clip-preview always generates 30-second clips; use lyria-3-pro-preview for controllable duration',
+ )
+ }
+
+ if (model === 'lyria-3-pro-preview') {
+ contentsPrompt = `Create an approximately ${duration}-second track.\n\n${prompt}`
+ }
+ }
const response = await this.client.models.generateContent({
model,
- contents: [{ role: 'user', parts: [{ text: prompt }] }],
+ contents: [{ role: 'user', parts: [{ text: contentsPrompt }] }],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 79 - 83,
The adapter is currently ignoring AudioGenerationOptions.duration when building
the request; update the logic around const { model, prompt, modelOptions } =
options and the call to this.client.models.generateContent to either apply or
reject duration: validate options.duration up-front, and if model indicates a
fixed-length provider (e.g., Lyria 3 Clip) throw a clear error when duration !==
30, otherwise for models that accept variable duration (e.g., Lyria 3 Pro)
inject the validated duration into the generation request via modelOptions or by
appending it to prompt before calling this.client.models.generateContent so the
provider receives the intended duration. Ensure any rejection uses a descriptive
error mentioning AudioGenerationOptions.duration and the model name.
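The duration semantics described above can be sketched as a standalone helper (the error message and prompt phrasing mirror the proposed fix; this is not the adapter's actual code):

```typescript
// Validate or apply AudioGenerationOptions.duration per Lyria model:
// Clip is fixed at 30 s and must reject anything else; Pro controls
// duration via the prompt text.
function applyDuration(
  model: string,
  prompt: string,
  duration?: number,
): string {
  if (duration == null) return prompt
  if (model === 'lyria-3-clip-preview' && duration !== 30) {
    throw new Error(
      'lyria-3-clip-preview always generates 30-second clips; use lyria-3-pro-preview for controllable duration',
    )
  }
  if (model === 'lyria-3-pro-preview') {
    return `Create an approximately ${duration}-second track.\n\n${prompt}`
  }
  return prompt
}
```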
… meta, option filtering
…idation, falsy duration render
…base adapter ctor, type sync
…enforce
The defensive nullish-coalescing on response.choices and img/img.imageUrl guards that the fix-loop added are impossible per the SDK type signatures; eslint's no-unnecessary-condition correctly rejects them. Keep only the typeof url !== 'string' check, which is a real runtime shape guard (imageUrl.url is typed as string but provider may send a non-string in rare degraded responses).
Actionable comments posted: 3
♻️ Duplicate comments (6)
examples/ts-react-chat/src/routes/generations.speech.tsx (1)
65-71: ⚠️ Potential issue | 🟠 Major
Wrap `generateSpeechStreamFn` in `fetchServerSentEvents` for the Server Fn streaming mode.
This branch still hands the raw SSE `Response` from `generateSpeechStreamFn` to `fetcher`, so `onResult: toSpeechOutput` is invoked with the `Response` instead of parsed `{ audio, contentType, ... }` chunks and audio playback will fail in Server Fn mode.
Proposed fix
 return {
-  fetcher: (input: { text: string; voice?: string }) =>
-    generateSpeechStreamFn({
-      data: { ...input, provider: config.id },
-    }),
+  connection: fetchServerSentEvents(
+    (input: { text: string; voice?: string }) =>
+      generateSpeechStreamFn({
+        data: { ...input, provider: config.id },
+      }),
+  ),
   onResult: toSpeechOutput,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.speech.tsx` around lines 65 - 71, The fetcher currently passes the raw SSE Response from generateSpeechStreamFn to onResult; wrap the call in fetchServerSentEvents so the fetcher returns parsed SSE chunks instead of a Response. Specifically, change the fetcher to call fetchServerSentEvents(generateSpeechStreamFn({ data: { ...input, provider: config.id } })) so that onResult: toSpeechOutput receives parsed {audio, contentType, ...} chunks; ensure fetchServerSentEvents is imported/available in the file.
examples/ts-react-chat/src/routes/generations.transcription.tsx (1)
41-50: ⚠️ Potential issue | 🟠 Major
Wrap `transcribeStreamFn` in `fetchServerSentEvents` for the Server Fn streaming mode.
`transcribeStreamFn` returns an SSE `Response`, but this branch still provides it as a plain `fetcher`. The hook will treat the raw `Response` as the final payload instead of consuming parsed stream chunks, so the Server Fn mode won't behave like the SSE mode.
Proposed fix
 return {
-  fetcher: (input: { audio: string | Blob; language?: string }) =>
-    transcribeStreamFn({
-      data: {
-        audio: input.audio as string,
-        language: input.language,
-        provider: config.id,
-      },
-    }),
+  connection: fetchServerSentEvents(
+    (input: { audio: string | Blob; language?: string }) =>
+      transcribeStreamFn({
+        data: {
+          audio: input.audio as string,
+          language: input.language,
+          provider: config.id,
+        },
+      }),
+  ),
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.transcription.tsx` around lines 41 - 50, The fetcher currently returns the raw SSE Response from transcribeStreamFn, so wrap the call in fetchServerSentEvents to convert the SSE Response into a streaming payload the hook can consume; update the fetcher (the object returned by the function) to call fetchServerSentEvents(() => transcribeStreamFn({ data: { audio: input.audio as string, language: input.language, provider: config.id } })) instead of returning transcribeStreamFn directly, ensuring transcribeStreamFn and fetchServerSentEvents are imported/available.
packages/typescript/ai-event-client/src/index.ts (1)
519-530: ⚠️ Potential issue | 🟠 Major
Tighten `AudioRequestCompletedEvent.audio` to mirror `GeneratedAudio`'s exclusive source union.
As currently typed, `audio` accepts `{}` or `{ url, b64Json }`, which contradicts the PR's promise that `GeneratedAudio` requires exactly one of `url`/`b64Json`. Mirror the contract in the event payload.
🛡️ Proposed type tightening
+export type AudioEventSource =
+  | { url: string; b64Json?: never; contentType?: string; duration?: number }
+  | { b64Json: string; url?: never; contentType?: string; duration?: number }
+
 /** Emitted when an audio generation request completes. */
 export interface AudioRequestCompletedEvent extends BaseEventContext {
   requestId: string
   provider: string
   model: string
-  audio: {
-    url?: string
-    b64Json?: string
-    contentType?: string
-    duration?: number
-  }
+  audio: AudioEventSource
   duration: number
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-event-client/src/index.ts` around lines 519 - 530, The AudioRequestCompletedEvent.audio field currently allows both empty objects and combinations of url/b64Json; change it to exactly match the GeneratedAudio contract so only one source is allowed. Replace the inline audio type in AudioRequestCompletedEvent with the existing GeneratedAudio type (or an explicit union mirroring it: one branch with url and url-only semantics and one branch with b64Json and b64Json-only semantics), preserving optional contentType/duration, so consumers cannot provide both or neither; update the import/usage to reference GeneratedAudio instead of the loose inline object.
packages/typescript/ai-gemini/src/adapters/audio.ts (3)
145-151: ⚠️ Potential issue | 🟡 Minor
Explicit `apiKey` argument can still be overridden by `config.apiKey` at runtime.
Despite `Omit<GeminiAudioConfig, 'apiKey'>` hiding it from the type, a JS caller (or a caller using `as any`) can pass `{ apiKey: '...' }` and silently override the explicit argument due to the `{ apiKey, ...config }` spread order. Reverse it so the explicit argument wins.
🛡️ Proposed fix
- return new GeminiAudioAdapter({ apiKey, ...config }, model)
+ return new GeminiAudioAdapter({ ...config, apiKey }, model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 145 - 151, The createGeminiAudio factory currently spreads apiKey then config so a runtime config with apiKey can override the explicit parameter; update the object construction in createGeminiAudio so the explicit apiKey wins (spread config first, then set apiKey) when creating the GeminiAudioAdapter, keeping the config type as Omit<GeminiAudioConfig, 'apiKey'> but protecting against JS callers or any casted inputs; reference function createGeminiAudio, type GeminiAudioConfig and class GeminiAudioAdapter.
32-36: ⚠️ Potential issue | 🟡 Minor
`negativePrompt` is still advertised in the public type but silently dropped at runtime.
The previous round removed the forwarding of `negativePrompt` into `GenerateContentConfig` (per the FIXME on lines 81-90), which is correct. However the field is still exposed on `GeminiAudioProviderOptions`, so callers can set it and get full type-safe autocomplete — then see absolutely no effect on output. Either remove it from the type or mark it `@deprecated` / document that it's ignored until the `ai.live.music.connect` path is implemented.
🐛 Proposed fix
   seed?: number
-
-  /**
-   * Negative prompt — describe what to exclude from the output.
-   */
-  negativePrompt?: string
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 32 - 36, GeminiAudioProviderOptions currently exposes negativePrompt but it is dropped at runtime (not forwarded into GenerateContentConfig); either remove negativePrompt from the GeminiAudioProviderOptions type declaration in adapters/audio.ts or mark it as deprecated with a clear JSDoc note that it is ignored until the ai.live.music.connect path is implemented; update any exported types/usages to match (remove references or add `@deprecated` comment on negativePrompt in GeminiAudioProviderOptions) and run type checks to ensure no callers rely on the now-removed field.
79-101: ⚠️ Potential issue | 🟡 Minor
`options.duration` is still silently ignored.
`AudioGenerationOptions.duration` is destructured away from `options` and never applied. For `lyria-3-clip-preview` (fixed 30s) and `lyria-3-pro-preview` (prompt-controlled duration, up to ~3 min), callers requesting a specific duration receive surprising output with no warning. Either inject duration into the prompt for Pro, or throw a descriptive error for Clip when `duration != null && duration !== 30`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/adapters/audio.ts` around lines 79 - 101, The code currently ignores AudioGenerationOptions.duration; update the handler in adapters/audio.ts (around the generateContent call and where options are destructured) to read duration from options and enforce model-specific behavior: if model === 'lyria-3-clip-preview' and duration != null && duration !== 30 throw a descriptive error indicating clip preview is fixed to 30s; if model === 'lyria-3-pro-preview' and duration != null inject the requested duration into the prompt (e.g., append a short instruction like "Generate audio of X seconds") before calling this.client.models.generateContent; keep the existing seed/responseMimeType handling and ensure modelOptions and prompt variable names are used to find where to change.
🧹 Nitpick comments (11)
packages/typescript/ai-openai/src/adapters/image.ts (2)
101-112: LGTM — flatMap produces a mutually-exclusive `url`/`b64Json` shape.
Each `GeneratedImage` now satisfies exactly one branch of `GeneratedMediaSource`, and items with neither field are dropped instead of producing an empty `{}` that would violate the new union. Matches the types tightening in `packages/typescript/ai/src/types.ts`.
One minor point worth noting: when an item has neither `url` nor `b64_json`, it's silently dropped — consider a `console.warn` (or surfacing via the devtools `image:request:completed` event) so callers aren't left wondering why `images.length < response.data.length`.
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-openai/src/adapters/image.ts` around lines 101 - 112, The current mapping into images drops items with neither url nor b64_json silently; update the flatMap block that constructs images (the const images: Array<GeneratedImage> = (response.data ?? []).flatMap(...) in adapters/image.ts) to log a warning when an item has neither field so callers can diagnose missing media (e.g., console.warn with context or emit the devtools event 'image:request:completed' with a note), while preserving the existing branches that return GeneratedImage when item.b64_json or item.url are present.
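The exclusive-source union and the drop-with-warning behavior discussed above can be sketched in isolation (type and function names are simplified stand-ins for the package's actual definitions):

```typescript
// Exactly one of url / b64Json, never both and never neither.
type GeneratedMediaSource =
  | { url: string; b64Json?: never }
  | { b64Json: string; url?: never }

interface RawItem {
  url?: string
  b64_json?: string
}

// flatMap lets malformed items be dropped (with a warning) instead of
// emitting an invalid `{}` that would violate the union.
function toSources(items: Array<RawItem>): Array<GeneratedMediaSource> {
  return items.flatMap((item): Array<GeneratedMediaSource> => {
    if (item.b64_json) return [{ b64Json: item.b64_json }]
    if (item.url) return [{ url: item.url }]
    console.warn('media item had neither url nor b64_json; dropping it')
    return []
  })
}
```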
55-58: Consider passing `config` to `super(...)` for consistency.
Unlike `OpenAITTSAdapter` and `OpenAITranscriptionAdapter` — which now call `super(model, config)` — this adapter still discards config with `super(model, {})`. The parent's `this.config` ends up as `{}`. Functionally fine today because this adapter owns its own `client` and doesn't read `this.config`, but it drifts from the sibling adapters and hides config from any future base-class hook that inspects `this.config`.
♻️ Suggested alignment
 constructor(config: OpenAIImageConfig, model: TModel) {
-  super(model, {})
+  super(model, config)
   this.client = createOpenAIClient(config)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-openai/src/adapters/image.ts` around lines 55 - 58, The constructor in OpenAIImageAdapter currently calls super(model, {}) and discards the provided OpenAIImageConfig; update the constructor to pass the incoming config to the base class (i.e., call super(model, config)) so this.config is initialized consistently with OpenAITTSAdapter and OpenAITranscriptionAdapter, while leaving createOpenAIClient(config) and this.client initialization unchanged.
packages/typescript/ai-gemini/src/model-meta.ts (1)
1016-1019: Consider exporting a `GeminiAudioModels` type for API consistency.
`GEMINI_MODELS`/`GEMINI_IMAGE_MODELS` each have a companion `(typeof X)[number]` alias (`GeminiModels`, `GeminiImageModels`). `GEMINI_AUDIO_MODELS` doesn't, which makes it awkward for downstream code (e.g., the new `geminiAudio` adapter) to narrow to known Lyria models.
♻️ Proposed addition
 export const GEMINI_AUDIO_MODELS = [
   LYRIA_3_PRO.name,
   LYRIA_3_CLIP.name,
 ] as const
+
+export type GeminiAudioModels = (typeof GEMINI_AUDIO_MODELS)[number]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/src/model-meta.ts` around lines 1016 - 1019, Add a companion exported type alias for GEMINI_AUDIO_MODELS to match the API pattern used by GEMINI_MODELS/GEMINI_IMAGE_MODELS; specifically, export a type named GeminiAudioModels that is (typeof GEMINI_AUDIO_MODELS)[number] so downstream code (e.g., geminiAudio adapter) can narrow values to the known Lyria audio model names; update exports if necessary to ensure GeminiAudioModels is exported from the module.
packages/typescript/ai-gemini/tests/model-meta.test.ts (1)
592-600: Assert the newly added Flash 3.1 TTS model in the registry test.
This PR prepends `gemini-3.1-flash-tts-preview` to `GEMINI_TTS_MODELS`, but the registry test doesn't cover it. Adding an explicit check prevents it from being accidentally dropped from the list.
💚 Proposed test addition
 expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-pro-preview-tts')
 expect(GEMINI_TTS_MODELS).toContain('gemini-2.5-flash-preview-tts')
+expect(GEMINI_TTS_MODELS).toContain('gemini-3.1-flash-tts-preview')
 expect(new Set(GEMINI_TTS_MODELS).size).toBe(GEMINI_TTS_MODELS.length)
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-gemini/tests/model-meta.test.ts` around lines 592 - 600, The test in the "Gemini TTS model registry" suite is missing an assertion for the newly added Flash 3.1 TTS model; update the it block that checks GEMINI_TTS_MODELS to also expect the array toContain 'gemini-3.1-flash-tts-preview' (alongside the existing checks for 'gemini-2.5-pro-preview-tts' and 'gemini-2.5-flash-preview-tts') so the new model cannot be accidentally removed from GEMINI_TTS_MODELS.
packages/typescript/ai-event-client/src/index.ts (1)
560-564: `AudioUsageEvent.usage` is typed as `ImageUsage` — introduce an audio-specific type.
`ImageUsage` is semantically image-specific (input/output tokens for image models). Using it for audio usage is confusing for consumers and will be wrong if audio usage grows modality-specific fields (e.g., audio seconds, character count). Define a dedicated `AudioUsage` (even if initially identical) so the public API is honest about the modality.
♻️ Proposed type
+export interface AudioUsage {
+  inputTokens?: number
+  outputTokens?: number
+  totalTokens?: number
+  /** Duration of generated audio in seconds, when available */
+  audioSeconds?: number
+  /** Input character count for TTS/music prompts, when available */
+  characters?: number
+}
+
 /** Emitted when audio usage metrics are available. */
 export interface AudioUsageEvent extends BaseEventContext {
   requestId: string
   model: string
-  usage: ImageUsage
+  usage: AudioUsage
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-event-client/src/index.ts` around lines 560 - 564, The AudioUsageEvent currently reuses ImageUsage which is semantically incorrect; add a new exported type/interface named AudioUsage (can initially mirror ImageUsage fields) and update the AudioUsageEvent declaration to use AudioUsage instead of ImageUsage; ensure the new AudioUsage type is exported alongside ImageUsage so consumers can import it and update any references to ImageUsage where they specifically pertain to audio usage (symbols: AudioUsageEvent, ImageUsage, AudioUsage).
packages/typescript/ai/src/activities/generateAudio/adapter.ts (2)
38-40: `~types` as a required interface property forces implementations to declare it.
Any non-`BaseAudioAdapter` implementation of `AudioAdapter` must add `declare '~types': { providerOptions: TProviderOptions }` (or assign it), otherwise TS structural typing will reject it. Marking it optional (`'~types'?:`) — or documenting the expectation on the interface — would make the contract friendlier for third-party adapters without changing inference at the call sites that use `TAdapter['~types']['providerOptions']`.
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/src/activities/generateAudio/adapter.ts` around lines 38 - 40, The interface currently requires the literal property '~types' which forces every AudioAdapter implementation to explicitly declare it; update the AudioAdapter definition to make '~types' optional (change '~types' to '~types'?) so third‑party adapters don't need to declare it while preserving inference at call sites that read TAdapter['~types']['providerOptions']; ensure BaseAudioAdapter still provides the concrete '~types' shape and keep TProviderOptions type unchanged so existing consumers continue to infer providerOptions correctly.
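The `~types` phantom-property pattern under discussion can be shown in a minimal standalone form (all names here are illustrative, not the package's real definitions):

```typescript
// Phantom `~types` property: optional and never assigned, so it exists
// only in the type system and carries the generic with zero runtime cost.
abstract class BaseAudioAdapter<TProviderOptions> {
  '~types'?: { providerOptions: TProviderOptions }
  constructor(public readonly name: string) {}
}

// Recover the provider-options generic from any concrete adapter type.
type ProviderOptionsOf<TAdapter extends BaseAudioAdapter<unknown>> =
  NonNullable<TAdapter['~types']>['providerOptions']

class FakeAdapter extends BaseAudioAdapter<{ seed?: number }> {}

// Resolves to { seed?: number } without FakeAdapter storing anything.
const opts: ProviderOptionsOf<FakeAdapter> = { seed: 1 }
```

Because the property is optional here, a hand-rolled adapter object can satisfy the shape without declaring `~types` at all, which is the friendlier contract the comment argues for.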
86-88: `generateId` yields surprisingly little entropy.
`Math.random().toString(36)` produces `"0.xxxxxxxxxxxx"` (~12 base-36 chars of payload starting at index 2); `.substring(7)` drops another 5 chars, leaving roughly 4–8 base-36 characters of randomness. With a per-ms timestamp prefix, two adapter calls firing in the same millisecond (e.g. parallel `generateAudio` in tests or streaming) have a non-trivial collision probability for correlation IDs emitted to devtools events.
Since IDs are for correlation, not security, `crypto.randomUUID()` (with a Node/browser-safe fallback) or `.substring(2, 12)` would be much safer and just as cheap.
♻️ Proposed fix
 protected generateId(): string {
-  return `${this.name}-${Date.now()}-${Math.random().toString(36).substring(7)}`
+  return `${this.name}-${Date.now()}-${Math.random().toString(36).slice(2, 12)}`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/src/activities/generateAudio/adapter.ts` around lines 86 - 88, The current generateId method creates low-entropy IDs and risks collisions; update the generateId implementation in the class (method generateId, referencing this.name) to use a higher-entropy source such as crypto.randomUUID() when available, with a safe fallback for older environments (e.g., use Math.random().toString(36).substring(2,12) or another longer substring) and keep the same prefix format `${this.name}-<id>` so tests/devtools correlation uses a much stronger unique value.
examples/ts-react-chat/src/routes/generations.audio.tsx (2)
2-12: Apply ESLint fixes for import ordering and type-only imports.
Static analysis flags four violations here: the `@tanstack/ai-react` and `@tanstack/ai` type imports should come after the `../lib/audio-providers` import, and the inline `type` specifiers on lines 10-11 should be promoted to a top-level type-only import.
♻️ Proposed fix
 import { useEffect, useMemo, useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
 import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
 import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
 import { generateAudioFn } from '../lib/server-fns'
-import {
-  AUDIO_PROVIDERS,
-  type AudioProviderConfig,
-  type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type {
+  AudioProviderConfig,
+  AudioProviderId,
+} from '../lib/audio-providers'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 2 - 12, Reorder and convert the imports to satisfy ESLint: move the local imports (generateAudioFn, AUDIO_PROVIDERS, AudioProviderConfig, AudioProviderId) before type-only imports from external packages, and change inline `type` specifiers to a top-level type-only import. Specifically, ensure imports for createFileRoute, useGenerateAudio, fetchServerSentEvents remain, then import AUDIO_PROVIDERS, generateAudioFn, AudioProviderConfig, AudioProviderId, and finally add a single `import type { UseGenerateAudioReturn, AudioGenerationResult } from '@tanstack/ai-react'` (or split by package if necessary) so that UseGenerateAudioReturn and AudioGenerationResult are imported as top-level type-only imports rather than inline `type` specifiers.
286-286: Guard the non-null assertion on `AUDIO_PROVIDERS.find`.
`provider` state is only ever set from the provider tabs, so this is currently safe, but the non-null assertion will turn any future typo or stale persisted value into a runtime crash on first render. A simple fallback to `AUDIO_PROVIDERS[0]` keeps the page alive.
🛡️ Proposed fix
- const config = AUDIO_PROVIDERS.find((p) => p.id === provider)!
+ const config =
+   AUDIO_PROVIDERS.find((p) => p.id === provider) ?? AUDIO_PROVIDERS[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/generations.audio.tsx` at line 286, The code uses a non-null assertion when locating the audio provider: change the line that sets config (const config = AUDIO_PROVIDERS.find((p) => p.id === provider)!) to safely handle a missing match by falling back to a default (e.g., AUDIO_PROVIDERS[0]) and/or null-checking provider; ensure you use AUDIO_PROVIDERS.find(...) || AUDIO_PROVIDERS[0] (or equivalent) so config is never undefined and the page doesn't crash on first render if the provider value is stale or mistyped.
packages/typescript/ai-fal/src/adapters/audio.ts (2)
65-74: `BaseAudioAdapter` receives `{}` and drops the fal config.
`super(model, {})` stores an empty `AudioAdapterConfig` on the base class, throwing away `config.apiKey`/`config.proxyUrl`. `configureFalClient` uses them at construction time, so there's no functional bug today, but if any shared base-class logic starts reading `this.config` (retries, headers, timeouts) this adapter will silently behave as if unconfigured. Passing the (compatible) subset through — or at least `config ?? {}` — would be more future-proof.
♻️ Proposed tweak
- constructor(model: TModel, config?: FalClientConfig) {
-   super(model, {})
-   configureFalClient(config)
- }
+ constructor(model: TModel, config?: FalClientConfig) {
+   super(model, config ?? {})
+   configureFalClient(config)
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 65 - 74, FalAudioAdapter's constructor calls super(model, {}) which discards the incoming FalClientConfig and leaves the base AudioAdapterConfig empty; update the constructor to pass the provided config (or config ?? {}) into the BaseAudioAdapter so shared base-class logic can access apiKey/proxyUrl/timeouts stored on this.config. Specifically, modify the FalAudioAdapter constructor (the super(model, {}) call) to forward the compatible subset of FalClientConfig into BaseAudioAdapter, while keeping the configureFalClient(config) call.
119-126: Consider propagating `duration` (and the inverse ms→s translation) into the result.
The adapter translates `options.duration` into model-specific fields (e.g., `music_length_ms`) on the way in, but the returned `AudioGenerationResult.audio.duration` is always unset. Callers that displayed the originally requested duration will lose that information after a round-trip, and for ElevenLabs the response may carry a `music_length_ms` value that could be surfaced as `duration` (seconds). Low-priority, but cheap to pass through when available.
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 119 - 126, The adapter currently returns AudioGenerationResult without a duration; update the return object in the function that builds the response (the block that returns id: response.requestId || this.generateId(), model: this.model, audio: {...}) to include audio.duration (in seconds) by propagating the original options.duration if present, and if not, extract provider-specific duration fields from the response (e.g., music_length_ms) converting milliseconds→seconds (divide by 1000). Use deriveAudioContentType as-is and ensure the duration is a number (seconds) or undefined when not available.
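A small sketch of the ms↔s translation suggested above; `music_length_ms` mirrors the ElevenLabs-style response field named in the comment, and both helper names are hypothetical:

```typescript
// Request side: the public API speaks seconds, the provider field is ms.
function toMusicLengthMs(durationSeconds: number): number {
  return Math.round(durationSeconds * 1000)
}

// Response side: prefer the provider-reported length, falling back to the
// originally requested duration so the value survives the round-trip.
function resolveDurationSeconds(
  response: { music_length_ms?: number },
  requestedSeconds?: number,
): number | undefined {
  if (response.music_length_ms != null) return response.music_length_ms / 1000
  return requestedSeconds
}
```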
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ts-react-chat/src/routes/api.generate.speech.ts`:
- Around line 44-51: The code uses parsed.error.flatten(), which is deprecated;
replace that call with the top-level z.flattenError(parsed.error) and pass its
result into jsonError instead. Update the validation block around
SPEECH_BODY_SCHEMA.safeParse (the parsed variable) so that when parsed.success
is false you call z.flattenError(parsed.error) and include that flattened result
in the details field passed to jsonError (also update the same pattern in the
handlers for audio and transcribe).
In `@examples/ts-react-chat/src/routes/generations.audio.tsx`:
- Around line 23-50: The toAudioOutput(AudioGenerationResult) function currently
throws when audio has neither url nor b64Json; change it to not throw inside the
onResult callback used by useGenerateAudio — instead return null (per the hook
contract) to keep the previous result unchanged and invoke the provided error
handler (e.g., call onError / propagate an error via the hook's error pathway)
with a clear message referencing the problematic AudioGenerationResult; update
callers that rely on toAudioOutput (the onResult consumer in useGenerateAudio)
to expect null as a valid return and surface the failure through the hook's
error state rather than letting a synchronous throw escape.
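The non-throwing contract described above can be sketched as follows. Field names mirror the `GeneratedAudio` shape discussed in this review, and the `audio/mpeg` default plus the `onError` callback signature are assumptions for illustration, not the example app's actual code.

```typescript
interface AudioLike {
  url?: string
  b64Json?: string
  contentType?: string
}

// Returns a playable source, or null (keeping the previous result) while
// reporting the failure through the hook's error pathway instead of throwing.
function toAudioOutput(
  audio: AudioLike,
  onError: (err: Error) => void,
): { src: string } | null {
  if (audio.url) return { src: audio.url }
  if (audio.b64Json) {
    const type = audio.contentType ?? 'audio/mpeg' // assumed default
    return { src: `data:${type};base64,${audio.b64Json}` }
  }
  onError(new Error('AudioGenerationResult had neither url nor b64Json'))
  return null
}
```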
In `@packages/typescript/ai/src/activities/generateTranscription/adapter.ts`:
- Line 77: The constructor argument order for
BaseTranscriptionAdapter/BaseImageAdapter/BaseTTSAdapter changed from (config,
model) to (model, config), so update every subclass that still calls super with
the old order to pass the model first and config second (e.g., change super({},
model) → super(model, {}) and super(config, model) → super(model, config)) in
the listed adapters (ai-openai/adapters/summarize.ts, video.ts, text.ts;
ai-openrouter/adapters/summarize.ts, text.ts; ai-fal/adapters/video.ts;
ai-ollama/adapters/text.ts; ai-groq/adapters/text.ts;
ai-grok/adapters/summarize.ts, text.ts; ai-gemini/adapters/text.ts;
ai-anthropic/adapters/summarize.ts, text.ts) and any other in-tree/out-of-tree
subclasses; also add a changeset entry for `@tanstack/ai` with a minor version
bump documenting this breaking constructor change and include a migration note
instructing external adapter authors to swap the constructor argument order to
(model, config).
---
Duplicate comments:
In `@examples/ts-react-chat/src/routes/generations.speech.tsx`:
- Around line 65-71: The fetcher currently passes the raw SSE Response from
generateSpeechStreamFn to onResult; wrap the call in fetchServerSentEvents so
the fetcher returns parsed SSE chunks instead of a Response. Specifically,
change the fetcher to call fetchServerSentEvents(generateSpeechStreamFn({ data:
{ ...input, provider: config.id } })) so that onResult: toSpeechOutput receives
parsed {audio, contentType, ...} chunks; ensure fetchServerSentEvents is
imported/available in the file.
In `@examples/ts-react-chat/src/routes/generations.transcription.tsx`:
- Around line 41-50: The fetcher currently returns the raw SSE Response from
transcribeStreamFn, so wrap the call in fetchServerSentEvents to convert the SSE
Response into a streaming payload the hook can consume; update the fetcher (the
object returned by the function) to call fetchServerSentEvents(() =>
transcribeStreamFn({ data: { audio: input.audio as string, language:
input.language, provider: config.id } })) instead of returning
transcribeStreamFn directly, ensuring transcribeStreamFn and
fetchServerSentEvents are imported/available.
In `@packages/typescript/ai-event-client/src/index.ts`:
- Around line 519-530: The AudioRequestCompletedEvent.audio field currently
allows both empty objects and combinations of url/b64Json; change it to exactly
match the GeneratedAudio contract so only one source is allowed. Replace the
inline audio type in AudioRequestCompletedEvent with the existing GeneratedAudio
type (or an explicit union mirroring it: one branch with url and url-only
semantics and one branch with b64Json and b64Json-only semantics), preserving
optional contentType/duration, so consumers cannot provide both or neither;
update the import/usage to reference GeneratedAudio instead of the loose inline
object.
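The mutually-exclusive union described above might look like the following sketch. The field shapes are inferred from this review thread, not copied from the package:

```typescript
// Exactly one audio source is allowed; `never` on the opposite field makes
// "both" and "neither" fail to type-check.
type GeneratedAudio =
  | { url: string; b64Json?: never; contentType?: string; duration?: number }
  | { b64Json: string; url?: never; contentType?: string; duration?: number }

const fromUrl: GeneratedAudio = { url: 'https://example.com/a.mp3' }
const fromB64: GeneratedAudio = { b64Json: 'AAAA', contentType: 'audio/wav' }
// `{ url: '...', b64Json: '...' }` and `{}` are both compile errors.
```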
In `@packages/typescript/ai-gemini/src/adapters/audio.ts`:
- Around line 145-151: The createGeminiAudio factory currently spreads apiKey
then config so a runtime config with apiKey can override the explicit parameter;
update the object construction in createGeminiAudio so the explicit apiKey wins
(spread config first, then set apiKey) when creating the GeminiAudioAdapter,
keeping the config type as Omit<GeminiAudioConfig, 'apiKey'> but protecting
against JS callers or any casted inputs; reference function createGeminiAudio,
type GeminiAudioConfig and class GeminiAudioAdapter.
- Around line 32-36: GeminiAudioProviderOptions currently exposes negativePrompt
but it is dropped at runtime (not forwarded into GenerateContentConfig); either
remove negativePrompt from the GeminiAudioProviderOptions type declaration in
adapters/audio.ts or mark it as deprecated with a clear JSDoc note that it is
ignored until the ai.live.music.connect path is implemented; update any exported
types/usages to match (remove references or add `@deprecated` comment on
negativePrompt in GeminiAudioProviderOptions) and run type checks to ensure no
callers rely on the now-removed field.
- Around line 79-101: The code currently ignores
AudioGenerationOptions.duration; update the handler in adapters/audio.ts (around
the generateContent call and where options are destructured) to read duration
from options and enforce model-specific behavior: if model ===
'lyria-3-clip-preview' and duration != null && duration !== 30 throw a
descriptive error indicating clip preview is fixed to 30s; if model ===
'lyria-3-pro-preview' and duration != null inject the requested duration into
the prompt (e.g., append a short instruction like "Generate audio of X seconds")
before calling this.client.models.generateContent; keep the existing
seed/responseMimeType handling and ensure modelOptions and prompt variable names
are used to find where to change.
---
Nitpick comments:
In `@examples/ts-react-chat/src/routes/generations.audio.tsx`:
- Around line 2-12: Reorder and convert the imports to satisfy ESLint: move the
local imports (generateAudioFn, AUDIO_PROVIDERS, AudioProviderConfig,
AudioProviderId) before type-only imports from external packages, and change
inline `type` specifiers to a top-level type-only import. Specifically, ensure
imports for createFileRoute, useGenerateAudio, fetchServerSentEvents remain,
then import AUDIO_PROVIDERS, generateAudioFn, AudioProviderConfig,
AudioProviderId, and finally add a single `import type { UseGenerateAudioReturn,
AudioGenerationResult } from '@tanstack/ai-react'` (or split by package if
necessary) so that UseGenerateAudioReturn and AudioGenerationResult are imported
as top-level type-only imports rather than inline `type` specifiers.
- Line 286: The code uses a non-null assertion when locating the audio provider:
change the line that sets config (const config = AUDIO_PROVIDERS.find((p) =>
p.id === provider)!) to safely handle a missing match by falling back to a
default (e.g., AUDIO_PROVIDERS[0]) and/or null-checking provider; ensure you use
AUDIO_PROVIDERS.find(...) || AUDIO_PROVIDERS[0] (or equivalent) so config is
never undefined and the page doesn't crash on first render if the provider value
is stale or mistyped.
In `@packages/typescript/ai-event-client/src/index.ts`:
- Around line 560-564: The AudioUsageEvent currently reuses ImageUsage which is
semantically incorrect; add a new exported type/interface named AudioUsage (can
initially mirror ImageUsage fields) and update the AudioUsageEvent declaration
to use AudioUsage instead of ImageUsage; ensure the new AudioUsage type is
exported alongside ImageUsage so consumers can import it and update any
references to ImageUsage where they specifically pertain to audio usage
(symbols: AudioUsageEvent, ImageUsage, AudioUsage).
In `@packages/typescript/ai-fal/src/adapters/audio.ts`:
- Around line 65-74: FalAudioAdapter's constructor calls super(model, {}) which
discards the incoming FalClientConfig and leaves the base AudioAdapterConfig
empty; update the constructor to pass the provided config (or config ?? {}) into
the BaseAudioAdapter so shared base-class logic can access
apiKey/proxyUrl/timeouts stored on this.config. Specifically, modify the
FalAudioAdapter constructor (the super(model, {}) call) to forward the
compatible subset of FalClientConfig into BaseAudioAdapter, while keeping the
configureFalClient(config) call.
- Around line 119-126: The adapter currently returns AudioGenerationResult
without a duration; update the return object in the function that builds the
response (the block that returns id: response.requestId || this.generateId(),
model: this.model, audio: {...}) to include audio.duration (in seconds) by
propagating the original options.duration if present, and if not, extract
provider-specific duration fields from the response (e.g., music_length_ms)
converting milliseconds→seconds (divide by 1000). Use deriveAudioContentType
as-is and ensure the duration is a number (seconds) or undefined when not
available.
In `@packages/typescript/ai-gemini/src/model-meta.ts`:
- Around line 1016-1019: Add a companion exported type alias for
GEMINI_AUDIO_MODELS to match the API pattern used by
GEMINI_MODELS/GEMINI_IMAGE_MODELS; specifically, export a type named
GeminiAudioModels that is (typeof GEMINI_AUDIO_MODELS)[number] so downstream
code (e.g., geminiAudio adapter) can narrow values to the known Lyria audio
model names; update exports if necessary to ensure GeminiAudioModels is exported
from the module.
In `@packages/typescript/ai-gemini/tests/model-meta.test.ts`:
- Around line 592-600: The test in the "Gemini TTS model registry" suite is
missing an assertion for the newly added Flash 3.1 TTS model; update the it
block that checks GEMINI_TTS_MODELS to also expect the array toContain
'gemini-3.1-flash-tts-preview' (alongside the existing checks for
'gemini-2.5-pro-preview-tts' and 'gemini-2.5-flash-preview-tts') so the new
model cannot be accidentally removed from GEMINI_TTS_MODELS.
In `@packages/typescript/ai-openai/src/adapters/image.ts`:
- Around line 101-112: The current mapping into images drops items with neither
url nor b64_json silently; update the flatMap block that constructs images (the
const images: Array<GeneratedImage> = (response.data ?? []).flatMap(...) in
adapters/image.ts) to log a warning when an item has neither field so callers
can diagnose missing media (e.g., console.warn with context or emit the devtools
event 'image:request:completed' with a note), while preserving the existing
branches that return GeneratedImage when item.b64_json or item.url are present.
- Around line 55-58: The constructor in OpenAIImageAdapter currently calls
super(model, {}) and discards the provided OpenAIImageConfig; update the
constructor to pass the incoming config to the base class (i.e., call
super(model, config)) so this.config is initialized consistently with
OpenAITTSAdapter and OpenAITranscriptionAdapter, while leaving
createOpenAIClient(config) and this.client initialization unchanged.
In `@packages/typescript/ai/src/activities/generateAudio/adapter.ts`:
- Around line 38-40: The interface currently requires the literal property
'~types' which forces every AudioAdapter implementation to explicitly declare
it; update the AudioAdapter definition to make '~types' optional (change
'~types' to '~types'?) so third‑party adapters don't need to declare it while
preserving inference at call sites that read
TAdapter['~types']['providerOptions']; ensure BaseAudioAdapter still provides
the concrete '~types' shape and keep TProviderOptions type unchanged so existing
consumers continue to infer providerOptions correctly.
- Around line 86-88: The current generateId method creates low-entropy IDs and
risks collisions; update the generateId implementation in the class (method
generateId, referencing this.name) to use a higher-entropy source such as
crypto.randomUUID() when available, with a safe fallback for older environments
(e.g., use Math.random().toString(36).substring(2,12) or another longer
substring) and keep the same prefix format `${this.name}-<id>` so tests/devtools
correlation uses a much stronger unique value.
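A minimal sketch of the suggested higher-entropy ID, with `name` standing in for `this.name` on the adapter; the fallback branch is only for environments without `crypto.randomUUID`:

```typescript
function generateId(name: string): string {
  const c = (globalThis as any).crypto
  const id: string =
    c && typeof c.randomUUID === 'function'
      ? c.randomUUID() // 122 bits of randomness
      : Math.random().toString(36).substring(2, 12) +
        Math.random().toString(36).substring(2, 12) // weaker fallback
  return `${name}-${id}` // same prefix format as before
}
```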
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b14bfa20-1214-4b31-bc2f-660a60babc8c
📒 Files selected for processing (34)
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- examples/ts-react-chat/src/routes/generations.audio.tsx
- examples/ts-react-chat/src/routes/generations.speech.tsx
- examples/ts-react-chat/src/routes/generations.transcription.tsx
- examples/ts-react-chat/src/routes/index.tsx
- packages/typescript/ai-client/src/generation-types.ts
- packages/typescript/ai-event-client/src/index.ts
- packages/typescript/ai-fal/src/adapters/audio.ts
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-fal/src/adapters/speech.ts
- packages/typescript/ai-fal/src/adapters/transcription.ts
- packages/typescript/ai-fal/tests/image-adapter.test.ts
- packages/typescript/ai-gemini/src/adapters/audio.ts
- packages/typescript/ai-gemini/src/adapters/image.ts
- packages/typescript/ai-gemini/src/adapters/tts.ts
- packages/typescript/ai-gemini/src/model-meta.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-gemini/tests/model-meta.test.ts
- packages/typescript/ai-grok/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/transcription.ts
- packages/typescript/ai-openai/src/adapters/tts.ts
- packages/typescript/ai-openrouter/src/adapters/image.ts
- packages/typescript/ai-openrouter/tests/image-adapter.test.ts
- packages/typescript/ai/src/activities/generateAudio/adapter.ts
- packages/typescript/ai/src/activities/generateAudio/index.ts
- packages/typescript/ai/src/activities/generateImage/adapter.ts
- packages/typescript/ai/src/activities/generateSpeech/adapter.ts
- packages/typescript/ai/src/activities/generateSpeech/index.ts
- packages/typescript/ai/src/activities/generateTranscription/adapter.ts
- packages/typescript/ai/src/activities/generateTranscription/index.ts
- packages/typescript/ai/tests/image-per-model-type-safety.test.ts
✅ Files skipped from review due to trivial changes (2)
- examples/ts-react-chat/src/routes/index.tsx
- packages/typescript/ai-openrouter/tests/image-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (12)
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-client/src/generation-types.ts
- packages/typescript/ai-fal/tests/image-adapter.test.ts
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- packages/typescript/ai-grok/src/adapters/image.ts
- packages/typescript/ai-gemini/src/adapters/image.ts
- packages/typescript/ai/src/activities/generateSpeech/index.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-openrouter/src/adapters/image.ts
- packages/typescript/ai-fal/src/adapters/speech.ts
- packages/typescript/ai-gemini/src/adapters/tts.ts
- packages/typescript/ai/src/activities/generateAudio/index.ts
```ts
const parsed = SPEECH_BODY_SCHEMA.safeParse(rawData)
if (!parsed.success) {
  return jsonError(400, {
    error: 'validation_failed',
    message: 'Request data failed validation',
    details: parsed.error.flatten(),
  })
}
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find other occurrences of the deprecated ZodError.flatten() pattern across examples + packages.
rg -nP --type=ts -C1 '\.error\.flatten\s*\('Repository: TanStack/ai
Length of output: 921
🏁 Script executed:
```bash
find . -name "package.json" -type f | head -10
```
Repository: TanStack/ai
Length of output: 432
🏁 Script executed:
```bash
cat package.json
```
Repository: TanStack/ai
Length of output: 3931
🏁 Script executed:
```bash
cat ./examples/ts-react-chat/package.json
```
Repository: TanStack/ai
Length of output: 2072
🏁 Script executed:
```bash
cat -n ./examples/ts-react-chat/src/routes/api.generate.speech.ts
```
Repository: TanStack/ai
Length of output: 2819
🌐 Web query:
Zod 4 ZodError flatten() deprecated migration guide
💡 Result:
In Zod 4, the .flatten method on ZodError instances is deprecated. The official migration guide recommends replacing it with the top-level z.treeifyError(error) function. Key changes from the Zod 4 migration guide: - ZodError.format is deprecated; use z.treeifyError instead. - ZodError.flatten is deprecated; use z.treeifyError instead. - ZodError.formErrors is dropped (it was identical to .flatten). However, the error formatting documentation distinguishes use cases: - For nested schemas: Use z.treeifyError(error) to get a nested tree structure mirroring the schema (e.g., { errors: [], properties: { field: { errors: [] } } }). - For flat schemas (most common): Use z.flattenError(error) to get a shallow object with formErrors: string[] (top-level) and fieldErrors: Record<string, string[]>. Example migration: v3: const result = schema.safeParse(data); if (!result.success) { const flattened = result.error.flatten; // { formErrors: [], fieldErrors: { field: [] } } } v4: import * as z from 'zod'; const result = schema.safeParse(data); if (!result.success) { // For flat errors (replacement for .flatten): const flattened = z.flattenError(result.error); // { formErrors: string[], fieldErrors: Record<string, string[]> } // Or for nested/tree structure: const tree = z.treeifyError(result.error); // Nested object } Note: z.flattenError provides the direct functional equivalent for flat error handling (common in forms), while z.treeifyError is the general replacement emphasized in the migration guide for deprecated instance methods. Both are top-level functions, not on the error instance. Official docs: - Migration: https://zod.dev/v4/changelog (or v4.zod.dev/v4/changelog) - Formatting: https://zod.dev/error-formatting
Citations:
- 1: https://zod.dev/v4/changelog
- 2: https://www.zod.dev/v4/changelog?id=deprecates-flatten
- 3: https://v4.zod.dev/v4/changelog?id=deprecates-flatten
- 4: https://zod.dev/error-formatting
- 5: https://v4.zod.dev/error-formatting?id=zflattenerror
- 6: https://zod.dev/error-formatting?id=zflattenerror
Replace deprecated ZodError.flatten() with z.flattenError().
In Zod 4, .flatten() on ZodError instances is deprecated. For this flat schema, use the top-level z.flattenError(parsed.error) function instead. The same fix applies to api.generate.audio.ts and api.transcribe.ts.
♻️ Suggested fix
```diff
   const parsed = SPEECH_BODY_SCHEMA.safeParse(rawData)
   if (!parsed.success) {
     return jsonError(400, {
       error: 'validation_failed',
       message: 'Request data failed validation',
-      details: parsed.error.flatten(),
+      details: z.flattenError(parsed.error),
     })
   }
```
📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const parsed = SPEECH_BODY_SCHEMA.safeParse(rawData)
if (!parsed.success) {
  return jsonError(400, {
    error: 'validation_failed',
    message: 'Request data failed validation',
    details: z.flattenError(parsed.error),
  })
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/api.generate.speech.ts` around lines 44 -
51, The code uses parsed.error.flatten(), which is deprecated; replace that call
with the top-level z.flattenError(parsed.error) and pass its result into
jsonError instead. Update the validation block around
SPEECH_BODY_SCHEMA.safeParse (the parsed variable) so that when parsed.success
is false you call z.flattenError(parsed.error) and include that flattened result
in the details field passed to jsonError (also update the same pattern in the
handlers for audio and transcribe).
```diff
   protected config: TranscriptionAdapterConfig

-  constructor(config: TranscriptionAdapterConfig = {}, model: TModel) {
+  constructor(model: TModel, config: TranscriptionAdapterConfig = {}) {
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find all subclasses of the three reordered base adapters and inspect their super() calls.
rg -nP --type=ts -C2 'extends\s+Base(Transcription|Image|TTS)Adapter\b'
echo '---'
rg -nP --type=ts -C1 '\bsuper\s*\(' -g 'packages/typescript/ai-*/src/adapters/**'
```
Repository: TanStack/ai
Length of output: 12187
Breaking change to public base-class constructors — requires fixing all in-tree subclasses.
BaseTranscriptionAdapter, BaseImageAdapter, and BaseTTSAdapter are exported from @tanstack/ai/adapters. The constructor argument reorder from (config, model) to (model, config) silently breaks any third-party adapter extending these base classes and calling the old argument order — this.model receives the config object at runtime when TProviderOptions/TModel are structurally compatible.
In-tree subclasses are inconsistently updated. The following adapters still call super() with the old argument order and must be fixed:
- `ai-openai/adapters/summarize.ts` (line 42): change `super({}, model)` → `super(model, {})`
- `ai-openai/adapters/video.ts` (line 58): change `super(config, model)` → `super(model, config)`
- `ai-openai/adapters/text.ts` (line 120): change `super({}, model)` → `super(model, {})`
- `ai-openrouter/adapters/summarize.ts` (line 52): change `super({}, model)` → `super(model, {})`
- `ai-openrouter/adapters/text.ts` (line 109): change `super({}, model)` → `super(model, {})`
- `ai-fal/adapters/video.ts` (line 73): change `super({}, model)` → `super(model, {})`
- `ai-ollama/adapters/text.ts` (line 132): change `super({}, model)` → `super(model, {})`
- `ai-groq/adapters/text.ts` (line 90): change `super({}, model)` → `super(model, {})`
- `ai-grok/adapters/summarize.ts` (line 45): change `super({}, model)` → `super(model, {})`
- `ai-grok/adapters/text.ts` (line 85): change `super({}, model)` → `super(model, {})`
- `ai-gemini/adapters/text.ts` (line 114): change `super({}, model)` → `super(model, {})`
- `ai-anthropic/adapters/summarize.ts` (line 52): change `super({}, model)` → `super(model, {})`
- `ai-anthropic/adapters/text.ts` (line 128): change `super({}, model)` → `super(model, {})`
Ensure the changeset entry documents this breaking change for @tanstack/ai using a minor version bump per project convention, with a migration note for any out-of-tree subclasses.
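The reorder can be illustrated with a minimal stand-in. These are simplified classes written for this note, not the real `@tanstack/ai` base adapters; field names are assumptions.

```typescript
interface AdapterConfig {
  apiKey?: string
}

// Base class after the change: model first, config second.
class BaseAdapterSketch<TModel extends string> {
  constructor(
    public readonly model: TModel,
    protected readonly config: AdapterConfig = {},
  ) {}
}

class ExampleAdapter extends BaseAdapterSketch<'example-model'> {
  constructor(config: AdapterConfig = {}) {
    // The old order was super(config, model). If TModel and the config type
    // are structurally compatible, the old call still compiles and this.model
    // silently receives the config object — hence the required migration.
    super('example-model', config)
  }
}
```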
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai/src/activities/generateTranscription/adapter.ts` at
line 77, The constructor argument order for
BaseTranscriptionAdapter/BaseImageAdapter/BaseTTSAdapter changed from (config,
model) to (model, config), so update every subclass that still calls super with
the old order to pass the model first and config second (e.g., change super({},
model) → super(model, {}) and super(config, model) → super(model, config)) in
the listed adapters (ai-openai/adapters/summarize.ts, video.ts, text.ts;
ai-openrouter/adapters/summarize.ts, text.ts; ai-fal/adapters/video.ts;
ai-ollama/adapters/text.ts; ai-groq/adapters/text.ts;
ai-grok/adapters/summarize.ts, text.ts; ai-gemini/adapters/text.ts;
ai-anthropic/adapters/summarize.ts, text.ts) and any other in-tree/out-of-tree
subclasses; also add a changeset entry for `@tanstack/ai` with a minor version
bump documenting this breaking constructor change and include a migration note
instructing external adapter authors to swap the constructor argument order to
(model, config).
…speech/transcription adapters Resolves conflicts across five files from merging origin/main (PR TanStack#467 debug logging rollout) into the 328-fal-audio-and-speech-support branch. Keeps the CR fix behaviors from HEAD — devtools *:request:error event emission, async/await (no .then chains), reordered base adapter constructor args, single canonical numberOfImages key, broadened PCM detection that excludes containerized wav subtypes, voice validation with partial voiceConfig fallback, channels threaded into wrapPcmBase64AsWav, no dead `mimeType || 'audio/wav'` fallback — while adopting main's logger.request() / logger.errors() plumbing around the SDK call. Also threads the new logger surface through the branch's new adapters that didn't exist on main: adds `logger: InternalLogger` to AudioGenerationOptions (matching sibling types), `debug?: DebugOption` to AudioActivityOptions, resolves it via resolveDebugOption in runGenerateAudio, and wires logger.request / logger.errors into falAudio, falSpeech, falTranscription, and geminiAudio so they match the pattern runGenerateImage / runGenerateVideo already follow on main.
…on, voice validation, etc.
Applies the reviewer-flagged changes that weren't load-bearing for the merge:
- event-client: AudioRequestCompletedEvent.audio is now a mutually-exclusive
{url; never b64Json} | {b64Json; never url} union so consumers can't read
both fields simultaneously, mirroring the GeneratedAudio contract in core.
- fal utils: extractUrlExtension now strips URL fragments and trailing
slashes, parses via the URL API so a TLD like `.com` isn't mistaken for
an extension, and only inspects the final path segment.
- fal utils: deriveAudioContentType returns `audio/aac` for aac, separated
from the `m4a`/`mp4` → `audio/mp4` case.
- fal speech: prefer URL-derived extension when deriving `format`, and
normalize `mpeg` → `mp3` so the field is a usable file extension.
- gemini audio: drop `negativePrompt` (not accepted by GenerateContentConfig)
and `responseMimeType` (Lyria Clip rejects it, Pro returns MP3 by default)
from the public provider options surface, and document that the generic
`duration` option is ignored by Lyria (Clip is fixed at 30s, Pro takes
duration via the natural-language prompt).
- gemini tts: multiSpeakerVoiceConfig.speakerVoiceConfigs length is now
validated (1 or 2 speakers), partial user-supplied voiceConfig correctly
falls back to the standard voice/'Kore' default, parsePcmMimeType tightens
detection to exclude subtypes containing "wav" so containerized
`audio/wav;codec=pcm` is no longer re-wrapped, and createGeminiSpeech /
createGeminiAudio factory functions now spread config before the explicit
apiKey argument so caller config can't silently override the API key.
- ts-react-chat API routes: replace zod 4's removed `.flatten()` with
`z.treeifyError()` for validation error details.
- ts-react-chat audio route: `toAudioOutput` returns `null` per the
`onResult` hook contract instead of throwing synchronously — failures
are still surfaced via the hook's error state.
- Updates the tests affected by the above behavior changes.
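The URL-extension behavior described in the commit notes can be sketched as below. This is a reimplementation written for illustration under the stated assumptions (parse via the URL API, ignore query/fragment, inspect only the final path segment, never treat a bare TLD as an extension); the real helper lives in ai-fal's utils.

```typescript
function extractUrlExtension(rawUrl: string): string | undefined {
  let pathname: string
  try {
    pathname = new URL(rawUrl).pathname // fragments and query never appear here
  } catch {
    return undefined // not an absolute URL
  }
  const trimmed = pathname.replace(/\/+$/, '') // drop trailing slashes
  const lastSegment = trimmed.split('/').pop() ?? ''
  const dot = lastSegment.lastIndexOf('.')
  if (dot <= 0) return undefined // no extension, or dotfile
  return lastSegment.slice(dot + 1).toLowerCase()
}
```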
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/typescript/ai-fal/src/adapters/image.ts (1)
109-110: ⚠️ Potential issue | 🟡 Minor

Let single-image string payloads reach `parseImage`.

Line 109 filters `data.image` to objects only, so `{ image: "https://..." }` will never reach `parseImage` even though it supports string payloads (lines 129-130). The type guard should be removed to let `parseImage` validate the payload.

Proposed fix
```diff
-    else if ('image' in data && data.image && typeof data.image === 'object') {
+    else if ('image' in data) {
       images.push(this.parseImage(data.image))
     }
```
🤖 Prompt for AI Agents
- else if ('image' in data && data.image && typeof data.image === 'object') { + else if ('image' in data) { images.push(this.parseImage(data.image)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/image.ts` around lines 109 - 110, The guard that only allows object-typed images prevents string image payloads from reaching parseImage; in the branch handling 'image' in data (the else if that currently checks 'image' in data && data.image && typeof data.image === "object"), remove the typeof check so you call this.parseImage(data.image) for any non-null data.image value (allowing strings like "https://..."); keep the existing null/undefined check (data.image) and let parseImage perform validation/conversion.
🧹 Nitpick comments (3)
examples/ts-react-chat/src/routes/api.generate.audio.ts (1)
10-15: Optional: add bounds to `duration` to reject obviously invalid values early.

`z.number().optional()` accepts negatives, `NaN`, `0`, and absurdly large values, which will fail deeper inside the provider with less friendly messages. Since this is an example route, a simple positive-and-finite check would surface bad inputs at the validation boundary:

♻️ Proposed fix
```diff
-  duration: z.number().optional(),
+  duration: z.number().positive().finite().optional(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/api.generate.audio.ts` around lines 10 - 15, The duration field in AUDIO_BODY_SCHEMA currently uses z.number().optional() which allows negatives, 0, NaN, and huge values; update the schema so duration enforces positive, finite numbers (for example replace z.number().optional() with z.number().positive().finite().optional(), and optionally add a sensible .max(...) if you want an upper bound) so invalid durations are rejected at validation (see AUDIO_BODY_SCHEMA and the duration property).packages/typescript/ai-openrouter/src/adapters/image.ts (1)
170-170: Minor: consider placing `apiKey` after the spread to match the PR's stated "protect apiKey ordering" intent.

`{ apiKey, ...config }` is type-safe today because `config` is `Omit<OpenRouterImageConfig, 'apiKey'>`, but the ordering is the opposite of defensive — if the Omit is ever loosened or the call site is bypassed (e.g. via `as any`), a stray `apiKey` in `config` would silently override the caller-supplied one. Inverting to `{ ...config, apiKey }` makes the guarantee runtime-level as well.

♻️ Proposed fix
```diff
-  return new OpenRouterImageAdapter({ apiKey, ...config }, model)
+  return new OpenRouterImageAdapter({ ...config, apiKey }, model)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-openrouter/src/adapters/image.ts` at line 170, The object spread currently passes apiKey before config in the OpenRouterImageAdapter constructor call, which allows a stray apiKey inside config to override the explicit apiKey at runtime; update the call in the file where OpenRouterImageAdapter is constructed (the return new OpenRouterImageAdapter(..., model) expression) to pass `{ ...config, apiKey }` instead of `{ apiKey, ...config }`, referencing the OpenRouterImageConfig type and the apiKey/config variables so the explicit apiKey always wins at runtime.packages/typescript/ai-fal/src/adapters/audio.ts (1)
111-140: Current implementation correctly handles documented fal response shapes; array handling is optional if needed for future models.

The code properly documents and handles the two known fal audio response formats: `audio: { url, content_type }` and `audio_url: string`. Test coverage confirms both shapes work correctly, and the error thrown when neither is found is safe and clear. While array-shaped responses (e.g., `audio: [...]`) would currently fall through and throw, there is no evidence that any fal audio models currently return arrays. Adding array support would only be necessary if future models adopt that response format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-fal/src/adapters/audio.ts` around lines 111 - 140, Summary: transformResponse correctly handles documented fal shapes but should explicitly document and handle array-shaped audio responses. Update the transformResponse method to keep the current handling for audio: { url, content_type } and audio_url, but add an explicit branch that detects Array.isArray(data.audio) and either handle the first element (e.g., take element[0].url) or throw a clearer error message stating "array-shaped audio responses are not supported" depending on the chosen behavior; reference the local variables audioUrl and contentType, and ensure the returned object still uses this.generateId(), this.model, and deriveAudioContentType(contentType, audioUrl). Also add a short inline comment explaining why arrays are treated specially so future maintainers know the decision.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ts-react-chat/src/routes/generations.audio.tsx`:
- Around line 1-12: Reorder and convert imports to satisfy import/order and
consistent type-only style: group external libs first (react hooks, `@tanstack/`*
packages), then local imports; use "import type" for pure type imports like
UseGenerateAudioReturn, AudioGenerationResult, AudioProviderConfig,
AudioProviderId; keep value imports (useGenerateAudio, fetchServerSentEvents,
createFileRoute, generateAudioFn, AUDIO_PROVIDERS) as regular imports; ensure
React named imports (useEffect, useMemo, useRef, useState) remain together and
before local project imports to match the lint rules.
- Around line 137-150: The effect only revokes when result.url becomes falsy, so
when result.url changes from a blob: URL to a remote HTTP(S) URL the previous
blob is never revoked; update the useEffect handling of result?.url (the block
that references lastBlobUrlRef and URL.revokeObjectURL) to also revoke and clear
lastBlobUrlRef.current when result.url exists but does not start with 'blob:'
(i.e., add a branch checking current && !current.startsWith('blob:') &&
lastBlobUrlRef.current, or simplify the else-if to revoke whenever
lastBlobUrlRef.current exists), ensuring lastBlobUrlRef.current is set to null
after revoke.
In `@packages/typescript/ai-fal/src/adapters/speech.ts`:
- Around line 120-137: The returned contentType may be too loose when only
audio_url is present—keep format as the file extension (derived by
extractUrlExtension and normalized like 'mpeg'→'mp3') but map common extensions
to proper MIME types for strict consumers; update the logic around
contentTypeMime/format in the function that builds the return object
(references: extractUrlExtension, rawFormat, format, generateId) so that if
contentTypeMime is missing you look up a small map (e.g., mp3→audio/mpeg,
m4a→audio/mp4, wav→audio/wav, ogg→audio/ogg, webm→audio/webm, etc.) and set
contentType to that mapped MIME, otherwise fall back to `audio/${format}`.
In `@packages/typescript/ai-gemini/src/adapters/image.ts`:
- Around line 238-265: The current check only throws when entries.length > 0,
allowing a completely empty or malformed Imagen response (generatedImages
undefined/null/empty) to return images: []; update the logic in the image
adapter handling (the block that builds entries, images, filterReasons and the
throw for Imagen) so that whenever images.length === 0 you throw the same Error
(not just when entries.length > 0); adjust the thrown message to still include
the joined filterReasons when present and reference the model variable as before
(i.e., ensure the throw happens if no usable images were produced regardless of
generatedImages length).
In `@packages/typescript/ai-gemini/src/adapters/tts.ts`:
- Around line 288-321: parsePcmMimeType currently returns bitsPerSample for any
audio/L<N> but wrapPcmBase64AsWav throws for non‑16 bits, causing generateSpeech
to fail for L24/L32; modify the caller (where generateSpeech invokes
wrapPcmBase64AsWav) to detect bitsPerSample !== 16 and, instead of calling
wrapPcmBase64AsWav, return the original base64 payload and original contentType
(i.e., leave it unwrapped) so unsupported PCM depths don't error the whole
request; keep parsePcmMimeType as-is and only add this conditional fallback
around the wrapPcmBase64AsWav invocation.
In `@packages/typescript/ai-gemini/tests/audio-adapter.test.ts`:
- Around line 35-42: The test "geminiAudio reads the API key from the
environment" mutates process.env.GOOGLE_API_KEY without restoring its prior
value; fix by capturing the original value (e.g. const originalKey =
process.env.GOOGLE_API_KEY) before setting it, then in the finally block restore
it (if originalKey is undefined delete process.env.GOOGLE_API_KEY else set
process.env.GOOGLE_API_KEY = originalKey). Apply this change around the
geminiAudio test to avoid affecting other tests in the worker.
In `@packages/typescript/ai/src/activities/generateAudio/index.ts`:
- Around line 182-193: The catch block should normalize non-Error throws before
emitting via aiEventClient.emit to avoid runtime crashes when adapters throw
null/undefined; replace the current naive cast (error as Error) with a
normalizedError computed from the caught value (use the Error instance if error
instanceof Error, otherwise create a new Error(String(error)) and set a fallback
name or include the original value in a raw/cause field), then use that
normalizedError's message and name when emitting the 'audio:request:error' event
(still include requestId, provider: adapter.name, model, duration: elapsedMs,
and modelOptions: rest.modelOptions).
In `@packages/typescript/ai/src/activities/generateSpeech/index.ts`:
- Around line 196-207: The catch block that emits the 'speech:request:error'
event currently does a blind type assertion (error as Error) which will throw if
a non-Error (null/undefined/primitive) was thrown; update the handler in the
generateSpeech catch to normalize the thrown value into a safeError object
before calling aiEventClient.emit — e.g., derive message and name using a
helper/local logic that handles Error instances, non-Error objects,
null/undefined, and primitives (falling back to String(error) and a default name
like 'NonErrorThrow'), and then pass that safeError (with message and name)
along with existing fields (requestId, provider: adapter.name, model,
modelOptions: rest.modelOptions, duration, timestamp) so the emit never throws
when formatting the error.
---
Outside diff comments:
In `@packages/typescript/ai-fal/src/adapters/image.ts`:
- Around line 109-110: The guard that only allows object-typed images prevents
string image payloads from reaching parseImage; in the branch handling 'image'
in data (the else if that currently checks 'image' in data && data.image &&
typeof data.image === "object"), remove the typeof check so you call
this.parseImage(data.image) for any non-null data.image value (allowing strings
like "https://..."); keep the existing null/undefined check (data.image) and let
parseImage perform validation/conversion.
---
Nitpick comments:
In `@examples/ts-react-chat/src/routes/api.generate.audio.ts`:
- Around line 10-15: The duration field in AUDIO_BODY_SCHEMA currently uses
z.number().optional() which allows negatives, 0, NaN, and huge values; update
the schema so duration enforces positive, finite numbers (for example replace
z.number().optional() with z.number().positive().finite().optional() (chain the refinements before .optional(), since ZodOptional does not expose .positive()), and
optionally add a sensible .max(...) if you want an upper bound) so invalid
durations are rejected at validation (see AUDIO_BODY_SCHEMA and the duration
property).
In `@packages/typescript/ai-fal/src/adapters/audio.ts`:
- Around line 111-140: Summary: transformResponse correctly handles documented
fal shapes but should explicitly document and handle array-shaped audio
responses. Update the transformResponse method to keep the current handling for
audio: { url, content_type } and audio_url, but add an explicit branch that
detects Array.isArray(data.audio) and either handle the first element (e.g.,
take element[0].url) or throw a clearer error message stating "array-shaped
audio responses are not supported" depending on the chosen behavior; reference
the local variables audioUrl and contentType, and ensure the returned object
still uses this.generateId(), this.model, and
deriveAudioContentType(contentType, audioUrl). Also add a short inline comment
explaining why arrays are treated specially so future maintainers know the
decision.
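A minimal sketch of the explicit array branch, using a hypothetical resolveAudioUrl helper rather than the adapter's real transformResponse:

```typescript
type FalAudioPayload = {
  audio?: { url: string; content_type?: string } | Array<{ url: string }>
  audio_url?: string
}

// Hypothetical helper mirroring the two documented fal shapes, plus an
// explicit branch for array-shaped payloads so they fail with a clear
// message instead of falling through to a generic "no audio found" error.
function resolveAudioUrl(data: FalAudioPayload): string {
  if (Array.isArray(data.audio)) {
    throw new Error('array-shaped audio responses are not supported')
  }
  if (data.audio?.url) return data.audio.url
  if (data.audio_url) return data.audio_url
  throw new Error('fal response contained no audio url')
}

console.log(resolveAudioUrl({ audio: { url: 'https://example.test/a.wav' } }))
```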
In `@packages/typescript/ai-openrouter/src/adapters/image.ts`:
- Line 170: The object spread currently passes apiKey before config in the
OpenRouterImageAdapter constructor call, which allows a stray apiKey inside
config to override the explicit apiKey at runtime; update the call in the file
where OpenRouterImageAdapter is constructed (the return new
OpenRouterImageAdapter(..., model) expression) to pass `{ ...config, apiKey }`
instead of `{ apiKey, ...config }`, referencing the OpenRouterImageConfig type
and the apiKey/config variables so the explicit apiKey always wins at runtime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3593383a-5920-4b45-8f02-0b46581be904
📒 Files selected for processing (31)
- docs/config.json
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- examples/ts-react-chat/src/routes/generations.audio.tsx
- packages/typescript/ai-event-client/src/index.ts
- packages/typescript/ai-fal/package.json
- packages/typescript/ai-fal/src/adapters/audio.ts
- packages/typescript/ai-fal/src/adapters/image.ts
- packages/typescript/ai-fal/src/adapters/speech.ts
- packages/typescript/ai-fal/src/adapters/transcription.ts
- packages/typescript/ai-fal/src/utils/client.ts
- packages/typescript/ai-fal/tests/speech-adapter.test.ts
- packages/typescript/ai-fal/tests/utils.test.ts
- packages/typescript/ai-gemini/src/adapters/audio.ts
- packages/typescript/ai-gemini/src/adapters/image.ts
- packages/typescript/ai-gemini/src/adapters/tts.ts
- packages/typescript/ai-gemini/tests/audio-adapter.test.ts
- packages/typescript/ai-gemini/tests/image-adapter.test.ts
- packages/typescript/ai-gemini/tests/tts-adapter.test.ts
- packages/typescript/ai-grok/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/transcription.ts
- packages/typescript/ai-openai/src/adapters/tts.ts
- packages/typescript/ai-openrouter/src/adapters/image.ts
- packages/typescript/ai-openrouter/tests/image-adapter.test.ts
- packages/typescript/ai/src/activities/generateAudio/index.ts
- packages/typescript/ai/src/activities/generateSpeech/index.ts
- packages/typescript/ai/src/activities/generateTranscription/index.ts
- packages/typescript/ai/src/index.ts
- packages/typescript/ai/src/types.ts
✅ Files skipped from review due to trivial changes (4)
- docs/config.json
- packages/typescript/ai-fal/package.json
- packages/typescript/ai-openrouter/tests/image-adapter.test.ts
- packages/typescript/ai-fal/tests/speech-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/typescript/ai-openai/src/adapters/tts.ts
- packages/typescript/ai/src/index.ts
- packages/typescript/ai-openai/src/adapters/image.ts
- packages/typescript/ai-grok/src/adapters/image.ts
- packages/typescript/ai-openai/src/adapters/transcription.ts
- packages/typescript/ai-fal/src/adapters/transcription.ts
- packages/typescript/ai-gemini/tests/tts-adapter.test.ts
- packages/typescript/ai-gemini/src/adapters/audio.ts
| import { useEffect, useMemo, useRef, useState } from 'react' | ||
| import { createFileRoute } from '@tanstack/react-router' | ||
| import { useGenerateAudio } from '@tanstack/ai-react' | ||
| import type { UseGenerateAudioReturn } from '@tanstack/ai-react' | ||
| import { fetchServerSentEvents } from '@tanstack/ai-client' | ||
| import type { AudioGenerationResult } from '@tanstack/ai' | ||
| import { generateAudioFn } from '../lib/server-fns' | ||
| import { | ||
| AUDIO_PROVIDERS, | ||
| type AudioProviderConfig, | ||
| type AudioProviderId, | ||
| } from '../lib/audio-providers' |
Fix the lint-required import ordering and type-only imports.
Static analysis reports import/order and import/consistent-type-specifier-style errors here.
🧹 Proposed fix
import { useEffect, useMemo, useRef, useState } from 'react'
import { createFileRoute } from '@tanstack/react-router'
import { useGenerateAudio } from '@tanstack/ai-react'
-import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'
-import type { AudioGenerationResult } from '@tanstack/ai'
import { generateAudioFn } from '../lib/server-fns'
-import {
- AUDIO_PROVIDERS,
- type AudioProviderConfig,
- type AudioProviderId,
-} from '../lib/audio-providers'
+import { AUDIO_PROVIDERS } from '../lib/audio-providers'
+import type { AudioGenerationResult } from '@tanstack/ai'
+import type { UseGenerateAudioReturn } from '@tanstack/ai-react'
+import type {
+ AudioProviderConfig,
+ AudioProviderId,
+} from '../lib/audio-providers'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import { useEffect, useMemo, useRef, useState } from 'react' | |
| import { createFileRoute } from '@tanstack/react-router' | |
| import { useGenerateAudio } from '@tanstack/ai-react' | |
| import type { UseGenerateAudioReturn } from '@tanstack/ai-react' | |
| import { fetchServerSentEvents } from '@tanstack/ai-client' | |
| import type { AudioGenerationResult } from '@tanstack/ai' | |
| import { generateAudioFn } from '../lib/server-fns' | |
| import { | |
| AUDIO_PROVIDERS, | |
| type AudioProviderConfig, | |
| type AudioProviderId, | |
| } from '../lib/audio-providers' | |
| import { useEffect, useMemo, useRef, useState } from 'react' | |
| import { createFileRoute } from '@tanstack/react-router' | |
| import { useGenerateAudio } from '@tanstack/ai-react' | |
| import { fetchServerSentEvents } from '@tanstack/ai-client' | |
| import { generateAudioFn } from '../lib/server-fns' | |
| import { AUDIO_PROVIDERS } from '../lib/audio-providers' | |
| import type { AudioGenerationResult } from '@tanstack/ai' | |
| import type { UseGenerateAudioReturn } from '@tanstack/ai-react' | |
| import type { | |
| AudioProviderConfig, | |
| AudioProviderId, | |
| } from '../lib/audio-providers' |
🧰 Tools
🪛 ESLint
[error] 4-4: @tanstack/ai-react type import should occur after import of ../lib/audio-providers
(import/order)
[error] 6-6: @tanstack/ai type import should occur after import of ../lib/audio-providers
(import/order)
[error] 10-10: Prefer using a top-level type-only import instead of inline type specifiers.
(import/consistent-type-specifier-style)
[error] 11-11: Prefer using a top-level type-only import instead of inline type specifiers.
(import/consistent-type-specifier-style)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 1 - 12,
Reorder and convert imports to satisfy import/order and consistent type-only
style: group external libs first (react hooks, `@tanstack/`* packages), then local
imports; use "import type" for pure type imports like UseGenerateAudioReturn,
AudioGenerationResult, AudioProviderConfig, AudioProviderId; keep value imports
(useGenerateAudio, fetchServerSentEvents, createFileRoute, generateAudioFn,
AUDIO_PROVIDERS) as regular imports; ensure React named imports (useEffect,
useMemo, useRef, useState) remain together and before local project imports to
match the lint rules.
| useEffect(() => { | ||
| const current = result?.url | ||
| // Only track blob: URLs — remote URLs returned directly by providers | ||
| // are not ours to revoke. | ||
| if (current && current.startsWith('blob:')) { | ||
| if (lastBlobUrlRef.current && lastBlobUrlRef.current !== current) { | ||
| URL.revokeObjectURL(lastBlobUrlRef.current) | ||
| } | ||
| lastBlobUrlRef.current = current | ||
| } else if (!current && lastBlobUrlRef.current) { | ||
| URL.revokeObjectURL(lastBlobUrlRef.current) | ||
| lastBlobUrlRef.current = null | ||
| } | ||
| }, [result?.url]) |
Revoke blob URLs when the next result is remote.
The current effect skips revocation when result.url changes from a blob: URL to an HTTP(S) URL, retaining the previous audio blob for the component lifetime.
🧯 Proposed fix
useEffect(() => {
const current = result?.url
- // Only track blob: URLs — remote URLs returned directly by providers
- // are not ours to revoke.
- if (current && current.startsWith('blob:')) {
- if (lastBlobUrlRef.current && lastBlobUrlRef.current !== current) {
- URL.revokeObjectURL(lastBlobUrlRef.current)
- }
+ const previous = lastBlobUrlRef.current
+
+ if (previous && previous !== current) {
+ URL.revokeObjectURL(previous)
+ lastBlobUrlRef.current = null
+ }
+
+ // Only track blob: URLs — remote URLs returned directly by providers
+ // are not ours to revoke.
+ if (current?.startsWith('blob:')) {
lastBlobUrlRef.current = current
- } else if (!current && lastBlobUrlRef.current) {
- URL.revokeObjectURL(lastBlobUrlRef.current)
- lastBlobUrlRef.current = null
}
}, [result?.url])📝 Committable suggestion
| useEffect(() => { | |
| const current = result?.url | |
| // Only track blob: URLs — remote URLs returned directly by providers | |
| // are not ours to revoke. | |
| if (current && current.startsWith('blob:')) { | |
| if (lastBlobUrlRef.current && lastBlobUrlRef.current !== current) { | |
| URL.revokeObjectURL(lastBlobUrlRef.current) | |
| } | |
| lastBlobUrlRef.current = current | |
| } else if (!current && lastBlobUrlRef.current) { | |
| URL.revokeObjectURL(lastBlobUrlRef.current) | |
| lastBlobUrlRef.current = null | |
| } | |
| }, [result?.url]) | |
| useEffect(() => { | |
| const current = result?.url | |
| const previous = lastBlobUrlRef.current | |
| if (previous && previous !== current) { | |
| URL.revokeObjectURL(previous) | |
| lastBlobUrlRef.current = null | |
| } | |
| // Only track blob: URLs — remote URLs returned directly by providers | |
| // are not ours to revoke. | |
| if (current?.startsWith('blob:')) { | |
| lastBlobUrlRef.current = current | |
| } | |
| }, [result?.url]) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/generations.audio.tsx` around lines 137 -
150, The effect only revokes when result.url becomes falsy, so when result.url
changes from a blob: URL to a remote HTTP(S) URL the previous blob is never
revoked; update the useEffect handling of result?.url (the block that references
lastBlobUrlRef and URL.revokeObjectURL) to also revoke and clear
lastBlobUrlRef.current when result.url exists but does not start with 'blob:'
(i.e., add a branch checking current && !current.startsWith('blob:') &&
lastBlobUrlRef.current, or simplify the else-if to revoke whenever
lastBlobUrlRef.current exists), ensuring lastBlobUrlRef.current is set to null
after revoke.
| // Strip parameters like `; charset=...` from contentType, and only use | ||
| // the URL extension as a fallback when it looks like a real extension. | ||
| const contentTypeMime = contentType?.split(';')[0]?.trim() | ||
| const safeUrlExtension = extractUrlExtension(audioUrl) | ||
| // Prefer URL-derived extension when available (more canonical for file | ||
| // consumers), otherwise derive from the content-type mime subtype, then | ||
| // fall back to `wav`. Normalize `mpeg` → `mp3` so the format field is a | ||
| // usable file extension rather than the IANA subtype. | ||
| const rawFormat = | ||
| safeUrlExtension || contentTypeMime?.split('/')[1] || 'wav' | ||
| const format = rawFormat === 'mpeg' ? 'mp3' : rawFormat | ||
| return { | ||
| id: response.requestId || this.generateId(), | ||
| model: this.model, | ||
| audio: base64, | ||
| format, | ||
| contentType: contentTypeMime || `audio/${format}`, |
Map extension fallbacks to MIME types.
When fal returns only audio_url, contentType can become values like audio/mp3 or audio/m4a. Keep format as the file extension, but map common extensions to their MIME types for stricter consumers.
🎧 Proposed fix
const rawFormat =
safeUrlExtension || contentTypeMime?.split('/')[1] || 'wav'
const format = rawFormat === 'mpeg' ? 'mp3' : rawFormat
+ const contentTypeByFormat: Record<string, string> = {
+ mp3: 'audio/mpeg',
+ m4a: 'audio/mp4',
+ mp4: 'audio/mp4',
+ wav: 'audio/wav',
+ }
return {
id: response.requestId || this.generateId(),
model: this.model,
audio: base64,
format,
- contentType: contentTypeMime || `audio/${format}`,
+ contentType:
+ contentTypeMime || contentTypeByFormat[format] || `audio/${format}`,
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-fal/src/adapters/speech.ts` around lines 120 - 137,
The returned contentType may be too loose when only audio_url is present—keep
format as the file extension (derived by extractUrlExtension and normalized like
'mpeg'→'mp3') but map common extensions to proper MIME types for strict
consumers; update the logic around contentTypeMime/format in the function that
builds the return object (references: extractUrlExtension, rawFormat, format,
generateId) so that if contentTypeMime is missing you look up a small map (e.g.,
mp3→audio/mpeg, m4a→audio/mp4, wav→audio/wav, ogg→audio/ogg, webm→audio/webm,
etc.) and set contentType to that mapped MIME, otherwise fall back to
`audio/${format}`.
| const entries = response.generatedImages ?? [] | ||
| const images: Array<GeneratedImage> = [] | ||
| const filterReasons: Array<string> = [] | ||
| for (const item of entries) { | ||
| const b64Json = item.image?.imageBytes | ||
| if (b64Json) { | ||
| images.push({ b64Json, revisedPrompt: item.enhancedPrompt }) | ||
| continue | ||
| } | ||
| // Imagen can drop individual entries with a raiFilteredReason when | ||
| // Responsible-AI filters fire. Preserve the reason so callers can | ||
| // surface it instead of silently getting back fewer images. | ||
| const reason = (item as { raiFilteredReason?: string }).raiFilteredReason | ||
| if (reason) { | ||
| filterReasons.push(reason) | ||
| } | ||
| } | ||
| // Every entry was filtered — no usable images to return. Throw rather | ||
| // than resolve to an empty array so the caller is forced to handle the | ||
| // failure mode explicitly. | ||
| if (entries.length > 0 && images.length === 0) { | ||
| const joined = filterReasons.length > 0 ? filterReasons.join('; ') : '' | ||
| throw new Error( | ||
| `Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`, | ||
| ) | ||
| } |
Also reject completely empty Imagen responses.
Line 260 only throws when entries.length > 0, so a malformed/empty generatedImages response still resolves with images: []. That recreates the silent-success failure mode fixed for Gemini-native responses.
🐛 Proposed fix
- if (entries.length > 0 && images.length === 0) {
+ if (images.length === 0) {
+ if (entries.length === 0) {
+ throw new Error(`Imagen ${model} returned no images.`)
+ }
+
const joined = filterReasons.length > 0 ? filterReasons.join('; ') : ''
throw new Error(
`Imagen ${model} returned no images: all ${entries.length} generated image(s) were filtered by Responsible-AI${joined ? ` (${joined})` : ''}.`,
)
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/adapters/image.ts` around lines 238 - 265,
The current check only throws when entries.length > 0, allowing a completely
empty or malformed Imagen response (generatedImages undefined/null/empty) to
return images: []; update the logic in the image adapter handling (the block
that builds entries, images, filterReasons and the throw for Imagen) so that
whenever images.length === 0 you throw the same Error (not just when
entries.length > 0); adjust the thrown message to still include the joined
filterReasons when present and reference the model variable as before (i.e.,
ensure the throw happens if no usable images were produced regardless of
generatedImages length).
| const bitDepthMatch = /^audio\/l(\d+)/.exec(normalized) | ||
| const isPcm = | ||
| bitDepthMatch !== null || | ||
| normalized.startsWith('audio/pcm') || | ||
| normalized.startsWith('audio/x-pcm') || | ||
| normalized.includes('codec=pcm') | ||
| if (!isPcm) return undefined | ||
| const rateMatch = /rate=(\d+)/.exec(normalized) | ||
| const channelsMatch = /channels=(\d+)/.exec(normalized) | ||
| // Default to 16-bit when the mime type doesn't specify — matches Gemini's | ||
| // audio/L16;codec=pcm;rate=24000 response. | ||
| const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16 | ||
| return { | ||
| sampleRate: rateMatch ? Number(rateMatch[1]) : 24000, | ||
| channels: channelsMatch ? Number(channelsMatch[1]) : 1, | ||
| bitsPerSample, | ||
| } | ||
| } | ||
| function wrapPcmBase64AsWav( | ||
| pcmBase64: string, | ||
| sampleRate: number, | ||
| channels = 1, | ||
| bitsPerSample = 16, | ||
| ): string { | ||
| // The WAV writer below emits a 16-bit PCM fmt chunk. If the source claims a | ||
| // different bit depth we'd be lying about the payload, so bail out loudly | ||
| // rather than producing a corrupt file. | ||
| if (bitsPerSample !== 16) { | ||
| throw new Error( | ||
| `Unsupported PCM bit depth ${bitsPerSample}: only 16-bit PCM can be wrapped as WAV.`, | ||
| ) | ||
| } |
Non‑16‑bit PCM mime types now hard‑fail the request.
parsePcmMimeType accepts any audio/L<N> (e.g. audio/L24, audio/L32) and returns bitsPerSample = N, but wrapPcmBase64AsWav throws Unsupported PCM bit depth … for anything other than 16. That error propagates out of generateSpeech, so if Gemini (or a future model) ever returns non‑16‑bit PCM the entire call errors instead of returning the raw bytes. Today Lyria/TTS return L16 so this is latent, but it's a trivially avoidable footgun.
Prefer one of:
- Restrict `parsePcmMimeType` to 16‑bit PCM so unsupported depths fall through to the generic `mimeType.split('/')` path and are returned as‑is, or
- Catch the unsupported‑depth case at the call site and return the un‑wrapped base64 with the original `contentType`.
🛡️ Proposed fix (option 1 — tightens the detector)
- // Default to 16-bit when the mime type doesn't specify — matches Gemini's
- // audio/L16;codec=pcm;rate=24000 response.
- const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16
+ // Default to 16-bit when the mime type doesn't specify — matches Gemini's
+ // audio/L16;codec=pcm;rate=24000 response. Only 16-bit PCM can be wrapped
+ // as WAV by the helper below, so bail out for other bit depths and let the
+ // caller fall through to returning the raw payload with its original mime.
+ const bitsPerSample = bitDepthMatch ? Number(bitDepthMatch[1]) : 16
+ if (bitsPerSample !== 16) return undefined
return {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/src/adapters/tts.ts` around lines 288 - 321,
parsePcmMimeType currently returns bitsPerSample for any audio/L<N> but
wrapPcmBase64AsWav throws for non‑16 bits, causing generateSpeech to fail for
L24/L32; modify the caller (where generateSpeech invokes wrapPcmBase64AsWav) to
detect bitsPerSample !== 16 and, instead of calling wrapPcmBase64AsWav, return
the original base64 payload and original contentType (i.e., leave it unwrapped)
so unsupported PCM depths don't error the whole request; keep parsePcmMimeType
as-is and only add this conditional fallback around the wrapPcmBase64AsWav
invocation.
| it('geminiAudio reads the API key from the environment', () => { | ||
| process.env.GOOGLE_API_KEY = 'env-key' | ||
| try { | ||
| const adapter = geminiAudio('lyria-3-clip-preview') | ||
| expect(adapter.model).toBe('lyria-3-clip-preview') | ||
| } finally { | ||
| delete process.env.GOOGLE_API_KEY | ||
| } |
Restore the previous GOOGLE_API_KEY value.
This test deletes the env var even when it existed before the test, which can affect later tests in the same worker.
🧪 Proposed fix
it('geminiAudio reads the API key from the environment', () => {
+ const previousGoogleApiKey = process.env.GOOGLE_API_KEY
process.env.GOOGLE_API_KEY = 'env-key'
try {
const adapter = geminiAudio('lyria-3-clip-preview')
expect(adapter.model).toBe('lyria-3-clip-preview')
} finally {
- delete process.env.GOOGLE_API_KEY
+ if (previousGoogleApiKey === undefined) {
+ delete process.env.GOOGLE_API_KEY
+ } else {
+ process.env.GOOGLE_API_KEY = previousGoogleApiKey
+ }
}
})📝 Committable suggestion
| it('geminiAudio reads the API key from the environment', () => { | |
| process.env.GOOGLE_API_KEY = 'env-key' | |
| try { | |
| const adapter = geminiAudio('lyria-3-clip-preview') | |
| expect(adapter.model).toBe('lyria-3-clip-preview') | |
| } finally { | |
| delete process.env.GOOGLE_API_KEY | |
| } | |
| it('geminiAudio reads the API key from the environment', () => { | |
| const previousGoogleApiKey = process.env.GOOGLE_API_KEY | |
| process.env.GOOGLE_API_KEY = 'env-key' | |
| try { | |
| const adapter = geminiAudio('lyria-3-clip-preview') | |
| expect(adapter.model).toBe('lyria-3-clip-preview') | |
| } finally { | |
| if (previousGoogleApiKey === undefined) { | |
| delete process.env.GOOGLE_API_KEY | |
| } else { | |
| process.env.GOOGLE_API_KEY = previousGoogleApiKey | |
| } | |
| } | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-gemini/tests/audio-adapter.test.ts` around lines 35 -
42, The test "geminiAudio reads the API key from the environment" mutates
process.env.GOOGLE_API_KEY without restoring its prior value; fix by capturing
the original value (e.g. const originalKey = process.env.GOOGLE_API_KEY) before
setting it, then in the finally block restore it (if originalKey is undefined
delete process.env.GOOGLE_API_KEY else set process.env.GOOGLE_API_KEY =
originalKey). Apply this change around the geminiAudio test to avoid affecting
other tests in the worker.
| } catch (error) { | ||
| const elapsedMs = Date.now() - startTime | ||
| const err = error as Error | ||
| aiEventClient.emit('audio:request:error', { | ||
| requestId, | ||
| provider: adapter.name, | ||
| model, | ||
| error: { message: err.message, name: err.name }, | ||
| duration: elapsedMs, | ||
| modelOptions: rest.modelOptions as Record<string, unknown> | undefined, | ||
| timestamp: Date.now(), | ||
| }) |
Normalize non-Error throws before emitting error events.
`error as Error` does not guard at runtime; a `null`/`undefined` throw from an adapter would throw again at `err.message` and replace the original failure.
🐛 Proposed fix

```diff
  const elapsedMs = Date.now() - startTime
- const err = error as Error
+ const err =
+   error instanceof Error
+     ? { message: error.message, name: error.name }
+     : { message: String(error) }
  aiEventClient.emit('audio:request:error', {
    requestId,
    provider: adapter.name,
    model,
-   error: { message: err.message, name: err.name },
+   error: err,
    duration: elapsedMs,
    modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
    timestamp: Date.now(),
```

📝 Committable suggestion

```ts
} catch (error) {
  const elapsedMs = Date.now() - startTime
  const err =
    error instanceof Error
      ? { message: error.message, name: error.name }
      : { message: String(error) }
  aiEventClient.emit('audio:request:error', {
    requestId,
    provider: adapter.name,
    model,
    error: err,
    duration: elapsedMs,
    modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
    timestamp: Date.now(),
  })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai/src/activities/generateAudio/index.ts` around lines
182 - 193, The catch block should normalize non-Error throws before emitting via
aiEventClient.emit to avoid runtime crashes when adapters throw null/undefined;
replace the current naive cast (error as Error) with a normalizedError computed
from the caught value (use the Error instance if error instanceof Error,
otherwise create a new Error(String(error)) and set a fallback name or include
the original value in a raw/cause field), then use that normalizedError's
message and name when emitting the 'audio:request:error' event (still include
requestId, provider: adapter.name, model, duration: elapsedMs, and modelOptions:
rest.modelOptions).
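The cast-free normalization proposed for the audio and speech emit sites can be factored into one shared helper — a sketch; `normalizeThrown` is a hypothetical name, not an export of `@tanstack/ai`:

```typescript
// Sketch of the normalization the review asks for. Handles Error instances
// and any non-Error value (null, undefined, strings, plain objects) that a
// third-party SDK might throw, so the emit call itself can never crash.
function normalizeThrown(error: unknown): { message: string; name?: string } {
  if (error instanceof Error) {
    return { message: error.message, name: error.name }
  }
  // String(null) === 'null', String(undefined) === 'undefined'
  return { message: String(error) }
}
```

Reusing one helper across both catch blocks keeps the two `*:request:error` payload shapes consistent.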
```ts
} catch (error) {
  const duration = Date.now() - startTime
  const err = error as Error
  aiEventClient.emit('speech:request:error', {
    requestId,
    provider: adapter.name,
    model,
    error: { message: err.message, name: err.name },
    duration,
    modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
    timestamp: Date.now(),
  })
```
Normalize non-Error throws before emitting error events.
`error as Error` is only a type assertion; if an adapter/SDK throws `null` or `undefined`, `err.message` throws inside the catch block and masks the original failure.
🐛 Proposed fix

```diff
- const err = error as Error
+ const err =
+   error instanceof Error
+     ? { message: error.message, name: error.name }
+     : { message: String(error) }
  aiEventClient.emit('speech:request:error', {
    requestId,
    provider: adapter.name,
    model,
-   error: { message: err.message, name: err.name },
+   error: err,
    duration,
    modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
    timestamp: Date.now(),
```

📝 Committable suggestion

```ts
} catch (error) {
  const duration = Date.now() - startTime
  const err =
    error instanceof Error
      ? { message: error.message, name: error.name }
      : { message: String(error) }
  aiEventClient.emit('speech:request:error', {
    requestId,
    provider: adapter.name,
    model,
    error: err,
    duration,
    modelOptions: rest.modelOptions as Record<string, unknown> | undefined,
    timestamp: Date.now(),
  })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai/src/activities/generateSpeech/index.ts` around lines
196 - 207, The catch block that emits the 'speech:request:error' event currently
does a blind type assertion (error as Error) which will throw if a non-Error
(null/undefined/primitive) was thrown; update the handler in the generateSpeech
catch to normalize the thrown value into a safeError object before calling
aiEventClient.emit — e.g., derive message and name using a helper/local logic
that handles Error instances, non-Error objects, null/undefined, and primitives
(falling back to String(error) and a default name like 'NonErrorThrow'), and
then pass that safeError (with message and name) along with existing fields
(requestId, provider: adapter.name, model, modelOptions: rest.modelOptions,
duration, timestamp) so the emit never throws when formatting the error.
…ities

- debug-logging.md: list generateAudio/generateTranscription in the Non-chat activities section; clarify that the `provider` category now applies to streaming generateAudio/generateSpeech/generateTranscription calls too.
- audio-generation.md, text-to-speech.md, transcription.md: add a single contextual callout at the moment a builder is most likely to need it (immediately before the Options table / next to Error Handling), pointing to the debug-logging guide.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/media/audio-generation.md (1)
110-122: Consider clarifying the mutually-exclusive nature of `url` and `b64Json`. While the interface shows both fields as optional, the implementation guarantees exactly one will be present (per the PR objectives). Consider adding a note in the prose to make this constraint explicit for users.
📋 Suggested addition after line 122

```diff
 }
+
+Note: `url` and `b64Json` are mutually exclusive — exactly one will be present depending on the provider.
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/media/audio-generation.md` around lines 110-122, Clarify in the docs that the AudioGenerationResult interface guarantees exactly one of audio.url or audio.b64Json will be returned; update the prose near the AudioGenerationResult definition to state that url and b64Json are mutually exclusive and that exactly one will be present depending on the provider (reference AudioGenerationResult and the audio.url/audio.b64Json fields).
Inline comments:
In `@docs/media/audio-generation.md`:

- Line 102: Update the table row for the `adapter` (`AudioAdapter`) to clarify it applies to all audio adapters, not just `falAudio()` — mention `geminiAudio()` and other supported adapters (e.g., "The adapter created via falAudio(), geminiAudio(), or other AudioAdapter factories") so readers understand any AudioAdapter implementation is valid; update the description text where `adapter` and `AudioAdapter` appear to reflect this broader applicability.
In@docs/media/audio-generation.md:
- Around line 110-122: Clarify in the docs that the AudioGenerationResult
interface guarantees exactly one of audio.url or audio.b64Json will be returned;
update the prose near the AudioGenerationResult definition to state that url and
b64Json are mutually exclusive and that exactly one will be present depending on
the provider (reference AudioGenerationResult and the audio.url/audio.b64Json
fields).</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro **Run ID**: `baff8e9a-6958-479e-916c-021d661c8ff1` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 8d5fb936682339e0c145af9ea072e0fad204cd8d and f1bdc01c6187477f3ebfbda7abd32ab9dd648882. </details> <details> <summary>📒 Files selected for processing (4)</summary> * `docs/advanced/debug-logging.md` * `docs/media/audio-generation.md` * `docs/media/text-to-speech.md` * `docs/media/transcription.md` </details> <details> <summary>✅ Files skipped from review due to trivial changes (2)</summary> * docs/advanced/debug-logging.md * docs/media/text-to-speech.md </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary> * docs/media/transcription.md </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
| Option | Type | Description |
|--------|------|-------------|
| `adapter` | `AudioAdapter` | The adapter created via `falAudio()` (required) |
Clarify adapter description to include all audio adapters.
The description mentions only falAudio(), but this options table applies to all audio adapters including geminiAudio().
📝 Proposed fix

```diff
-| `adapter` | `AudioAdapter` | The adapter created via `falAudio()` (required) |
+| `adapter` | `AudioAdapter` | The audio adapter (e.g., `falAudio()`, `geminiAudio()`) (required) |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/media/audio-generation.md` at line 102, Update the table row for the `adapter` (`AudioAdapter`) to clarify it applies to all audio adapters, not just `falAudio()` — mention `geminiAudio()` and other supported adapters (e.g., "The adapter created via falAudio(), geminiAudio(), or other AudioAdapter factories") so readers understand any AudioAdapter implementation is valid; update the description text where `adapter` and `AudioAdapter` appear to reflect this broader applicability.
…eration skill

Agents hitting the new generateAudio/generateSpeech/generateTranscription activities will run into:

- Gemini Lyria doesn't accept responseMimeType or negativePrompt via GenerateContentConfig — shape the prompt instead.
- Lyria 3 Clip is fixed at 30s; Lyria 3 Pro reads duration from natural language in the prompt, not the duration option. fal audio maps duration per-model.
- Gemini TTS multiSpeakerVoiceConfig is validated to 1 or 2 speakers.
- debug: DebugOption is threaded through every generate*() activity — reach for it instead of writing logging middleware.

Adds four Common Mistake entries, sources the debug-logging doc, and cross-references the ai-core/debug-logging sub-skill.
fal-client auto-uploads Blob/File inputs via fal.storage.upload but passes strings through unchanged, so data URLs reached fal's API and got rejected with 422 "Unsupported data URL". Decode data URL strings to a Blob in buildInput so the auto-upload path handles them; plain http(s) URLs still pass through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
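The decode-to-Blob step the commit describes can be sketched as follows — `dataUrlToBlob` is an illustrative name, not the real `buildInput` code:

```typescript
// Sketch of the data-URL handling described above: decode `data:` strings to a
// Blob so an auto-upload path (like fal.storage.upload) can take over, while
// plain http(s) URLs pass through unchanged. Assumes Node 18+ for global Blob.
function dataUrlToBlob(input: string): Blob | string {
  if (!input.startsWith('data:')) return input // http(s) URLs pass through
  const commaIndex = input.indexOf(',')
  const header = input.slice('data:'.length, commaIndex) // e.g. "audio/wav;base64"
  const mime = header.split(';')[0] || 'application/octet-stream'
  const bytes = Buffer.from(input.slice(commaIndex + 1), 'base64') // atob() in browsers
  return new Blob([bytes], { type: mime })
}
```

Returning a Blob (rather than the raw string) is what routes the payload into the upload path instead of letting the data URL reach the API and be rejected with a 422.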
Summary
Closes #328.
Adds audio generation / speech / transcription adapters for fal and Gemini, a new `generateAudio` activity with streaming support, framework hooks (`useGenerateAudio` / `createGenerateAudio`), and example pages in `ts-react-chat`.

What's in this PR

Core

- `@tanstack/ai` `generateAudio` activity (music, sound-effects, and the broader fal audio catalog — audio-to-audio, voice conversion / cloning, enhancement, separation, isolation, understanding, merge, etc.) with a matching `AudioAdapter` interface, `BaseAudioAdapter` base class, and `audio:request:started` / `audio:request:completed` / `audio:usage` devtools events.
- `generateAudio()` now supports `stream: true`, returning an `AsyncIterable<StreamChunk>` that can be piped through `toServerSentEventsResponse()` — matching the other media activities.
- `generateSpeech` / `generateTranscription` activity generics tightened to `TTSProviderOptions<TAdapter>` / `TranscriptionProviderOptions<TAdapter>` so typed provider options flow through.
- `GeneratedImage` and `GeneratedAudio` now enforce exactly one of `url` or `b64Json` via a mutually-exclusive `GeneratedMediaSource` union (see e109e962). Existing ``img.url || `data:...;base64,${img.b64Json}` `` read patterns still work; `@tanstack/ai-openrouter` and `@tanstack/ai-fal` image adapters no longer synthesize a `data:` URL when the provider returned base64.

Framework hooks

- `@tanstack/ai-client`: new `AudioGenerateInput` type.
- `@tanstack/ai-react`, `@tanstack/ai-solid`, `@tanstack/ai-vue`: new `useGenerateAudio` hook.
- `@tanstack/ai-svelte`: new `createGenerateAudio`.
- All expose `{ generate, result, isLoading, error, status, stop, reset }` — and support both `connection` (SSE) and `fetcher` transports.

Provider adapters

- `@tanstack/ai-gemini`:
  - `geminiAudio()` adapter for Lyria 3 Pro / Lyria 3 Clip music generation (MP3/WAV 48 kHz stereo; clip variant returns 30-second MP3s).
  - Adds `gemini-3.1-flash-tts-preview` to the TTS model list (70+ languages, 200+ audio tags).
  - Adds `multiSpeakerVoiceConfig` to `GeminiTTSProviderOptions` for 2-speaker dialogue generation.
- `@tanstack/ai-fal`: three new tree-shakeable adapters alongside the existing `falImage()` / `falVideo()`:
  - `falSpeech()` — TTS via models like `fal-ai/gemini-3.1-flash-tts`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/kokoro/*`.
  - `falTranscription()` — STT via `fal-ai/whisper`, `fal-ai/wizper`, `fal-ai/speech-to-text/turbo`.
  - `falAudio()` — music and SFX via `fal-ai/minimax-music/v2.6`, `fal-ai/diffrhythm`, `fal-ai/lyria2`, `fal-ai/stable-audio-25/text-to-audio`, `fal-ai/elevenlabs/sound-effects/v2`, plus the wider audio catalog.
  - `duration` translation fix (8b4cb006) so each model's expected duration parameter is forwarded correctly.

Docs

- `media/audio-generation.md` guide — leads with Gemini (Lyria), then fal.
- `media/text-to-speech.md` and `media/transcription.md` gain fal sections.
- `docs/adapters/fal.md` expanded with TTS, transcription, and audio sections plus a full model table.
- `docs/adapters/gemini.md` updated for Lyria and Gemini 3.1 Flash TTS.

Examples (`examples/ts-react-chat`)

- `/generations/audio` page covering Gemini Lyria and fal audio generation with a model selector (cc273a59).

Tests

- `useGenerateAudio` / `createGenerateAudio` across all framework packages.

History notes

- The ElevenLabs pieces were extracted to a separate PR (9998a093) and have since merged on `main`. This PR no longer touches `@tanstack/ai-elevenlabs` source.
- `generateAudio` stayed a single activity — briefly split into `generateMusic` + `generateSoundEffects` (1010e9b7), then reverted in 247ef7ba once fal's full audio catalog made the music/SFX binary clearly too narrow (dozens of audio-to-audio, voice, enhancement, separation, isolation, understanding, merge models span both or neither category).
- Lyria test-harness pieces were removed (b00b28ac) — aimock 1.14 has no `generateContent` AUDIO-modality mock for Lyria, so they were removed rather than left dormant. They will land with a follow-up once upstream aimock support exists.

Test plan

- `pnpm test:lib` (affected packages pass)
- `pnpm test:types` (affected packages pass)
- `pnpm test:eslint` (affected packages pass)
- `pnpm test:docs` (no broken links)
- Unit tests for `generateAudio` — deferred pending upstream aimock Lyria / AUDIO-modality support
- Run the `ts-react-chat` dev server and verify tab switching + audio playback per provider (OpenAI / Gemini / Fal for TTS; OpenAI / Fal for transcription; Gemini Lyria / Fal Audio for `/generations/audio`)

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation