Persistent APIError: 1007 None (Invalid Audio Format) in google-adk 1.31.1 using Vertex AI Live API #5552

@anupam-mishra

Description

Since upgrading to google-adk v1.29.0, the Multimodal Live API (gemini-live-2.5-flash-native-audio on Vertex AI) intermittently crashes with google.genai.errors.APIError: 1007 None.

The session typically establishes correctly, but the error triggers mid-conversation during active audio streaming. The error message explicitly cites an invalid audio format ("16khz s16le pcm, mono channel"), even though the client-side input remains consistent and verified at exactly those specifications. This appears to be a regression in how the ADK frames or sequences audio blobs under sustained load or network jitter; it did not occur in earlier versions. I migrated to the newer version to pick up fix 6b1600f.
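For reference, this is the check used to verify the client-side chunks before they are queued. The helper below is a hypothetical sketch, not part of the production code; it asserts a chunk can plausibly be 16 kHz s16le mono PCM (whole little-endian int16 samples, sane duration):

```python
import struct

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # s16le: one signed little-endian int16 per sample


def validate_pcm_chunk(chunk: bytes, max_ms: int = 200) -> None:
    """Raise ValueError if `chunk` cannot be 16 kHz s16le mono PCM."""
    if len(chunk) % BYTES_PER_SAMPLE != 0:
        raise ValueError(f"odd byte count {len(chunk)}: not whole s16le samples")
    n_samples = len(chunk) // BYTES_PER_SAMPLE
    duration_ms = n_samples * 1000 / SAMPLE_RATE
    if duration_ms > max_ms:
        raise ValueError(f"chunk suspiciously large: {duration_ms:.0f} ms")
    # Decoding proves the payload parses as little-endian int16 samples.
    struct.unpack(f"<{n_samples}h", chunk)
```

Every chunk sent during the failing sessions passes this check, which is why the server-side format complaint looks like a framing issue rather than a capture issue.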

Steps to Reproduce:

  • Initialize an ADK LlmAgent using the gemini-live-2.5-flash-native-audio model on Vertex AI.
  • Establish a bidirectional session using runner.run_live().
  • Engage in a multi-turn conversation, providing sustained audio input (3+ minutes).
  • Observe the connection drop with the 1007 None traceback during an active audio turn.
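To automate the "sustained audio input" step, any driver that streams well-formed 16 kHz s16le mono chunks at the websocket will do. A minimal generator (hypothetical helper, names are mine) that produces a continuous sine tone in 20 ms chunks:

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz, matching the repro's "audio/pcm;rate=16000" blobs


def pcm_sine_chunk(freq_hz: float = 440.0, ms: int = 20, t0: int = 0) -> bytes:
    """One chunk of 16 kHz s16le mono sine audio, starting at sample index t0."""
    n = SAMPLE_RATE * ms // 1000
    samples = (
        int(20000 * math.sin(2 * math.pi * freq_hz * (t0 + i) / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


# Three minutes of audio = 9000 consecutive 20 ms chunks, e.g.:
# for i in range(9000):
#     await ws.send_bytes(pcm_sine_chunk(t0=i * 320))
```

Streaming these chunks for 3+ minutes is enough to hit the drop in my environment, so microphone capture is not required to reproduce.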

Expected Behavior:
The WebSocket should maintain a stable bidirectional stream. The backend should consistently validate the audio packets provided by the ADK as long as the input format (PCM 16kHz) does not change.

Observed Behavior:
The connection terminates mid-stream with status 1007 (Invalid Frame Payload).

Log Snippet: APIError in live flow: 1007 None. error when processing input audio, please check if the inputaudio is in valid format: 16khz s16le pcm, mono channel.; Error

Environment Details:

ADK Library Version: 1.29.0

Google-GenAI Version: (Check via pip show google-genai)

Python Version: 3.14.4

Model: gemini-live-2.5-flash-native-audio (Vertex AI)

Deployment: Cloud Run

Minimal Reproduction Code:

```python
import asyncio
import logging
import os

from fastapi import FastAPI, WebSocket
from starlette.websockets import WebSocketState
from google.adk.agents import LlmAgent
from google.adk.agents.live_request_queue import LiveRequestQueue
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

logger = logging.getLogger(__name__)

# 1. Minimal agent setup -- replace with your specific instructions/tools if necessary.
mock_agent = LlmAgent(
    name="python_tutor",
    model="gemini-live-2.5-flash-native-audio",
    instruction="You are a helpful Python tutor.",
)

app = FastAPI()
session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()
runner = Runner(
    app_name="reproduction-app",
    agent=mock_agent,
    session_service=session_service,
    memory_service=memory_service,
)


@app.websocket("/ws/{user_id}/{session_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str, session_id: str):
    await websocket.accept()

    # These flags come from configuration in the full app; hardcoded for the repro.
    enable_proactivity = True
    affective_dialog = True

    # 2. Minimal RunConfig
    run_config = RunConfig(
        streaming_mode=StreamingMode.BIDI,
        response_modalities=["AUDIO"],  # Required for the PCM player
        input_audio_transcription=types.AudioTranscriptionConfig(language_codes=["en-GB"]),
        output_audio_transcription=types.AudioTranscriptionConfig(language_codes=["en-GB"]),
        session_resumption=types.SessionResumptionConfig(transparent=True),
        context_window_compression=types.ContextWindowCompressionConfig(
            trigger_tokens=100000,  # Start compression at ~78% of 128k context
            sliding_window=types.SlidingWindow(
                target_tokens=80000  # Compress to ~62% of context, preserving recent turns
            ),
        ),
        proactivity=types.ProactivityConfig(proactive_audio=True) if enable_proactivity else None,
        enable_affective_dialog=affective_dialog,
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name=os.getenv("AGENT_VOICE", "Puck")
                )
            ),
            language_code=os.getenv("AGENT_LANGUAGE", "en-US"),
        ),
    )

    live_request_queue = LiveRequestQueue()

    # 3. Upstream: WebSocket -> Gemini
    async def client_to_agent_messaging():
        try:
            while True:
                message = await websocket.receive()
                if "bytes" in message:
                    audio_blob = types.Blob(mime_type="audio/pcm;rate=16000", data=message["bytes"])
                    live_request_queue.send_realtime(audio_blob)
        except Exception:
            live_request_queue.close()

    # 4. Downstream: Gemini -> WebSocket (where the 1007 None occurs)
    async def agent_to_client_messaging():
        try:
            async for event in runner.run_live(
                user_id=user_id,
                session_id=session_id,
                live_request_queue=live_request_queue,
                run_config=run_config,
            ):
                await websocket.send_text(event.model_dump_json(exclude_none=True))
        except Exception as e:
            print(f"CRASH DETECTED: {e}")

    try:
        done, pending = await asyncio.wait(
            [
                asyncio.create_task(client_to_agent_messaging()),
                asyncio.create_task(agent_to_client_messaging()),
            ],
            return_when=asyncio.FIRST_COMPLETED,
        )

        # 1. Propagate exceptions from the tasks that finished
        for task in done:
            try:
                task.result()
            except Exception as e:
                logger.error(f"Task failed with exception: {e}")
                raise

        # 2. Cancel the remaining tasks
        for task in pending:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
    finally:
        # Final cleanup: just call close(), don't check for .is_closed()
        try:
            live_request_queue.close()
        except Exception:
            pass

        if websocket.client_state != WebSocketState.DISCONNECTED:
            try:
                await websocket.close()
            except Exception:
                pass
```
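As a stopgap I have been experimenting with a reconnect wrapper around the downstream loop, leaning on the transparent session resumption configured above. A minimal sketch (the `APIError` class here is a stand-in for google.genai.errors.APIError, and `run_live_once` is any coroutine that drives one `runner.run_live()` pass; both names are mine, not ADK API):

```python
import asyncio


class APIError(Exception):
    """Stand-in for google.genai.errors.APIError (for a self-contained sketch)."""


async def run_live_with_retry(run_live_once, max_retries: int = 3) -> None:
    """Re-enter the live loop after a transient 1007 drop, with backoff.

    Transparent session resumption in the RunConfig is assumed to restore
    conversational context on reconnect.
    """
    for attempt in range(max_retries + 1):
        try:
            await run_live_once()
            return
        except APIError:
            if attempt == max_retries:
                raise
            await asyncio.sleep(0.5 * 2**attempt)  # back off before reconnecting
```

This masks the symptom but does not address the mid-stream format rejection itself, and reconnects are audible to the end user.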

How often has this issue occurred?:

Very Frequently (60-70%+)

Labels

live [Component] This issue is related to live, voice and video chat