
Port speech recognition and TTS to Flutter agent #32

Description

@SebastianBoehler

Context

The previous Android app had native speech recognition and TTS integrated into the assistant loop. The Flutter migration copied the old Android Java classes into the Android runner, and iOS already exposes basic speech/TTS capability checks plus a speak bridge method. The Flutter/Dart layer does not yet expose speech as a first-class assistant interaction surface.

Relevant existing code:

  • app/src/main/java/com/example/studyOS/online/SpeechRecognitionManager.java
  • app/src/main/java/com/example/studyOS/online/TTS.java
  • flutter_app/android/app/src/main/java/com/example/studyOS/online/SpeechRecognitionManager.java
  • flutter_app/android/app/src/main/java/com/example/studyOS/online/TTS.java
  • flutter_app/android/app/src/main/AndroidManifest.xml
  • flutter_app/android/app/src/main/kotlin/com/studyos/studyos_agent/AndroidIntentBridge.kt
  • flutter_app/ios/Runner/StudyOSNativeBridge.swift
  • flutter_app/lib/src/native_bridge.dart

Current state

Previous Android behavior:

  • SpeechRecognizer listens in German by default with language switching support.
  • TTS uses Android TextToSpeech with Google TTS, German default voice, rate/pitch tuning, and utterance callbacks.
  • Speech recognition restarts after errors and after TTS completion.
  • The recognizer has passive/wake modes and can interrupt TTS when the user starts speaking.
  • BLE headset routing exists in the old speech path.

Flutter migration state:

  • Android manifest includes RECORD_AUDIO, foreground-service microphone/media permissions, and voice intent filters.
  • AndroidIntentBridge can consume recognized text from external voice intents.
  • iOS bridge requests speech authorization and exposes speak, but does not implement full recognition sessions.
  • Dart NativeBridge does not yet expose speak, startSpeechRecognition, stopSpeechRecognition, or speech event streams.

Proposed scope

Implement a Flutter speech layer with explicit platform behavior:

  1. Text-to-speech

    • Add Dart NativeBridge.speak(text, language?).
    • Route Android to the copied TTS class or to a small Kotlin wrapper around Android TextToSpeech.
    • Route iOS to the existing AVSpeechSynthesizer bridge.
    • Return explicit unsupported errors for web/desktop unless a separate browser adapter is added.
  2. Speech recognition

    • Add user-triggered start/stop controls in Flutter.
    • Route Android to the copied native SpeechRecognitionManager or to a cleaner bridge adapter around SpeechRecognizer.
    • Emit recognized text back to Flutter through the existing event channel.
    • Keep always-listening/passive mode as Android-only and optional; do not promise iOS parity.
  3. Assistant integration

    • Recognized text should enter the same message pipeline as typed input.
    • TTS should be optional and user-controllable, not forced for every response.
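
The routing rules in the scope above can be sketched as plain logic. This is a minimal sketch, written in Java only to match the copied Android classes; the same shape would apply to the Dart NativeBridge. All names (SpeechPlatform, UnsupportedSpeechException, MessagePipeline, SpeechRouter) and the backend labels are hypothetical, and treating iOS recognition as unsupported is an assumption taken from the "unsupported/pending" option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; nothing here exists in the current code base.
enum SpeechPlatform { ANDROID, IOS, WEB, DESKTOP }

/** Explicit error for speech features a platform does not provide. */
class UnsupportedSpeechException extends RuntimeException {
    UnsupportedSpeechException(String feature, SpeechPlatform platform) {
        super(feature + " is not supported on " + platform);
    }
}

/** Single entry point shared by typed and recognized input. */
class MessagePipeline {
    private final List<String> messages = new ArrayList<>();

    void submit(String text) {
        String trimmed = text == null ? "" : text.trim();
        if (!trimmed.isEmpty()) {
            messages.add(trimmed);
        }
    }

    List<String> messages() {
        return messages;
    }
}

class SpeechRouter {
    private final SpeechPlatform platform;
    private final MessagePipeline pipeline;

    SpeechRouter(SpeechPlatform platform, MessagePipeline pipeline) {
        this.platform = platform;
        this.pipeline = pipeline;
    }

    /** Names the TTS backend for this platform, or fails explicitly. */
    String ttsBackend() {
        switch (platform) {
            case ANDROID:
                return "android-text-to-speech";
            case IOS:
                return "ios-av-speech-synthesizer";
            default:
                throw new UnsupportedSpeechException("TTS", platform);
        }
    }

    /** Assumption: recognition is Android-only for now; iOS is treated as pending. */
    void requireRecognition() {
        if (platform != SpeechPlatform.ANDROID) {
            throw new UnsupportedSpeechException("Speech recognition", platform);
        }
    }

    /** Recognized text takes exactly the same path as typed input. */
    void onTranscript(String text) {
        pipeline.submit(text);
    }
}
```

Because onTranscript only delegates to MessagePipeline.submit, a spoken message and a typed one cannot diverge in how they reach the assistant.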

Acceptance criteria

  • Flutter UI exposes a microphone control with clear recording/listening state.
  • Flutter settings or capability display shows whether speech recognition/TTS are available on the current platform.
  • Android speech recognition returns transcripts into the Flutter chat input/message flow.
  • Android TTS can speak assistant responses or explicit speech tool output.
  • iOS TTS works through the existing bridge; iOS speech recognition is either implemented as active-session recognition or clearly marked unsupported/pending.
  • The app handles missing microphone/speech permissions with clear errors.
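  • Speech/TTS callbacks do not create infinite listen/speak loops, for example when recognition restarts after TTS completion and picks up the app's own spoken output.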
  • Always-listening/passive mode is gated behind Android-only capability checks and explicit user enablement.
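
One way to satisfy the loop criterion is a small turn-taking state machine between the recognizer and TTS callbacks. The sketch below is an assumption about how that could look, not the behavior of the existing SpeechRecognitionManager: SpeechTurnCoordinator and all its methods are hypothetical, transcripts that arrive while TTS is playing are dropped, and automatic restarts after TTS are capped by a budget that only an explicit user action resets. It does not model the old barge-in behavior (interrupting TTS when the user speaks).

```java
/** Hypothetical coordinator; none of these names exist in the current code. */
class SpeechTurnCoordinator {
    enum State { IDLE, LISTENING, SPEAKING }

    private final boolean autoResumeListening; // explicit user opt-in
    private final int maxAutoResumes;          // cap on restarts without user action
    private int autoResumes = 0;
    private State state = State.IDLE;

    SpeechTurnCoordinator(boolean autoResumeListening, int maxAutoResumes) {
        this.autoResumeListening = autoResumeListening;
        this.maxAutoResumes = maxAutoResumes;
    }

    State state() {
        return state;
    }

    /** User tapped the microphone: start listening and reset the restart budget. */
    void onUserStartListening() {
        autoResumes = 0;
        state = State.LISTENING;
    }

    void onUserStopListening() {
        if (state == State.LISTENING) {
            state = State.IDLE;
        }
    }

    /**
     * Returns true if the transcript should be forwarded to the message flow.
     * Transcripts received while not listening (for example during TTS) are dropped.
     */
    boolean onTranscript(String text) {
        if (state != State.LISTENING || text == null || text.trim().isEmpty()) {
            return false;
        }
        state = State.IDLE;
        return true;
    }

    void onTtsStarted() {
        state = State.SPEAKING;
    }

    /** Returns true if the caller should restart recognition after TTS ends. */
    boolean onTtsFinished() {
        if (state != State.SPEAKING) {
            return false;
        }
        if (autoResumeListening && autoResumes < maxAutoResumes) {
            autoResumes++;
            state = State.LISTENING;
            return true;
        }
        state = State.IDLE;
        return false;
    }
}
```

With autoResumeListening set to false, TTS completion always returns the coordinator to IDLE, so a new listening session requires another tap on the microphone control.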

Platform/privacy notes

  • Continuous listening is privacy-sensitive and should be opt-in with visible state.
  • iOS background/always-listening behavior is not equivalent to Android and should not be implied.
  • Browser/web speech APIs are inconsistent and should be treated as a separate optional adapter.
  • Avoid sending raw ambient/passive speech to hosted services without explicit user action and consent.
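
The gating rules in these notes reduce to two boolean checks. The following is a minimal sketch under stated assumptions: PassiveListeningPolicy, HostPlatform, and every parameter name are hypothetical, and how permission, opt-in, and consent are actually stored is left open.

```java
/** Hypothetical policy helper; names and parameters are illustrative only. */
final class PassiveListeningPolicy {
    enum HostPlatform { ANDROID, IOS, OTHER }

    private PassiveListeningPolicy() {}

    /** Passive listening needs Android, microphone permission, and explicit user enablement. */
    static boolean mayStartPassive(HostPlatform platform,
                                   boolean micPermissionGranted,
                                   boolean userEnabledPassive) {
        return platform == HostPlatform.ANDROID
                && micPermissionGranted
                && userEnabledPassive;
    }

    /**
     * Speech captured passively is only forwarded to a hosted service after
     * an explicit user action and consent; actively captured speech already
     * results from a user action and passes through.
     */
    static boolean mayForwardToHostedService(boolean capturedPassively,
                                             boolean userConfirmed,
                                             boolean consentGiven) {
        return !capturedPassively || (userConfirmed && consentGiven);
    }
}
```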

Test expectations

  • Add Dart tests for speech capability state and recognized-text routing where practical.
  • Add Android manual QA for microphone permission denial, active recognition, TTS playback, and TTS-to-listening callback behavior.
  • Add iOS manual QA for TTS and speech permission/capability reporting.
