Hey guys, I'm not sure where to put this, but I just want to share my implementation of the speech client v2 and some thoughts about migrating from v1 to v2.
Unfortunately, the official docs only provide Python code, and there is not much info apart from this repo and the example I used as a migration reference: transcribeStreaming.v2.js
My SDK version is:

```json
"@google-cloud/speech": "^6.0.1",
```
In my code I first initialize the service as:

```typescript
const service = createGoogleService({ language, send });
```

and then call `service.transcribeAudio(data)` whenever new audio arrives from the frontend, which uses:

```typescript
const mediaRecorder = new MediaRecorder(audioStream, { mimeType: 'audio/webm;codecs=opus' }); // it's the default param anyway
mediaRecorder.ondataavailable = (event: BlobEvent) => {
  // ... send the event.data to the backend
};
```

so an audio chunk is just a browser Blob object.
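Since the frontend emits Blob objects but the service below writes Node Buffers to the gRPC stream, some conversion happens in between. Here is a minimal, hypothetical helper (not part of the original post — your websocket library may already hand you a Buffer) that works on Node 18+, where `Blob` is a global:

```typescript
// Hypothetical helper: converts a browser-style Blob (as received from the
// frontend) into a Node Buffer suitable for service.transcribeAudio().
// Requires Node 18+, where Blob is available as a global.
const blobToBuffer = async (blob: Blob): Promise<Buffer> => {
  const arrayBuffer = await blob.arrayBuffer();
  return Buffer.from(arrayBuffer);
};
```
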
My service:

```typescript
import { logger } from '../../logger';
import { getText, transformGoogleResponse } from './utils';
import { v2 as speech } from '@google-cloud/speech';
import { StreamingRecognizeResponse } from './google.types';
import { TranscriptionService } from '../transcription.types';
import { MachineEvent } from '../../websocket/websocket.types';
import { Sender } from 'xstate';
import { parseErrorMessage } from '../../../utils';
import { findRecognizerByLanguageCode } from './recognizers';

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });
      const recognizer = findRecognizerByLanguageCode(language).name;
      const configRequest = {
        recognizer,
        streamingConfig: {
          config: {
            autoDecodingConfig: {},
          },
          streamingFeatures: {
            enableVoiceActivityEvents: true,
            interimResults: false,
          },
        },
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.speechEventType === 'SPEECH_ACTIVITY_END') {
            send({ type: 'SPEECH_END', data: 'SPEECH_END' });
          }
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      // the config must be written to the stream exactly once, before any audio
      let configSent = false;
      const transcribeAudio = (audioData: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
        }
        recognizeStream.write({ audio: audioData });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
```
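The `findRecognizerByLanguageCode` helper imported from `./recognizers` isn't shown in the post. Here is a minimal sketch of what it might look like — the project ID, location, and recognizer IDs are placeholder assumptions; the important part is that v2 recognizer names are full resource paths:

```typescript
// Hypothetical sketch of the './recognizers' module. Project, location, and
// recognizer IDs below are placeholders, not values from the original post.
interface RecognizerInfo {
  name: string; // full v2 resource path
  languageCodes: string[];
}

const PROJECT_ID = 'my-project'; // assumption
const LOCATION = 'global';       // assumption

const recognizers: RecognizerInfo[] = [
  {
    name: `projects/${PROJECT_ID}/locations/${LOCATION}/recognizers/en-recognizer`,
    languageCodes: ['en-US'],
  },
  {
    name: `projects/${PROJECT_ID}/locations/${LOCATION}/recognizers/de-recognizer`,
    languageCodes: ['de-DE'],
  },
];

const findRecognizerByLanguageCode = (languageCode: string): RecognizerInfo => {
  const match = recognizers.find(r => r.languageCodes.includes(languageCode));
  if (!match) {
    throw new Error(`No recognizer configured for language: ${languageCode}`);
  }
  return match;
};
```
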
Migration considerations
- To use v2 you need to create a recognizer; I did it with this function:
```typescript
import { v2 } from '@google-cloud/speech';

/**
 * Creates a new recognizer.
 *
 * @param {string} projectId - The ID of the Google Cloud project.
 * @param {string} location - The location for the recognizer.
 * @param {string} recognizerId - The ID for the new recognizer.
 * @param {string} languageCode - The language code for the recognizer.
 * @returns {Promise<object>} The created recognizer.
 * @throws Will throw an error if the recognizer creation fails.
 */
export const createRecognizer = async (
  projectId: string,
  location: string,
  recognizerId: string,
  languageCode: string
) => {
  const client = new v2.SpeechClient({
    keyFilename: 'assistant-demo.json',
  });

  const request = {
    parent: `projects/${projectId}/locations/${location}`,
    recognizer: {
      languageCodes: [languageCode],
      model: 'latest_long',
      // Add any additional configuration here
    },
    recognizerId,
  };

  try {
    console.log('Creating recognizer...', request);
    const [operation] = await client.createRecognizer(request);
    const [recognizer] = await operation.promise();
    return recognizer;
  } catch (error) {
    console.error('Failed to create recognizer:', error);
    throw error;
  }
};
```
- The config object should now be sent as the first message on the stream, immediately before any audio. So if you previously did `recognizingClient.write(audioData)`, you should now do `recognizingClient.write(newConfigWithRecognizer)` (but only once!) and then `recognizingClient.write({ audio: audioData })` <<< notice the object notation
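This write ordering can be sketched with a fake stream, so it runs without the Google SDK — in real code `stream` is the duplex returned by `client._streamingRecognize()`, and the recognizer path below is a placeholder:

```typescript
// Minimal sketch of the "config first, then audio" ordering. The fake stream
// just records each message; the recognizer path is a placeholder.
const written: Array<Record<string, unknown>> = [];
const stream = {
  // stand-in for the gRPC duplex stream's write()
  write(message: Record<string, unknown>): boolean {
    written.push(message);
    return true;
  },
};

const configRequest = {
  recognizer: 'projects/my-project/locations/global/recognizers/my-recognizer', // placeholder
  streamingConfig: { config: { autoDecodingConfig: {} } },
};

let configSent = false;
const transcribeAudio = (audioData: Buffer) => {
  if (!configSent) {
    stream.write(configRequest); // the config must be the very first message
    configSent = true;
  }
  stream.write({ audio: audioData }); // note the object notation, not a bare buffer
};

transcribeAudio(Buffer.from('chunk-1'));
transcribeAudio(Buffer.from('chunk-2'));
// written is now [configRequest, { audio: ... }, { audio: ... }]
```
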
- The config object itself has been changed to:

```typescript
public streamingConfig?: (google.cloud.speech.v2.IStreamingRecognitionConfig|null);

/** Properties of a StreamingRecognitionConfig. */
interface IStreamingRecognitionConfig {
  /** StreamingRecognitionConfig config */
  config?: (google.cloud.speech.v2.IRecognitionConfig|null);

  /** StreamingRecognitionConfig configMask */
  configMask?: (google.protobuf.IFieldMask|null);

  /** StreamingRecognitionConfig streamingFeatures */
  streamingFeatures?: (google.cloud.speech.v2.IStreamingRecognitionFeatures|null);
}
```
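To make the shape change concrete, here is a rough v1 → v2 comparison of the streaming request. The values are illustrative and the recognizer path is a placeholder; the key differences are that in v2 the language and model live in the recognizer resource, decoding can be automatic, and `interimResults` moves under `streamingFeatures`:

```typescript
// v1-style streaming request (for comparison; values are illustrative)
const v1Request = {
  config: {
    encoding: 'WEBM_OPUS',
    sampleRateHertz: 48000,
    languageCode: 'en-US', // language lived in the per-request config
  },
  interimResults: false, // top-level in v1
};

// v2-style first message on the stream
const v2Request = {
  recognizer: 'projects/my-project/locations/global/recognizers/my-recognizer', // placeholder path
  streamingConfig: {
    config: {
      autoDecodingConfig: {}, // let the API detect encoding and sample rate
    },
    streamingFeatures: {
      enableVoiceActivityEvents: true,
      interimResults: false, // moved under streamingFeatures in v2
    },
  },
};
```
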
- When instantiating the streaming client, use `_streamingRecognize()` (this is likely to change, given the underscore prefix)