Hey guys, I'm not sure where to put this, but I just want to share my implementation of the speech client v2 and some thoughts about migrating from v1 to v2.
Unfortunately, the official docs only provide Python code, and there is not much info apart from this repo and the example I used as a migration reference: transcribeStreaming.v2.js
My SDK version is:

```json
"@google-cloud/speech": "^6.0.1",
```
In my code I first initialize the service as:

```typescript
const service = createGoogleService({ language, send });
```

and then call `service.transcribeAudio(data)` whenever new audio arrives from the frontend, which uses:

```typescript
const mediaRecorder = new MediaRecorder(audioStream, { mimeType: 'audio/webm;codecs=opus' }); // it's the default param anyway
mediaRecorder.ondataavailable = (event: BlobEvent) => {
  // ... send the event.data to the backend
};
```

so an audio chunk is just a browser Blob object.
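Since the frontend emits Blob objects but the service below writes Node Buffers to the gRPC stream, some conversion happens in between. Here is a minimal, hypothetical helper (not part of the original post — your websocket library may already hand you a Buffer) that works on Node 18+, where `Blob` is a global:

```typescript
// Hypothetical helper: converts a browser-style Blob (as received from the
// frontend) into a Node Buffer suitable for service.transcribeAudio().
// Requires Node 18+, where Blob is available as a global.
const blobToBuffer = async (blob: Blob): Promise<Buffer> => {
  const arrayBuffer = await blob.arrayBuffer();
  return Buffer.from(arrayBuffer);
};
```
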
My service:

```typescript
import { logger } from '../../logger';
import { getText, transformGoogleResponse } from './utils';
import { v2 as speech } from '@google-cloud/speech';
import { StreamingRecognizeResponse } from './google.types';
import { TranscriptionService } from '../transcription.types';
import { MachineEvent } from '../../websocket/websocket.types';
import { Sender } from 'xstate';
import { parseErrorMessage } from '../../../utils';
import { findRecognizerByLanguageCode } from './recognizers';

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });
      const recognizer = findRecognizerByLanguageCode(language).name;
      const configRequest = {
        recognizer,
        streamingConfig: {
          config: {
            autoDecodingConfig: {},
          },
          streamingFeatures: {
            enableVoiceActivityEvents: true,
            interimResults: false,
          },
        },
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.speechEventType === 'SPEECH_ACTIVITY_END') {
            send({ type: 'SPEECH_END', data: 'SPEECH_END' });
          }
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      // the config must be written to the stream exactly once, before any audio
      let configSent = false;
      const transcribeAudio = (audioData: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
        }
        recognizeStream.write({ audio: audioData });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
```
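The `findRecognizerByLanguageCode` helper imported from `./recognizers` isn't shown in the post. Here is a minimal sketch of what it might look like — the project ID, location, and recognizer IDs are placeholder assumptions; the important part is that v2 recognizer names are full resource paths:

```typescript
// Hypothetical sketch of the './recognizers' module. Project, location, and
// recognizer IDs below are placeholders, not values from the original post.
interface RecognizerInfo {
  name: string; // full v2 resource path
  languageCodes: string[];
}

const PROJECT_ID = 'my-project'; // assumption
const LOCATION = 'global';       // assumption

const recognizers: RecognizerInfo[] = [
  {
    name: `projects/${PROJECT_ID}/locations/${LOCATION}/recognizers/en-recognizer`,
    languageCodes: ['en-US'],
  },
  {
    name: `projects/${PROJECT_ID}/locations/${LOCATION}/recognizers/de-recognizer`,
    languageCodes: ['de-DE'],
  },
];

const findRecognizerByLanguageCode = (languageCode: string): RecognizerInfo => {
  const match = recognizers.find(r => r.languageCodes.includes(languageCode));
  if (!match) {
    throw new Error(`No recognizer configured for language: ${languageCode}`);
  }
  return match;
};
```
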
Migration considerations
- To use v2 you need to create a recognizer; I did it with this function:
```typescript
import { v2 } from '@google-cloud/speech';

/**
 * Creates a new recognizer.
 *
 * @param {string} projectId - The ID of the Google Cloud project.
 * @param {string} location - The location for the recognizer.
 * @param {string} recognizerId - The ID for the new recognizer.
 * @param {string} languageCode - The language code for the recognizer.
 * @returns {Promise<object>} The created recognizer.
 * @throws Will throw an error if the recognizer creation fails.
 */
export const createRecognizer = async (
  projectId: string,
  location: string,
  recognizerId: string,
  languageCode: string
) => {
  const client = new v2.SpeechClient({
    keyFilename: 'assistant-demo.json',
  });

  const request = {
    parent: `projects/${projectId}/locations/${location}`,
    recognizer: {
      languageCodes: [languageCode],
      model: 'latest_long',
      // Add any additional configuration here
    },
    recognizerId,
  };

  try {
    console.log('Creating recognizer...', request);
    const [operation] = await client.createRecognizer(request);
    const [recognizer] = await operation.promise();
    return recognizer;
  } catch (error) {
    console.error('Failed to create recognizer:', error);
    throw error;
  }
};
```
- The config object should now be sent as the first message on the stream, immediately before any audio. So if you previously did `recognizingClient.write(audioData)`, you should now do `recognizingClient.write(newConfigWithRecognizer)` (but only once!) and then `recognizingClient.write({ audio: audioData })` <<< notice the object notation
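This write ordering can be sketched with a fake stream, so it runs without the Google SDK — in real code `stream` is the duplex returned by `client._streamingRecognize()`, and the recognizer path below is a placeholder:

```typescript
// Minimal sketch of the "config first, then audio" ordering. The fake stream
// just records each message; the recognizer path is a placeholder.
const written: Array<Record<string, unknown>> = [];
const stream = {
  // stand-in for the gRPC duplex stream's write()
  write(message: Record<string, unknown>): boolean {
    written.push(message);
    return true;
  },
};

const configRequest = {
  recognizer: 'projects/my-project/locations/global/recognizers/my-recognizer', // placeholder
  streamingConfig: { config: { autoDecodingConfig: {} } },
};

let configSent = false;
const transcribeAudio = (audioData: Buffer) => {
  if (!configSent) {
    stream.write(configRequest); // the config must be the very first message
    configSent = true;
  }
  stream.write({ audio: audioData }); // note the object notation, not a bare buffer
};

transcribeAudio(Buffer.from('chunk-1'));
transcribeAudio(Buffer.from('chunk-2'));
// written is now [configRequest, { audio: ... }, { audio: ... }]
```
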
- The config object itself has been changed to:

```typescript
public streamingConfig?: (google.cloud.speech.v2.IStreamingRecognitionConfig|null);

/** Properties of a StreamingRecognitionConfig. */
interface IStreamingRecognitionConfig {
  /** StreamingRecognitionConfig config */
  config?: (google.cloud.speech.v2.IRecognitionConfig|null);

  /** StreamingRecognitionConfig configMask */
  configMask?: (google.protobuf.IFieldMask|null);

  /** StreamingRecognitionConfig streamingFeatures */
  streamingFeatures?: (google.cloud.speech.v2.IStreamingRecognitionFeatures|null);
}
```
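To make the shape change concrete, here is a rough v1 → v2 comparison of the streaming request. The values are illustrative and the recognizer path is a placeholder; the key differences are that in v2 the language and model live in the recognizer resource, decoding can be automatic, and `interimResults` moves under `streamingFeatures`:

```typescript
// v1-style streaming request (for comparison; values are illustrative)
const v1Request = {
  config: {
    encoding: 'WEBM_OPUS',
    sampleRateHertz: 48000,
    languageCode: 'en-US', // language lived in the per-request config
  },
  interimResults: false, // top-level in v1
};

// v2-style first message on the stream
const v2Request = {
  recognizer: 'projects/my-project/locations/global/recognizers/my-recognizer', // placeholder path
  streamingConfig: {
    config: {
      autoDecodingConfig: {}, // let the API detect encoding and sample rate
    },
    streamingFeatures: {
      enableVoiceActivityEvents: true,
      interimResults: false, // moved under streamingFeatures in v2
    },
  },
};
```
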
- When instantiating the streaming client, use `_streamingRecognize()` (this is likely to change, given the underscore prefix)