Since August 13, 2024, the Google Speech-to-Text v2 API seems to be taking much longer to transcribe. Before that it was working fine.
Has anyone experienced this? I am open to discussion, and any help would be much appreciated.
Technologies used: Node.js 20, Speech-to-Text v2 API
Thank you for reaching out. Here is the detailed situation:
Audio Length: 46 seconds
Audio Type: .mp3
Language: Japanese
Cloud Run Function Region: asia-northeast1
Recognizer Region: global
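One mismatch I noticed while writing this up: the Cloud Run function runs in asia-northeast1 but the recognizer lives in global. I do not know whether that explains the slowdown, but a co-located recognizer is one thing I can test. A minimal sketch of how the resource names would differ (the project and recognizer IDs below are placeholders):

```javascript
// Hypothetical helper (not in my code) to build recognizer resource names,
// just to show the location difference I want to test.
function recognizerName(projectId, location, recognizerId) {
  return `projects/${projectId}/locations/${location}/recognizers/${recognizerId}`;
}

// What I use today (recognizer in the global location):
console.log(recognizerName('my-project', 'global', 'jarecognizer'));
// → projects/my-project/locations/global/recognizers/jarecognizer

// A recognizer co-located with the Cloud Run function would be:
console.log(recognizerName('my-project', 'asia-northeast1', 'jarecognizer'));
// → projects/my-project/locations/asia-northeast1/recognizers/jarecognizer
```

If I try this, I believe the client also has to point at the matching regional endpoint (something like asia-northeast1-speech.googleapis.com via the client's apiEndpoint option), but I would verify that against the Speech-to-Text v2 docs.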
Node.js code:

```javascript
const { v2 } = require('@google-cloud/speech');
const speechClient = new v2.SpeechClient();
const { PubSub } = require('@google-cloud/pubsub');
const pubSubClient = new PubSub();

// ...inside the async function that handles the uploaded file:
  const recognizerId = await getRecognizer(languageCode, googleLanguageCode);
  const config = {
    // Auto-detect the audio encoding (v2 field name: autoDecodingConfig,
    // as in getRecognizer below)
    autoDecodingConfig: {},
    languageCodes: [googleLanguageCode],
    model: 'long',
    features: {
      enableWordTimeOffsets: true,
      enableAutomaticPunctuation: true,
    },
  };
  const fileMetadata = {
    uri: gs_uri,
  };
  const request = {
    recognizer: recognizerId,
    config,
    files: [fileMetadata],
    recognitionOutputConfig: {
      inlineResponseConfig: {},
    },
  };
  try {
    const [operation] = await speechClient.batchRecognize(request);
    const data = JSON.stringify({ operationName: operation.name, fileName: file.name, uri: gs_uri });
    const dataBuffer = Buffer.from(data);
    await pubSubClient.topic(pubsubTopicId).publishMessage({ data: dataBuffer });
    console.log(`Message published to the topic id: ${pubsubTopicId}`);
  } catch (error) {
    console.error('Error starting transcription:', error);
  }
} // end of the enclosing function (not shown)

async function getRecognizer(languageCode, googleLanguageCode) {
  const recognizerId = `${languageCode}recognizer`;
  const parent = speechClient.locationPath(projectId, 'global');
  const [recognizers] = await speechClient.listRecognizers({ parent });
  const recognizer = recognizers.find(r => r.name.endsWith(recognizerId));
  if (recognizer) {
    return recognizer.name;
  } else {
    const [operation] = await speechClient.createRecognizer({
      recognizer: {
        languageCodes: [googleLanguageCode],
        model: 'long',
        defaultRecognitionConfig: {
          features: {
            enableWordTimeOffsets: true,
            enableAutomaticPunctuation: true,
          },
          autoDecodingConfig: {},
        },
      },
      recognizerId: recognizerId,
      parent: parent,
    });
    const [response] = await operation.promise();
    return response.name;
  }
}
```
The code above submits the audio for transcription; another function then checks the transcription progress. That code is below:
```javascript
// ...inside the polling loop (operationName, done, interval, etc. defined above):
const [operation] = await speechClient.operationsClient.getOperation({ name: operationName });
if (operation.done) {
  done = true;
  if (operation.response) {
    console.log(`Operation ${operationName} is done. Processing results.`);
    const BatchRecognizeResponse = protoRoot.lookupType('google.cloud.speech.v2.BatchRecognizeResponse');
    const response = BatchRecognizeResponse.decode(operation.response.value);
    console.log('here is the response:', response.results[gcsUri]);
    const inlineResult = response.results[gcsUri].inlineResult;
    const transcriptResults = inlineResult.transcript.results;
    const srt = gstt.convertGSTTToSRT(transcriptResults);
    await streamFileUpload(fileName, srt).catch(console.error);
  } else if (operation.error) {
    await streamFileUpload(fileName, '').catch(console.error);
    console.error('Operation finished with an error:', operation.error);
  }
} else {
  const OperationMetadata = protoRoot.lookupType('google.cloud.speech.v2.OperationMetadata');
  const metadata = OperationMetadata.decode(operation.metadata.value);
  console.log('here is the metadata:', metadata);
  const progressPercent = metadata.progressPercent || 0;
  console.log(`Transcription progress: ${progressPercent}%`);
  // Wait, then poll again with exponential backoff capped at 4 seconds.
  await new Promise(resolve => setTimeout(resolve, interval));
  interval = Math.min(interval * 2, 4000);
}
```
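For what it's worth, the polling interval above doubles on every attempt and is capped at 4000 ms, so the polling loop itself cannot add an hour of delay. A pure sketch of that schedule (initialMs stands in for whatever `interval` starts at, which is not shown here):

```javascript
// Pure sketch of the backoff schedule used in the polling loop above:
// double the interval each attempt, capped at capMs.
function backoffSchedule(initialMs, attempts, capMs = 4000) {
  const schedule = [];
  let interval = initialMs;
  for (let i = 0; i < attempts; i++) {
    schedule.push(interval);
    interval = Math.min(interval * 2, capMs); // same doubling-with-cap as above
  }
  return schedule;
}

console.log(backoffSchedule(500, 6)); // [ 500, 1000, 2000, 4000, 4000, 4000 ]
```

At the 4-second cap, an hour of waiting corresponds to roughly 900 polls, so the time is clearly being spent on the operation itself, not in this loop.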
Before August 13, the same audio took around 11 seconds to transcribe; now it takes more than an hour to complete.
The quota looks fine, and network connectivity is essentially unchanged. I have tested with several audio files, all under 10 seconds long except the 46-second one, and all the reports look fine. The transcription does complete, but it takes more than an hour.
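To narrow down where the time actually goes, I am adding a small timing wrapper around each API call (a hypothetical debugging helper, not part of the code above):

```javascript
// Hypothetical helper: log how long an async step takes, so I can tell
// whether getRecognizer, batchRecognize, or the long-running operation
// itself is the slow part.
async function timed(label, fn) {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`${label} took ${Date.now() - start} ms`);
  }
}

// Example usage (the wrapped call is from the code above):
// const [operation] = await timed('batchRecognize', () => speechClient.batchRecognize(request));
```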
I hope the above information helps, and I am hoping for a quick response. Thank you.