Google’s Text-to-Speech (TTS) API is a powerful tool that can generate realistic, natural-sounding speech from text inputs. It’s highly flexible, offering configuration options like language selection, audio encoding formats, and even the gender of the speaking voice.
In this tutorial, I’ll walk you through how to create and configure Google Cloud Text-to-Speech API calls in multiple languages, from Python to Node.js, to get you up and running with your audio files in no time.
The Google Cloud SDK includes the gcloud command-line tool, which makes interacting with Google services easier.
To authenticate with the TTS API using gcloud, run:
gcloud auth application-default login
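This command opens a browser window where you sign in with your Google account, then stores Application Default Credentials locally so the client libraries can pick them up automatically.
Alternatively, to authenticate with a service account key file, point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"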
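Before running the examples, it can help to confirm that GOOGLE_APPLICATION_CREDENTIALS actually points at a readable JSON file. The following is a minimal sketch using only the Python standard library; `check_credentials_file` is a hypothetical helper name, and treating the presence of a `"type"` key as a sign of a credentials file is a heuristic assumption, not an official validation.

```python
import json
import os
from pathlib import Path


def check_credentials_file(env=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setting."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    file = Path(path)
    if not file.is_file():
        return False, f"No file found at {path}"
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return False, f"Could not read {path} as JSON: {exc}"
    # Heuristic (assumption): credential files carry a "type" field,
    # for example "service_account".
    if not isinstance(data, dict) or "type" not in data:
        return False, f"{path} does not look like a credentials file"
    return True, f"Found credentials of type {data['type']!r}"


if __name__ == "__main__":
    ok, message = check_credentials_file()
    print(message)
```

If the check fails, fix the path or re-run the gcloud login command before calling the API.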
For Python, install the Google client library:
pip install google-cloud-texttospeech
In Node.js:
npm install @google-cloud/text-to-speech
Simple Python Script to Convert Text to Speech
from google.cloud import texttospeech
# Create the client; it picks up your credentials from the environment
client = texttospeech.TextToSpeechClient()
# Text input
synthesis_input = texttospeech.SynthesisInput(text="Hello, world!")
# Set the voice parameters (language and gender)
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
)
# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)
# Synthesize speech
response = client.synthesize_speech(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config,
)
# Save the audio file (binary mode, since audio_content is bytes)
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print("Audio content written to file 'output.mp3'")
This script will generate a TTS audio file named output.mp3 in the current directory.
The voice is set to "en-US" and NEUTRAL. You can also configure speaking rate and pitch, and pass SSML markup instead of plain text for finer control over speech synthesis.
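To make those options concrete, here is a minimal sketch that builds an SSML string and a request body with a custom speaking rate and pitch, using only the standard library. The helper names `build_ssml` and `build_request` are hypothetical; the field names (`ssml`, `speakingRate`, `pitch`) follow the JSON shape of the text:synthesize REST method, where a rate of 1.0 is normal speed and pitch is expressed in semitones.

```python
import json
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500):
    """Wrap sentences in <speak>, separated by a <break> of pause_ms milliseconds."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so plain text cannot break the SSML markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"


def build_request(ssml, speaking_rate=1.0, pitch=0.0):
    """Build a JSON-serializable request body for the text:synthesize method."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": "en-US", "ssmlGender": "NEUTRAL"},
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    }


ssml = build_ssml(["Hello, world!", "Tom & Jerry say hi."], pause_ms=300)
request = build_request(ssml, speaking_rate=0.9, pitch=-2.0)
print(json.dumps(request, indent=2))
```

With the Python client library, the same settings map to `SynthesisInput(ssml=...)` and `AudioConfig(speaking_rate=..., pitch=...)`.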
For those more comfortable in JavaScript, here’s how to use Google’s TTS API with Node.js.
const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');
async function convertTextToSpeech() {
  const client = new textToSpeech.TextToSpeechClient();
  const request = {
    input: { text: 'Hello, world!' },
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
    audioConfig: { audioEncoding: 'MP3' },
  };
  const [response] = await client.synthesizeSpeech(request);
  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file "output.mp3"');
}
convertTextToSpeech();
To change the language, specify a different languageCode, like "fr-FR" for French. You can also request a specific voice by name, such as one of Google’s WaveNet voices, which use machine learning to create more lifelike speech.
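As a rough illustration of voice selection, the sketch below builds the `voice` part of a request from a voice name. It assumes voice names begin with the language code (for example `fr-FR-Wavenet-A`); both that naming pattern and the specific voice name are assumptions here, so check the list of supported voices before relying on them. `voice_selection` is a hypothetical helper.

```python
def voice_selection(voice_name):
    """Derive languageCode from a voice name such as 'fr-FR-Wavenet-A'."""
    parts = voice_name.split("-")
    if len(parts) < 3:
        raise ValueError(f"Unexpected voice name format: {voice_name!r}")
    # Assumption: the first two hyphen-separated parts form the language code.
    language_code = "-".join(parts[:2])
    return {"languageCode": language_code, "name": voice_name}


request = {
    "input": {"text": "Bonjour le monde !"},
    "voice": voice_selection("fr-FR-Wavenet-A"),  # illustrative voice name
    "audioConfig": {"audioEncoding": "MP3"},
}
print(request["voice"])
```

In the Python client library, the equivalent is `VoiceSelectionParams(language_code="fr-FR", name="fr-FR-Wavenet-A")`.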
The voice gender can be MALE, FEMALE, or NEUTRAL.
Common audio encodings include:
MP3: Compresses audio for smaller file sizes.
LINEAR16: Uncompressed audio, ideal for applications needing high quality.
OGG_OPUS: Highly efficient for speech, ideal for web-based audio.
The Google Cloud TTS API operates on a pay-per-character model, so it’s affordable for small projects. For enterprise or high-volume applications, Google offers pricing tiers. Consult the [pricing documentation](https://cloud.google.com/text-to-speech/pricing) on Google Cloud for up-to-date details.
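Because billing is per character, a quick estimate before a large batch job can be useful. The sketch below is only an approximation: the rate and free allowance are placeholder parameters you must fill in from the pricing page, `estimate_cost_usd` is a hypothetical helper, and counting characters with `len()` is an assumption that may not match how the service counts billable characters.

```python
def estimate_cost_usd(texts, usd_per_million_chars, free_chars=0):
    """Return (total_chars, estimated_cost) for a batch of input strings."""
    total_chars = sum(len(text) for text in texts)
    # Characters covered by a free allowance (if any) are not billed.
    billable = max(0, total_chars - free_chars)
    return total_chars, billable * usd_per_million_chars / 1_000_000


batch = ["Hello, world!", "Welcome to the tutorial."]
# 10.0 is a placeholder rate, not a real price.
total, cost = estimate_cost_usd(batch, usd_per_million_chars=10.0)
print(f"{total} characters, estimated cost ${cost:.6f}")
```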
You can find more advanced setups and examples on Google’s GitHub, or create your own repository for versioning and sharing your TTS scripts.
For more details, refer to the official Google Cloud Text-to-Speech API documentation.
For simple tests, you can call the API directly from the command line, using gcloud to obtain an access token. It’s a great way to quickly verify configurations before coding.
Authenticate your session if needed:
gcloud auth application-default login
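Make a simple API call using the curl command. The REST endpoint returns JSON whose audioContent field holds the base64-encoded audio, so save the response and then decode it to get an MP3 file:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "input": { "text": "Hello, world!" },
    "voice": { "languageCode": "en-US", "ssmlGender": "NEUTRAL" },
    "audioConfig": { "audioEncoding": "MP3" }
  }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize" \
  -o "response.json"
jq -r '.audioContent' response.json | base64 --decode > output.mp3
The first command saves the JSON response as response.json. The second, which assumes jq is installed, decodes the audio into output.mp3 so you can play it back. If you authenticated with user credentials, you may also need to pass your project ID in an x-goog-user-project header.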
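Keep in mind that the text:synthesize REST method returns a JSON document whose audioContent field holds base64-encoded audio, so the response body needs decoding before playback. Here is a minimal Python sketch of that step using only the standard library; `extract_audio` and `save_audio` are hypothetical helper names, and the file paths you pass to `save_audio` are up to you.

```python
import base64
import json


def extract_audio(response_json):
    """Decode the base64 audioContent field of a text:synthesize JSON response."""
    payload = json.loads(response_json)
    if "error" in payload:
        raise RuntimeError(f"API returned an error: {payload['error']}")
    return base64.b64decode(payload["audioContent"])


def save_audio(response_path, audio_path):
    """Read a saved JSON response and write the decoded audio bytes to audio_path."""
    with open(response_path, "r", encoding="utf-8") as src:
        audio = extract_audio(src.read())
    with open(audio_path, "wb") as out:
        out.write(audio)
    return len(audio)


# Demonstration with a stand-in response (not real audio).
sample = json.dumps(
    {"audioContent": base64.b64encode(b"fake-audio").decode("ascii")}
)
print(len(extract_audio(sample)), "bytes decoded")
```

Checking for an `error` key first means a failed request, such as one with an expired token, surfaces as a clear exception instead of an unplayable file.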
To cater to Java developers, here’s a Java snippet that demonstrates using the TTS API:
import com.google.cloud.texttospeech.v1.*;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.OutputStream;
public class TextToSpeechExample {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the client when synthesis is done
        try (TextToSpeechClient client = TextToSpeechClient.create()) {
            // Text input
            SynthesisInput input = SynthesisInput.newBuilder()
                .setText("Hello, world!")
                .build();
            // Voice parameters (language and gender)
            VoiceSelectionParams voice = VoiceSelectionParams.newBuilder()
                .setLanguageCode("en-US")
                .setSsmlGender(SsmlVoiceGender.NEUTRAL)
                .build();
            // Audio output format
            AudioConfig audioConfig = AudioConfig.newBuilder()
                .setAudioEncoding(AudioEncoding.MP3)
                .build();
            SynthesizeSpeechResponse response = client.synthesizeSpeech(input, voice, audioConfig);
            ByteString audioContent = response.getAudioContent();
            // Write the audio bytes to disk
            try (OutputStream out = new FileOutputStream("output.mp3")) {
                out.write(audioContent.toByteArray());
                System.out.println("Audio content written to file 'output.mp3'");
            }
        }
    }
}
This code saves an MP3 file named output.mp3. Make sure to add the Google Cloud Text-to-Speech client library for Java (the com.google.cloud:google-cloud-texttospeech artifact) to your project’s dependencies.
While Google’s TTS is a robust solution, developers working with Microsoft platforms may consider exploring how Google’s TTS compares to Microsoft Azure’s Speech Service. Each platform has unique benefits for specific use cases, so it can be valuable to evaluate both based on project needs.