Google Text to Speech API: A Step-by-Step Tutorial with Examples

Get started in minutes. We walk through various samples and take you from start to finish in a simple project.


October 6, 2024 6 min read


Google’s Text-to-Speech (TTS) API is a powerful tool that can generate realistic, natural-sounding speech from text inputs. It’s highly flexible, offering configuration options like language selection, audio encoding formats, and even the gender of the speaking voice.

In this tutorial, I’ll walk you through how to create and configure Google Cloud Text-to-Speech API calls in multiple languages, from Python to Node.js, to get you up and running with your audio files in no time.

Getting Started with Google Cloud Platform (GCP)

  1. Sign up and Set Up: Head over to the Google Cloud Platform (GCP) and sign up for a free account if you haven’t already. GCP provides you access to the Text-to-Speech API, among other machine learning and AI services.
  2. Quickstart Guide: For a streamlined setup, follow Google’s quickstart guide available in the Text-to-Speech SDK documentation. You’ll be up and running in just a few steps.


Using the SDK and Authentication with gcloud

The Google Cloud SDK includes the gcloud command-line tool, which makes interacting with Google services easier.

To authenticate with the TTS API using gcloud, run:

gcloud auth application-default login

This command opens a browser window to sign in with your Google account and stores Application Default Credentials locally, where the client libraries find them automatically.
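To confirm the login left credentials behind, you can look for the file gcloud writes. The sketch below assumes the standard gcloud config locations (including the CLOUDSDK_CONFIG override); the helper name adc_path is made up for this example.

```python
import os
import sys
from pathlib import Path


def adc_path() -> Path:
    """Return where gcloud is expected to store Application Default Credentials."""
    override = os.environ.get("CLOUDSDK_CONFIG")
    if override:
        # An explicit config directory takes precedence over the defaults.
        base = Path(override)
    elif sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home()))) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"
    return base / "application_default_credentials.json"


if __name__ == "__main__":
    path = adc_path()
    print(f"{path} exists: {path.exists()}")
```

If the file is missing, rerun the login command before trying any of the examples.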

Setting Up Google Text-to-Speech API

Step 1: Setting Up Google Cloud and Authentication

  1. Create a Google Cloud Project: Head to the Google Cloud Console and create a new project.
  2. Enable the TTS API: Go to the “APIs & Services” dashboard, search for “Text-to-Speech API,” and enable it.
  3. Service Account: To access the API, you’ll need a service account. Go to “IAM & Admin” > “Service Accounts,” create a new service account, and download the JSON key file. This file authenticates your API requests.

Point the client libraries at the key file by setting this environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
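A wrong path or a truncated download is a common cause of authentication errors, so it can help to sanity-check the key file before calling the API. This is a minimal sketch using only the standard library; the function names are invented for this example, and the field list covers only the fields a service account key normally carries.

```python
import json
import os
from pathlib import Path

# Fields a service account key file is expected to contain.
REQUIRED_FIELDS = {"type", "project_id", "client_email", "private_key"}


def check_key_data(data: dict) -> list:
    """Return a list of structural problems found in parsed key data."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - data.keys())]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


def check_key_file(path: str) -> list:
    """Load a key file and return any problems; an empty list means it looks fine."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{path} does not exist or is not a file"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    return check_key_data(data)


if __name__ == "__main__":
    configured = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if configured is None:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        print(check_key_file(configured) or "key file looks structurally fine")
```

This only checks the file's shape. It does not prove the key is still valid or that the account has permission to use the API.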

Step 2: Install Dependencies

For Python, install the Google client library:

pip install google-cloud-texttospeech

In Node.js:

npm install @google-cloud/text-to-speech
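If you want to confirm the Python package landed in the active environment, a quick check along these lines may help. It uses only the standard library; installed_version is a helper name invented for this sketch.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("google-cloud-texttospeech")
    print(version or "google-cloud-texttospeech is not installed in this environment")
```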

Example Code in Python

Simple Python Script to Convert Text to Speech

from google.cloud import texttospeech

# The client reads credentials from GOOGLE_APPLICATION_CREDENTIALS
# (or from the credentials stored by gcloud)
client = texttospeech.TextToSpeechClient()

# Text input
synthesis_input = texttospeech.SynthesisInput(text="Hello, world!")

# Set the voice parameters (language and gender)
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
)

# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
)

# Synthesize speech
response = client.synthesize_speech(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config,
)

# Save the audio file (binary mode, since audio_content is bytes)
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)

print("Audio content written to file 'output.mp3'")

This script will generate a TTS audio file named output.mp3 in the current directory.

Key Components

  1. SynthesisInput: Holds the text or Speech Synthesis Markup Language (SSML) content.
  2. VoiceSelectionParams: Configures language and gender; here, we use "en-US" and NEUTRAL.
  3. AudioConfig: Specifies the audio encoding (in this case, MP3). Other formats include LINEAR16 and OGG_OPUS.
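The three components above map one-to-one onto the three top-level fields of the JSON request body that the REST endpoint accepts. As a rough illustration, the sketch below builds that body as a plain dictionary; build_request is a name invented for this example, and no API call is made.

```python
import json


def build_request(text: str, language_code: str = "en-US",
                  gender: str = "NEUTRAL", encoding: str = "MP3") -> dict:
    """Assemble a synthesize request body from its three components."""
    return {
        "input": {"text": text},                      # SynthesisInput
        "voice": {                                    # VoiceSelectionParams
            "languageCode": language_code,
            "ssmlGender": gender,
        },
        "audioConfig": {"audioEncoding": encoding},   # AudioConfig
    }


if __name__ == "__main__":
    print(json.dumps(build_request("Hello, world!"), indent=2))
```

Seeing the request as plain data makes it easier to compare what the Python and Node.js clients send.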

Additional Options

You can configure speaking rate, pitch, and even add custom text in SSML for enhanced control over speech synthesis.
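As an illustration of those options, the sketch below builds a small SSML document with pauses between sentences and passes a speaking rate and pitch through AudioConfig. The helper names and the 300 ms default pause are choices made for this example, not part of the API, and synthesize_ssml needs valid credentials to run.

```python
from html import escape


def build_ssml(sentences, pause_ms: int = 300) -> str:
    """Wrap sentences in <speak>, with a fixed pause between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape each sentence so characters like & and < do not break the markup.
    body = pause.join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
    return f"<speak>{body}</speak>"


def synthesize_ssml(ssml: str, out_path: str,
                    speaking_rate: float = 1.0, pitch: float = 0.0) -> None:
    """Send SSML to the API and write the MP3 result to out_path."""
    # Imported here so build_ssml stays usable without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=speaking_rate,
            pitch=pitch,
        ),
    )
    with open(out_path, "wb") as out:
        out.write(response.audio_content)


if __name__ == "__main__":
    print(build_ssml(["Hello, world!", "Tom & Jerry are here."]))
```

Calling synthesize_ssml(build_ssml([...]), "output.mp3", speaking_rate=0.9) would then produce slightly slower speech with the pauses in place.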

JavaScript/Node.js Example

For those more comfortable in JavaScript, here’s how to use Google’s TTS API with Node.js.

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

async function convertTextToSpeech() {
  const client = new textToSpeech.TextToSpeechClient();

  const request = {
    input: { text: 'Hello, world!' },
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
    audioConfig: { audioEncoding: 'MP3' },
  };

  const [response] = await client.synthesizeSpeech(request);

  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');

  console.log('Audio content written to file "output.mp3"');
}

convertTextToSpeech();

Configuring TTS for Your Needs

Language and Voice Customization

To change the language, specify a different languageCode, such as "fr-FR" for French. You can also request a specific voice by name, including Google’s WaveNet voices, which use a neural model to produce more lifelike speech.
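Voice names differ by language and change over time, so rather than hard-coding one you can ask the API which voices it offers. The sketch below lists the voices for a language and picks the first whose name contains a given family label. The helper names are invented for this example, and matching on the substring "Wavenet" is an assumption about how those voices are named.

```python
from typing import Iterable, Optional


def pick_voice_name(names: Iterable[str], family: str = "Wavenet") -> Optional[str]:
    """Return the first voice name containing `family` (case-insensitive), else None."""
    for name in names:
        if family.lower() in name.lower():
            return name
    return None


def voice_params_for(language_code: str, family: str = "Wavenet"):
    """Build VoiceSelectionParams, preferring a voice from the given family."""
    # Imported here so pick_voice_name stays usable without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voices = client.list_voices(language_code=language_code).voices
    kwargs = {"language_code": language_code}
    name = pick_voice_name(voice.name for voice in voices)
    if name:
        # Only set the name when a match exists; otherwise the API chooses a voice.
        kwargs["name"] = name
    return texttospeech.VoiceSelectionParams(**kwargs)
```

The result of voice_params_for("fr-FR") can be passed as the voice argument to synthesize_speech.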

Voice Parameters

  1. ssmlGender: Choose from MALE, FEMALE, or NEUTRAL.
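  2. speakingRate: Adjusts the speed of the speech output (set in audioConfig); 1.0 is the voice’s normal speed, 0.5 is half speed, and 2.0 is double.
  3. pitch: Shifts the pitch up or down in semitones (set in audioConfig); 0.0 leaves the voice unchanged.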
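Out-of-range values for these parameters are rejected by the API, so user-supplied settings are worth clamping first. The limits in this sketch are assumptions and should be checked against the current AudioConfig reference; clamped_audio_config is a name invented for this example.

```python
# Assumed limits; verify against the current AudioConfig reference before relying on them.
SPEAKING_RATE_RANGE = (0.25, 2.0)
PITCH_RANGE = (-20.0, 20.0)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamped_audio_config(speaking_rate: float = 1.0, pitch: float = 0.0) -> dict:
    """Return audioConfig fields with both values forced into the assumed ranges."""
    return {
        "speakingRate": clamp(speaking_rate, *SPEAKING_RATE_RANGE),
        "pitch": clamp(pitch, *PITCH_RANGE),
    }


if __name__ == "__main__":
    print(clamped_audio_config(speaking_rate=10.0, pitch=-50.0))
```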

Audio Encoding Options

  1. MP3: Compresses audio for smaller file sizes.
  2. LINEAR16: Uncompressed audio, ideal for applications needing high quality.
  3. OGG_OPUS: Highly efficient for speech, ideal for web-based audio.
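Whichever encoding you pick, the output file needs a matching extension or players may refuse it. A small lookup like the one below keeps the two in sync. The mapping reflects that LINEAR16 responses come back with a WAV header; the function name is invented for this sketch.

```python
# File extension that matches each encoding discussed above.
EXTENSIONS = {
    "MP3": ".mp3",
    "LINEAR16": ".wav",   # LINEAR16 audio is returned with a WAV header
    "OGG_OPUS": ".ogg",
}


def output_filename(stem: str, encoding: str) -> str:
    """Return stem plus the extension that matches the requested encoding."""
    try:
        return stem + EXTENSIONS[encoding]
    except KeyError:
        raise ValueError(f"unsupported encoding: {encoding!r}") from None


if __name__ == "__main__":
    for encoding in EXTENSIONS:
        print(encoding, "->", output_filename("output", encoding))
```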

Pricing

The Google Cloud TTS API operates on a pay-per-character model, which keeps costs low for small projects. Rates depend on the voice type you choose, and a monthly free allowance applies before billing starts. Consult the [pricing documentation](https://cloud.google.com/text-to-speech/pricing) on Google Cloud for up-to-date details.
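Because billing is per character, a rough estimate is simple arithmetic. The sketch below takes the rate and free allowance as arguments rather than hard-coding them; the numbers in the usage example are placeholders, not Google's actual prices, and the billed character count may differ slightly from a simple length count.

```python
def estimate_cost(char_count: int, price_per_million: float, free_chars: int = 0) -> float:
    """Estimate the charge for char_count characters after a free allowance."""
    billable = max(0, char_count - free_chars)
    return billable / 1_000_000 * price_per_million


if __name__ == "__main__":
    text = "Hello, world!" * 1000
    # Placeholder figures for illustration only; look up the real ones.
    cost = estimate_cost(len(text), price_per_million=4.0, free_chars=0)
    print(f"{len(text)} characters would cost about {cost:.4f} at the placeholder rate")
```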

Additional Use Cases and Advanced Configurations

  1. Real-time audio streaming: Use the API for applications like virtual assistants or live narrations.
  2. Chrome Extensions: Integrate TTS into browser extensions for accessibility tools.
  3. IoT and web audio: Convert text to audio formats compatible with IoT devices, or generate background audio for websites.

Testing and Troubleshooting

  1. Check Permissions: Ensure your service account has the right permissions for TTS.
  2. Debugging Errors: Review the Google Cloud Console logs for detailed error messages.
  3. Config Tweaks: Experiment with different audioEncoding and speakingRate settings for the best results.
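When a call fails, the HTTP status attached to the error usually points at which of these checks to start with. The sketch below maps a few common statuses to a first thing to try. The hint texts and helper names are this example's own, and the wrapper assumes the google-api-core package that the client library depends on.

```python
# First thing to check for a few common HTTP statuses.
HINTS = {
    400: "Invalid request: check the voice name, language code, and audio encoding.",
    401: "Not authenticated: check GOOGLE_APPLICATION_CREDENTIALS or rerun the gcloud login.",
    403: "Permission denied: confirm the API is enabled and the service account may use it.",
    429: "Quota exceeded: slow down requests or review the project's quotas.",
}


def hint_for_status(status: int) -> str:
    """Return a troubleshooting hint for an HTTP status code."""
    return HINTS.get(int(status), "Unexpected error: review the Google Cloud Console logs.")


def call_with_hints(request_fn):
    """Run request_fn, print a hint if the API call fails, then re-raise."""
    # Imported here so hint_for_status stays usable without the library.
    from google.api_core import exceptions

    try:
        return request_fn()
    except exceptions.GoogleAPICallError as exc:
        # exc.code carries the HTTP status for the error, when one is known.
        print(hint_for_status(exc.code or 0))
        raise
```

For example, call_with_hints(lambda: client.synthesize_speech(input=..., voice=..., audio_config=...)) keeps the original exception while printing a starting point.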

You can find more advanced setups and examples on Google’s GitHub, or create your own repository for versioning and sharing your TTS scripts.

For more details, refer to the official Google Cloud Text-to-Speech API documentation.


A Bit More Help, Should You Need It 🙂

Command Line Options for Quick Testing

For simple tests, you can call the REST API directly from the command line with curl, using gcloud to supply an access token. It’s a great way to quickly verify configurations before coding.

Authenticate your session if needed:

gcloud auth application-default login

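Make a simple API call with curl and save the JSON response:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "input": { "text": "Hello, world!" },
    "voice": { "languageCode": "en-US", "ssmlGender": "NEUTRAL" },
    "audioConfig": { "audioEncoding": "MP3" }
  }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize" \
  -o "response.json"

The REST endpoint returns JSON rather than raw audio: the audioContent field holds the base64-encoded MP3. This command therefore saves the response as response.json; decode the audioContent field to get a playable output.mp3. If you authenticated with a user account rather than a service account, the request may also need an x-goog-user-project header naming your project.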
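Note that the REST endpoint responds with a JSON body whose audioContent field is base64-encoded, so whatever file curl writes has to be decoded before it will play. A minimal sketch of that step, using only the standard library; the function names and file paths are placeholders for this example.

```python
import base64
import json
from pathlib import Path


def decode_audio(response_json: str) -> bytes:
    """Extract and decode the audioContent field of a synthesize response."""
    payload = json.loads(response_json)
    if "audioContent" not in payload:
        # Error responses carry an "error" object instead of audio.
        raise ValueError(f"no audioContent in response: {payload.get('error', payload)}")
    return base64.b64decode(payload["audioContent"])


def save_audio(response_path: str, audio_path: str) -> int:
    """Decode a saved JSON response into an audio file; return bytes written."""
    audio = decode_audio(Path(response_path).read_text(encoding="utf-8"))
    Path(audio_path).write_bytes(audio)
    return len(audio)
```

Calling save_audio on the file curl saved, with "output.mp3" as the second argument, then produces the playable audio.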

Example Code in Java

To cater to Java developers, here’s a Java snippet that demonstrates using the TTS API:

import com.google.cloud.texttospeech.v1.*;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class TextToSpeechExample {

  public static void main(String[] args) throws Exception {
    // try-with-resources closes the client and releases its connections
    try (TextToSpeechClient client = TextToSpeechClient.create()) {
      // Text input
      SynthesisInput input = SynthesisInput.newBuilder()
          .setText("Hello, world!")
          .build();

      // Voice parameters (language and gender)
      VoiceSelectionParams voice = VoiceSelectionParams.newBuilder()
          .setLanguageCode("en-US")
          .setSsmlGender(SsmlVoiceGender.NEUTRAL)
          .build();

      // Audio output format
      AudioConfig audioConfig = AudioConfig.newBuilder()
          .setAudioEncoding(AudioEncoding.MP3)
          .build();

      SynthesizeSpeechResponse response = client.synthesizeSpeech(input, voice, audioConfig);
      ByteString audioContent = response.getAudioContent();

      // Save the audio file
      try (OutputStream out = new FileOutputStream("output.mp3")) {
        out.write(audioContent.toByteArray());
        System.out.println("Audio content written to file 'output.mp3'");
      }
    }
  }
}

This code saves an MP3 file named output.mp3. Make sure to add the Text-to-Speech client library (the com.google.cloud:google-cloud-texttospeech artifact) to your Java project’s dependencies.

Integrations with Microsoft Platforms

While Google’s TTS is a robust solution, developers working with Microsoft platforms may consider exploring how Google’s TTS compares to Microsoft Azure’s Speech Service. Each platform has unique benefits for specific use cases, so it can be valuable to evaluate both based on project needs.
