Deepgram Text to Speech in Python: A Comprehensive Guide

Learn how to use the Deepgram text to speech Python SDK with examples for real-time audio streaming, speech-to-text, TTS, and voice AI in your next project.


September 14, 2024

If you’re an engineer looking to integrate Deepgram’s text-to-speech (TTS) and speech-to-text (STT) capabilities in Python, you’re in the right place. Deepgram provides APIs and SDKs for speech processing on both real-time streams and pre-recorded audio.

This article walks you through installing the Deepgram Python SDK, generating speech over both REST and WebSocket streaming, transcribing audio, and exploring related features such as speech intelligence, third-party integrations, latency optimization, and voice customization.

1. Installing the Deepgram Python SDK

Before you can start working with Deepgram’s APIs, you’ll need to install the Python SDK. It’s super straightforward.

# Install the Deepgram Python SDK via pip (quote the version spec so shells such as zsh don't expand the *)
pip install "deepgram-sdk==3.*"

The SDK supports both REST requests and WebSocket streaming. It also ships asyncio-based clients alongside the synchronous ones, so you can run concurrent tasks. Now, let’s dive into TTS!

2. Text-to-Speech (TTS) Using REST and Streaming

Deepgram allows you to transform text into human-like speech. You can do this via REST for quick requests or WebSocket streaming for real-time applications. Here’s how you can implement both:

REST TTS Example

import os

from deepgram import DeepgramClient, SpeakOptions

# Initialize the Deepgram client with the API key from the environment
deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

# Text to convert; the TTS endpoint takes a {"text": ...} payload
text = {"text": "Hello, this is a text-to-speech example using Deepgram."}

# Pick an Aura voice and request 16-bit PCM audio in a WAV container
options = SpeakOptions(model="aura-asteria-en", encoding="linear16", container="wav")

# deepgram-sdk 3.x: recent releases expose speak.rest.v("1"), older 3.x releases speak.v("1")
response = deepgram.speak.rest.v("1").save("output.wav", text, options)
print("Audio file saved.")
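To see what the SDK's REST call does on the wire, here is a dependency-free sketch that builds the equivalent HTTP request with the standard library. It assumes the POST https://api.deepgram.com/v1/speak endpoint, with the voice, encoding, and container passed as query parameters and authentication through an "Authorization: Token <key>" header. The helper names build_speak_request and save_speech are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST endpoint for Deepgram TTS
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text: str, api_key: str, model: str = "aura-asteria-en",
                        encoding: str = "linear16",
                        container: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a TTS request; hypothetical helper for illustration."""
    query = urllib.parse.urlencode({"model": model, "encoding": encoding, "container": container})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=body,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to disk (needs network access)."""
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling save_speech(build_speak_request(text, os.environ["DEEPGRAM_API_KEY"]), "output.wav") would perform the real request. Everything else in the sketch runs offline.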

Streaming TTS Example

For real-time use cases (like AI-powered voice agents or live narration), you’ll want to stream the TTS results.

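import os
import threading

from deepgram import DeepgramClient, SpeakWebSocketEvents, SpeakWSOptions

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

# Open a TTS WebSocket connection (deepgram-sdk 3.x, speak.websocket router)
connection = deepgram.speak.websocket.v("1")
flushed = threading.Event()
# Streamed linear16 audio is raw PCM with no WAV header, so save it as .raw
audio_out = open("live_output.raw", "wb")

# Each AudioData event delivers a chunk of audio bytes
def on_audio_data(self, data, **kwargs):
    audio_out.write(data)

# Flushed fires once the server has synthesized everything sent before flush()
def on_flushed(self, **kwargs):
    flushed.set()

connection.on(SpeakWebSocketEvents.AudioData, on_audio_data)
connection.on(SpeakWebSocketEvents.Flushed, on_flushed)

connection.start(SpeakWSOptions(model="aura-asteria-en", encoding="linear16", sample_rate=24000))
connection.send_text("Hello, this is a real-time TTS example using Deepgram.")
connection.flush()
flushed.wait(timeout=10)
connection.finish()
audio_out.close()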
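Assume the WebSocket delivers raw linear16 PCM with no container header. In that case, appending the chunks straight into a .wav file does not produce a playable file. The minimal sketch below wraps the collected chunks with the standard-library wave module. It assumes 16-bit mono PCM at 24 kHz, and the rate must match whatever sample_rate you request. write_pcm_as_wav is a hypothetical helper name.

```python
import wave
from typing import BinaryIO, Iterable, Union


def write_pcm_as_wav(chunks: Iterable[bytes], dest: Union[str, BinaryIO],
                     sample_rate: int = 24000, channels: int = 1,
                     sample_width: int = 2) -> int:
    """Wrap raw PCM chunks in a WAV header and return the number of frames written."""
    total_bytes = 0
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)   # must match the sample_rate requested from Deepgram
        for chunk in chunks:
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // (channels * sample_width)
```

In the streaming example, the audio handler could append each chunk to a list instead of a file. Once the connection finishes, write_pcm_as_wav(chunks, "live_output.wav") produces a file that ordinary players can open.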

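Over the WebSocket connection you can also send Flush and Clear control messages. Flush tells the server to synthesize whatever text is buffered immediately, which keeps output latency low. Clear discards buffered text, for example when a user interrupts a voice agent.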
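Under the hood, these controls are small JSON messages sent over the socket. The sketch below shows the message shapes. It assumes Deepgram's TTS WebSocket protocol, where a Speak message carries the text and Flush, Clear, and Close are bare control messages with only a type field. The helper names are hypothetical, and nothing is sent over a network here.

```python
import json


def speak_message(text: str) -> str:
    """Queue text for synthesis."""
    return json.dumps({"type": "Speak", "text": text})


def control_message(kind: str) -> str:
    """Build a Flush, Clear, or Close control message."""
    if kind not in {"Flush", "Clear", "Close"}:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


def messages_for(sentences, interrupt_after=None):
    """Hypothetical sequence: speak and flush each sentence, clearing if interrupted."""
    out = []
    for i, sentence in enumerate(sentences):
        out.append(speak_message(sentence))
        out.append(control_message("Flush"))
        if interrupt_after is not None and i == interrupt_after:
            out.append(control_message("Clear"))  # drop anything still buffered
            break
    out.append(control_message("Close"))
    return out
```

Flushing after every sentence trades a few extra messages for earlier audio. Clear is only needed when playback must stop mid-stream.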

3. Deepgram SDK Functionality: Threaded vs Async Clients

The SDK provides both Threaded and Async/Await clients, giving you flexibility in how you handle I/O-bound tasks. The Threaded client is straightforward for quick operations, while Async/Await is useful when dealing with real-time streaming or when you need to handle multiple requests concurrently.
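The practical difference shows up when several requests are in flight at once. The minimal sketch below uses plain asyncio and a hypothetical blocking function, synthesize_blocking, as a stand-in for a synchronous SDK call. asyncio.to_thread runs each blocking call in a worker thread, and asyncio.gather awaits them concurrently while keeping the results in input order.

```python
import asyncio
import time


def synthesize_blocking(text: str) -> bytes:
    """Hypothetical stand-in for a synchronous (threaded) TTS call."""
    time.sleep(0.2)  # simulate network latency
    return text.encode("utf-8")


async def synthesize_many(texts):
    """Run the blocking calls in worker threads and await them together."""
    tasks = [asyncio.to_thread(synthesize_blocking, t) for t in texts]
    return await asyncio.gather(*tasks)
```

Four requests finish in roughly the time of one. With the SDK's async clients, the to_thread wrapper is unnecessary, because their methods can be gathered as coroutines directly.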

Example of using asyncio to handle WebSocket data:

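import asyncio
import os

from deepgram import DeepgramClient, SpeakWebSocketEvents, SpeakWSOptions

async def async_ws_example():
    deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
    connection = deepgram.speak.asyncwebsocket.v("1")  # async TTS WebSocket (deepgram-sdk 3.x)

    async def on_audio_data(self, data, **kwargs):
        print(f"Received {len(data)} bytes of audio")

    connection.on(SpeakWebSocketEvents.AudioData, on_audio_data)
    await connection.start(SpeakWSOptions(model="aura-asteria-en", encoding="linear16", sample_rate=24000))
    await connection.send_text("Deepgram makes AI sound human.")
    await connection.flush()
    await asyncio.sleep(5)  # crude wait while the audio streams back
    await connection.finish()

asyncio.run(async_ws_example())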

4. Speech-to-Text (STT): Live and Pre-Recorded Transcription

Deepgram’s Python SDK also excels at converting both live audio and pre-recorded audio files into text.

Transcribing Pre-Recorded Audio

import os

from deepgram import DeepgramClient, FileSource, PrerecordedOptions

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

# Read the audio file into memory; the SDK uploads the bytes
with open("audio.wav", "rb") as audio_file:
    payload: FileSource = {"buffer": audio_file.read()}

options = PrerecordedOptions(model="nova-2", punctuate=True)

# deepgram-sdk 3.x: listen.rest in recent releases, listen.prerecorded in older ones
response = deepgram.listen.rest.v("1").transcribe_file(payload, options)
print(response.results.channels[0].alternatives[0].transcript)
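You can also request the same transcription without the SDK. The sketch below builds the HTTP request to the /v1/listen endpoint and pulls the transcript out of the JSON response. It assumes the response shape results.channels[0].alternatives[0].transcript, which is the same path the example above indexes. build_listen_request and extract_transcript are hypothetical helpers, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str, mimetype: str = "audio/wav",
                         **options) -> urllib.request.Request:
    """Build (but do not send) a pre-recorded transcription request."""
    # Booleans become "true"/"false" query values, e.g. punctuate=true
    params = {k: str(v).lower() if isinstance(v, bool) else v for k, v in options.items()}
    url = LISTEN_URL + ("?" + urllib.parse.urlencode(params) if params else "")
    return urllib.request.Request(
        url, data=audio, method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
    )


def extract_transcript(response: dict, channel: int = 0) -> str:
    """Return the top alternative's transcript for one channel."""
    return response["results"]["channels"][channel]["alternatives"][0]["transcript"]
```

Sending the request with urllib.request.urlopen and decoding the body with json.load yields the dictionary to pass to extract_transcript.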

Live Audio Transcription

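import os
import time

from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
# deepgram-sdk 3.x: listen.websocket in recent releases, listen.live in older ones
connection = deepgram.listen.websocket.v("1")

def on_transcript(self, result, **kwargs):
    text = result.channel.alternatives[0].transcript
    if text:
        print(text)

connection.on(LiveTranscriptionEvents.Transcript, on_transcript)
connection.start(LiveOptions(model="nova-2", punctuate=True))

# Send the file in small chunks, pausing briefly to mimic a live source
with open("live_audio.wav", "rb") as audio:
    while chunk := audio.read(8000):
        connection.send(chunk)
        time.sleep(0.1)

connection.finish()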
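When you test live transcription with a recorded file, it helps to send the audio at roughly real-time pace rather than all at once. The sketch below uses the standard-library wave module to yield fixed-duration chunks of raw PCM frames, each paired with its duration in seconds. iter_realtime_chunks is a hypothetical helper, and the 100 ms chunk size is an assumption rather than a Deepgram requirement. The helper strips the WAV header, so you would declare the format explicitly in the live options, for example encoding="linear16" plus the file's sample_rate.

```python
import wave
from typing import Iterator, Tuple


def iter_realtime_chunks(path: str, chunk_ms: int = 100) -> Iterator[Tuple[bytes, float]]:
    """Yield (pcm_bytes, seconds_of_audio) pairs read from a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, rate * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data, (len(data) // bytes_per_frame) / rate
```

A sender loop would call connection.send(chunk) and then time.sleep(seconds) for each pair, so audio reaches Deepgram at about the speed it was recorded.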

5. Advanced Features: Entity Detection, Sentiment Analysis, and Latency Optimization

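The SDK also exposes additional capabilities:

  1. Entity Detection (the detect_entities option): extracts key information, such as people and locations, from transcribed audio.
  2. Sentiment Analysis (the sentiment option): classifies the emotional tone of the transcribed speech.
  3. Latency Optimization: splitting TTS input into smaller chunks, such as individual sentences, and flushing each one reduces the time to first audio in streaming applications.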
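Entity detection and sentiment analysis are enabled per request, for example through the detect_entities and sentiment fields of PrerecordedOptions. The sketch below summarizes both results from a response dictionary. It assumes entities appear under results.channels[0].alternatives[0].entities, each with a label and a value. It also assumes sentiment appears under results.sentiments.segments, each with text and sentiment fields. Check these field names against the current API reference before relying on them. summarize_intelligence is a hypothetical helper.

```python
from collections import Counter


def summarize_intelligence(response: dict) -> dict:
    """Group detected entities by label and count sentiment labels across segments."""
    results = response.get("results", {})
    alternative = results.get("channels", [{}])[0].get("alternatives", [{}])[0]

    # Assumed shape: each entity has "label" (e.g. a category) and "value" (the matched text)
    entities = {}
    for entity in alternative.get("entities", []):
        entities.setdefault(entity["label"], []).append(entity["value"])

    # Assumed shape: each sentiment segment has a "sentiment" label such as "positive"
    segments = results.get("sentiments", {}).get("segments", [])
    sentiment_counts = Counter(seg["sentiment"] for seg in segments)
    return {"entities": entities, "sentiment_counts": dict(sentiment_counts)}
```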

Example of optimizing TTS for real-time streaming:

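async def optimized_stream_tts(connection):
    # connection: an async TTS WebSocket already started via speak.asyncwebsocket.v("1")
    await connection.send_text("This is optimized for real-time TTS.")
    await connection.flush()  # synthesize the buffered text immediately
    await connection.clear()  # drop unsynthesized text, e.g. when the user interrupts
    await connection.send_text("Hello, world!")
    await connection.flush()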
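The TTS chunking mentioned under latency optimization can be made concrete. Split incoming text, such as tokens streamed from an LLM, at sentence boundaries, and send each complete sentence followed by a flush as soon as it is available. The minimal sketch below uses a simple regex on ".", "!", and "?". This boundary rule is an assumption and will mis-split abbreviations such as "Dr.". sentence_chunks is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# A sentence ends with ., !, or ? followed by whitespace
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(pieces: Iterable[str]) -> Iterator[str]:
    """Re-chunk arbitrary text fragments into whole sentences as soon as each completes."""
    buffer = ""
    for piece in pieces:
        buffer += piece
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # emit whatever remains when the stream ends
```

Each yielded sentence maps to one send_text call followed by flush. That way the first sentence can start playing while later ones are still being generated.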

6. Integrations and Use Cases

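Deepgram can be combined with platforms such as Twilio, Zoom, and AWS. For example, a Twilio voice webhook can stream live call audio to your own WebSocket server, which relays it to Deepgram for transcription:

# Twilio voice webhook: stream call audio to a relay server that forwards it to Deepgram
from twilio.twiml.voice_response import Connect, VoiceResponse

def handle_call():
    response = VoiceResponse()
    response.say("Please start speaking.")
    connect = Connect()
    # Hypothetical relay endpoint; replace with your own WebSocket server URL
    connect.stream(url="wss://your-server.example.com/twilio")
    response.append(connect)
    return str(response)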
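If you route call audio through Twilio Media Streams, your server receives JSON messages rather than raw audio. The sketch below covers the relay's decoding step. It assumes Twilio's message format: an event field set to "media", with base64-encoded 8 kHz μ-law audio in media.payload. The resulting bytes would go to a Deepgram live connection opened with encoding="mulaw" and sample_rate=8000. decode_twilio_media is a hypothetical helper.

```python
import base64
import json
from typing import Optional


def decode_twilio_media(message: str) -> Optional[bytes]:
    """Return raw mu-law audio from a Twilio Media Streams 'media' message, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # 'connected', 'start', 'stop', etc. carry no audio
    return base64.b64decode(event["media"]["payload"])
```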

Combined with TTS, integrations like these let you build voice agents as well as real-time transcription services.

7. Migrating and Version Updates

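Version 3 of the SDK replaced the v2 Deepgram class with DeepgramClient and moved methods under namespaced routers such as listen and speak. The v2-to-v3 migration is well documented, so existing code can be updated to the new interface. For example:

# v2: from deepgram import Deepgram; deepgram = Deepgram(DEEPGRAM_API_KEY)
import os

from deepgram import DeepgramClient

# v3: new client class; async calls live on separate routers such as dg.listen.asyncrest.v("1")
dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])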

8. Advanced TTS Features: Voice Customization and Latency Handling

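Deepgram’s Aura voices let you choose the voice used for TTS. The voice is selected through the model name. For instance, given a DeepgramClient instance named deepgram, switching to Asteria looks like this:

options = SpeakOptions(model="aura-asteria-en")  # the model name selects the Aura voice
deepgram.speak.rest.v("1").save("output.mp3", {"text": "Hello, world"}, options)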

With Deepgram’s Python SDK, you have the tools to add voice and speech processing to your applications. That covers converting text to speech in real time, transcribing live audio streams, and integrating with AI voice platforms like OpenAI. The SDK’s support for REST, WebSocket streaming, and speech-intelligence features like sentiment analysis makes it a strong option for developers building voice-driven apps.

Make sure to set your API key in the DEEPGRAM_API_KEY environment variable, then explore use cases such as AI-powered voice agents or real-time transcription in your next project!

For more information, visit the Deepgram Python SDK on GitHub (deepgram/deepgram-python-sdk).

What is the easiest text-to-speech API?

PlayHT is one of the easiest text-to-speech APIs due to its simple interface, natural-sounding voices, and ultra-low latency. It supports real-time playback, making it ideal for live audio applications.

What is Deepgram API?

Deepgram API is a robust speech-to-text and text-to-speech API designed for real-time and pre-recorded audio transcription, powered by AI. It supports multiple audio formats, advanced features like metadata extraction, and customizable models for specific use cases in voice AI.

Is Google Text-to-Speech API free?

Google Text-to-Speech API offers a limited free tier but charges for higher usage based on characters processed. It is widely used for converting text into natural-sounding speech, offering various voice options.

How to install Deepgram SDK?

To install the Deepgram Python SDK, run pip install "deepgram-sdk==3.*" in your terminal. The quotes stop shells such as zsh from expanding the asterisk. This installs the SDK along with its dependencies, allowing you to easily handle speech recognition, playback, and encoding for audio files.
