If you’re an engineer looking to integrate Deepgram’s text-to-speech (TTS) and speech-to-text (STT) capabilities in Python, you’re in the right place. Deepgram provides robust APIs and SDKs that enable seamless speech processing in real-time and pre-recorded scenarios.
This article will walk you through the steps for installing the Deepgram Python SDK, using both REST and streaming for TTS, and exploring other powerful features such as speech intelligence, integrations, and optimizations for latency and customization.
Before you can start working with Deepgram’s APIs, you’ll need to install the Python SDK. It’s super straightforward.
# Install the Deepgram Python SDK via pip
pip install deepgram-sdk==3.*
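Once pip finishes, you can confirm the SDK is present before writing any code. A small stdlib-only check (no Deepgram import required):

```python
# Quick check that the SDK installed correctly, using only the standard library.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("deepgram-sdk") or "deepgram-sdk is not installed")
```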
This SDK supports both REST and streaming operations, as well as advanced features like asyncio for managing concurrent tasks. Now, let’s dive into TTS!
Deepgram allows you to transform text into human-like speech. You can do this via REST for quick requests or WebSocket streaming for real-time applications. Here’s how you can implement both:
import asyncio
import os

from deepgram import Deepgram

# Initialize the Deepgram client using the API key
DEEPGRAM_API_KEY = os.getenv('DEEPGRAM_API_KEY')
deepgram = Deepgram(DEEPGRAM_API_KEY)

# Prepare the text to be converted
text = "Hello, this is a text-to-speech example using Deepgram."

# Make a REST API request
async def convert_text_to_speech():
    response = await deepgram.tts.synthesize(text, 'en-US', 'wav')
    with open('output.wav', 'wb') as f:
        f.write(response['audio_data'])
    print("Audio file saved.")

# Run the async function
asyncio.run(convert_text_to_speech())
For real-time use cases (like AI-powered voice agents or live narration), you’ll want to stream the TTS results.
import asyncio
import os

from deepgram import Deepgram

# Initialize the Deepgram client
DEEPGRAM_API_KEY = os.getenv('DEEPGRAM_API_KEY')
dg_client = Deepgram(DEEPGRAM_API_KEY)

# Create a WebSocket connection and stream TTS output
async def stream_tts():
    ws = await dg_client.speak.websocket('1')

    async def on_binary_data(data):
        with open('live_output.wav', 'ab') as f:
            f.write(data)

    ws.on('binary_data', on_binary_data)
    await ws.speak('Hello, this is a real-time TTS example using Deepgram.')

# Run the async function
asyncio.run(stream_tts())
The WebSocket API lets you interact with the Flush and Clear control messages for managing text buffers and ensuring low-latency output.
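These control commands are sent as JSON text frames over the socket. A minimal sketch of the message payloads (field names based on Deepgram’s TTS WebSocket documentation — verify against the current reference):

```python
import json

# JSON text frames for Deepgram's TTS WebSocket control messages.
def speak_message(text: str) -> str:
    # Queues text into the server-side buffer for synthesis.
    return json.dumps({"type": "Speak", "text": text})

def flush_message() -> str:
    # Forces buffered text to be synthesized immediately (lowers latency).
    return json.dumps({"type": "Flush"})

def clear_message() -> str:
    # Discards any text still waiting in the buffer.
    return json.dumps({"type": "Clear"})

print(speak_message("Hello"), flush_message(), clear_message(), sep="\n")
```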
The SDK provides both Threaded and Async/Await clients, giving you flexibility in how you handle I/O-bound tasks. The Threaded client is straightforward for quick operations, while Async/Await is useful when dealing with real-time streaming or when you need to handle multiple requests concurrently.
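To make “multiple requests concurrently” concrete, here is a plain-asyncio sketch with no Deepgram calls; `job` stands in for any awaitable network request:

```python
import asyncio

# Generic async fan-out: run several I/O-bound jobs concurrently with gather.
async def job(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a network request
    return f"{name} done"

async def run_jobs() -> list:
    # Both jobs run concurrently; total wall time ~= the slowest job.
    return await asyncio.gather(job("tts", 0.01), job("stt", 0.02))

print(asyncio.run(run_jobs()))  # -> ['tts done', 'stt done']
```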
Example of using asyncio to handle WebSocket data:
async def async_ws_example():
    dg = Deepgram(DEEPGRAM_API_KEY)
    ws = await dg.speak.websocket('1')

    async def on_message(data):
        print("Message received: ", data)

    ws.on('message', on_message)
    await ws.speak("Deepgram makes AI sound human.")

asyncio.run(async_ws_example())
Deepgram’s Python SDK also excels in converting both live audio and pre-recorded audio files into text.
import asyncio
import os

from deepgram import Deepgram

# Initialize the Deepgram client
DEEPGRAM_API_KEY = os.getenv('DEEPGRAM_API_KEY')
deepgram = Deepgram(DEEPGRAM_API_KEY)

async def transcribe_audio():
    with open('audio.wav', 'rb') as audio_file:
        source = {'buffer': audio_file, 'mimetype': 'audio/wav'}
        response = await deepgram.transcription.pre_recorded(source, {'punctuate': True})
        print(response['results']['channels'][0]['alternatives'][0]['transcript'])

# Run the async function
asyncio.run(transcribe_audio())
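Drilling into the response can fail on silent audio or empty channels. A small helper that mirrors the nesting used above and degrades gracefully:

```python
def first_transcript(response: dict) -> str:
    # Follows the same path as the example above:
    # results -> channels[0] -> alternatives[0] -> transcript
    try:
        return response["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""

sample = {"results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}}
print(first_transcript(sample))  # -> hello world
```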
For live audio, you can stream it over a WebSocket instead:

import asyncio
import os

from deepgram import Deepgram

DEEPGRAM_API_KEY = os.getenv('DEEPGRAM_API_KEY')
deepgram = Deepgram(DEEPGRAM_API_KEY)

async def live_transcription():
    ws = await deepgram.transcription.live({'punctuate': True})

    async def on_transcript(data):
        print(data)

    ws.on('transcript', on_transcript)
    await ws.send_audio_from_file('live_audio.wav')

# Run the async function
asyncio.run(live_transcription())
Beyond basic TTS and STT, the SDK offers additional features such as speech intelligence (including sentiment analysis), platform integrations, and optimizations for latency and customization.
Example of optimizing TTS for real-time streaming:

async def optimized_stream_tts():
    ws = await dg_client.speak.websocket('1')
    await ws.speak('This is optimized for real-time TTS.', flush=True)
    await ws.speak('Hello, world!', clear=True)

asyncio.run(optimized_stream_tts())
Deepgram integrates seamlessly with popular platforms like Twilio, Zoom, and AWS. Example:
# Integrating Twilio with Deepgram for speech-to-text transcription
from twilio.twiml.voice_response import VoiceResponse

def handle_call():
    response = VoiceResponse()
    response.say("Please leave a message after the beep.")
    response.record(transcribe_callback='/handle-transcription')
    return str(response)
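For reference, `handle_call` returns a TwiML document. The sketch below rebuilds the equivalent XML with the standard library so you can see what Twilio actually receives (attribute names per Twilio’s `<Record>` verb documentation):

```python
import xml.etree.ElementTree as ET

def build_twiml(prompt: str, callback_path: str) -> str:
    # Equivalent of VoiceResponse().say(...) + .record(transcribe_callback=...):
    # <Response><Say>...</Say><Record transcribeCallback="..."/></Response>
    root = ET.Element("Response")
    ET.SubElement(root, "Say").text = prompt
    ET.SubElement(root, "Record", transcribeCallback=callback_path)
    return ET.tostring(root, encoding="unicode")

print(build_twiml("Please leave a message after the beep.", "/handle-transcription"))
```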
With these building blocks, you can create voice agents or real-time transcription services.
Migrating from older SDK versions (like v2 to v3) is well-documented, ensuring your code remains compatible with the latest features. For example, in v3 the client is created via DeepgramClient:

# Migrate from v2 to v3: the client class is now DeepgramClient
from deepgram import DeepgramClient

dg = DeepgramClient(DEEPGRAM_API_KEY)
Deepgram’s Aura voices allow you to customize the voice used for TTS. For instance, switching to Asteria in your request:
response = await deepgram.tts.synthesize("Hello, world", "aura-asteria-en", "wav")
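Aura voice models follow a predictable naming pattern, which a tiny helper can build. The voice name here is taken from the example above; consult Deepgram’s model list for the voices actually available:

```python
def aura_model(voice: str, language: str = "en") -> str:
    # Deepgram Aura models are named "aura-<voice>-<language>",
    # e.g. "aura-asteria-en". Voice availability varies; see the docs.
    return f"aura-{voice}-{language}"

print(aura_model("asteria"))  # -> aura-asteria-en
```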
With Deepgram’s Python SDK, you have all the tools to add cutting-edge voice and speech processing capabilities to your applications, whether that’s converting text to speech in real time, transcribing live audio streams, or integrating with AI voice platforms like OpenAI. The SDK’s flexibility in handling REST, WebSocket streaming, and advanced features like sentiment analysis makes it a go-to solution for developers building voice-driven apps.
Make sure to set your API key as an environment variable and explore the use cases for AI-powered voice agents or real-time transcription in your next project!
For more information, visit the Deepgram Python SDK on GitHub.
PlayHT is one of the easiest text-to-speech APIs due to its simple interface, natural-sounding voices, and ultra-low latency. It supports real-time playback, making it ideal for live audio applications.
Deepgram API is a robust speech-to-text and text-to-speech API designed for real-time and pre-recorded audio transcription, powered by AI. It supports multiple audio formats, advanced features like metadata extraction, and customizable models for specific use cases in voice AI.
Google Text-to-Speech API offers a limited free tier but charges for higher usage based on characters processed. It is widely used for converting text into natural-sounding speech, offering various voice options.
To install the Deepgram Python SDK, run pip install deepgram-sdk==3.* in your terminal. This installs the SDK along with its dependencies, allowing you to easily handle speech recognition, playback, and encoding for audio files.