The ElevenLabs Streaming API converts text into high-quality speech in real time, delivering low-latency audio streams for applications such as voice assistants, chatbots, and voice cloning tools. This guide walks through integrating the API, optimizing performance, and handling common challenges.
The generated audio can be handled as .mp3 or .wav files.
Each request to the ElevenLabs API must include your API key in the request headers:
xi-api-key: your_api_key
content-type: application/json
Replace your_api_key with your actual API key.
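To avoid hard-coding the key in source files, you can read it from an environment variable instead. The following is a minimal sketch. It assumes a variable named ELEVENLABS_API_KEY, a naming convention chosen here for illustration. The helper build_headers is also hypothetical.

```python
import os


def build_headers(api_key=None):
    """Return the headers for an ElevenLabs REST request.

    Falls back to the ELEVENLABS_API_KEY environment variable (an assumed
    name) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        raise ValueError("No API key given and ELEVENLABS_API_KEY is not set")
    return {
        "xi-api-key": key,
        "content-type": "application/json",
    }
```

Keeping the key outside the code also makes it easier to use different keys for development and production.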
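POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream
Replace {voice_id} with the ID of the voice to use. The voice is selected in the URL path, not in the request body.
Headers:
xi-api-key: Your ElevenLabs API key.
content-type: application/json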
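The voice ID is a path segment, so it can help to build the streaming URL in one place. Below is a small sketch that uses a hypothetical helper, stream_url. It percent-encodes the ID defensively.

```python
from urllib.parse import quote

BASE_URL = "https://api.elevenlabs.io/v1"


def stream_url(voice_id: str) -> str:
    """Build the streaming text-to-speech URL for a given voice ID."""
    # The voice ID sits inside the path, so encode anything that is not URL-safe.
    return f"{BASE_URL}/text-to-speech/{quote(voice_id, safe='')}/stream"
```

Ordinary voice IDs contain only letters and digits, so they pass through unchanged.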
Example payload:
```json
{
  "text": "Hello, welcome to the ElevenLabs API",
  "model_id": "eleven_turbo_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}
```
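The two voice_settings values in the payload are fractions between 0 and 1. The sketch below uses a hypothetical build_payload helper to assemble the body and check that range before a request is sent. This client-side check is a convenience added for illustration, not a step the API requires.

```python
def build_payload(text, model_id="eleven_turbo_v2", stability=0.5, similarity_boost=0.75):
    """Assemble the JSON body for a streaming text-to-speech request."""
    if not text:
        raise ValueError("text must be a non-empty string")
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        # Both settings are expressed on a 0.0 to 1.0 scale.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
```

Catching an out-of-range value locally gives a clearer error message than waiting for the server to reject the request.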
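The response is an audio stream (MP3 by default) that can be played as it arrives. In Python, using requests:

```python
import requests

API_KEY = "your_api_key"  # replace with your actual API key
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"


def stream_audio(text, output_path="output.mp3"):
    # The voice ID is part of the URL path, not the JSON body.
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
    headers = {
        "xi-api-key": API_KEY,
        "content-type": "application/json",
    }
    payload = {
        "text": text,
        "model_id": "eleven_turbo_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.85},
    }
    # stream=True lets requests yield audio chunks as they arrive.
    with requests.post(url, headers=headers, json=payload, stream=True) as response:
        response.raise_for_status()  # surface auth or validation errors early
        with open(output_path, "wb") as audio_file:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:  # skip keep-alive chunks
                    audio_file.write(chunk)
```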
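For real-time use, the key latency measure is usually how soon the first audio chunk arrives, not the total download time. The sketch below uses a hypothetical save_stream helper. It writes chunks from any iterable of bytes, such as response.iter_content(chunk_size=1024), to a binary file object and records the time to the first non-empty chunk.

```python
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def save_stream(
    chunks: Iterable[bytes],
    out: BinaryIO,
    start: Optional[float] = None,
) -> Tuple[int, Optional[float]]:
    """Write audio chunks to `out` and measure time to the first chunk.

    `start` is a time.perf_counter() reading taken before the request was
    sent; if omitted, timing starts when this function is called.
    Returns (total_bytes, seconds_to_first_chunk), where the second value
    is None if the stream yielded no data.
    """
    if start is None:
        start = time.perf_counter()
    first_chunk_after = None
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        if first_chunk_after is None:
            first_chunk_after = time.perf_counter() - start
        out.write(chunk)
        total += len(chunk)
    return total, first_chunk_after
```

To include connection and server processing time in the measurement, take `start = time.perf_counter()` before calling requests.post and pass it in. Timing from inside the function only covers the period after the response headers have arrived.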
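When the text itself arrives incrementally (for example, from a chatbot), the WebSocket stream-input endpoint accepts text in pieces and returns audio as it is generated. A browser JavaScript example:

```javascript
const voiceId = "21m00Tcm4TlvDq8ikWAM";
const socket = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_turbo_v2`
);

socket.onopen = () => {
  // The first message opens the stream and carries voice settings and the API key
  // (avoid shipping a real key to browsers in production).
  socket.send(JSON.stringify({ text: " ", voice_settings: { stability: 0.5, similarity_boost: 0.75 }, xi_api_key: "your_api_key" }));
  socket.send(JSON.stringify({ text: "Streaming real-time audio with ElevenLabs API. " }));
  // An empty string tells the server that no more text is coming.
  socket.send(JSON.stringify({ text: "" }));
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Audio arrives as base64-encoded chunks: decode and play or buffer them here.
  if (data.audio) console.log("Audio chunk received");
  if (data.isFinal) console.log("Generation complete");
};
```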
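The WebSocket exchange is a sequence of JSON messages, so the framing can be tested separately from any network code. The Python sketch below covers only that framing. It assumes the stream-input protocol works as follows:

- An opening message contains a single space, the voice settings, and the key.
- Text chunks follow, each ending in a space.
- An empty-string message closes the input.
- Server messages carry base64-encoded audio in an audio field.

The helper names build_messages and decode_audio are hypothetical.

```python
import base64
import json
from typing import Iterable, List, Optional


def build_messages(text_chunks: Iterable[str], api_key: str,
                   stability: float = 0.5, similarity_boost: float = 0.75) -> List[str]:
    """Frame text chunks as the JSON messages sent over the WebSocket."""
    messages = [json.dumps({
        "text": " ",  # opening message: a single space starts the stream
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
        "xi_api_key": api_key,
    })]
    for chunk in text_chunks:
        # Ending each chunk with a space keeps word boundaries intact across chunks.
        messages.append(json.dumps({"text": chunk if chunk.endswith(" ") else chunk + " "}))
    messages.append(json.dumps({"text": ""}))  # empty string closes the input
    return messages


def decode_audio(message: str) -> Optional[bytes]:
    """Return decoded audio bytes from a server message, or None if it has none."""
    data = json.loads(message)
    audio = data.get("audio")
    return base64.b64decode(audio) if audio else None
```

Any WebSocket client can then send the strings from build_messages in order. Pass each incoming message to decode_audio, and buffer or play the bytes it returns.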
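The optimize_streaming_latency query parameter helps optimize real-time performance: higher values (from 0 up to 4) reduce latency at some cost to output quality.
21m00Tcm4TlvDq8ikWAM is one of ElevenLabs' premade voices and is used throughout these examples.
You can test your API setup using ElevenLabs' GitHub examples and modify them to suit your needs.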
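optimize_streaming_latency is passed as a query-string parameter on the streaming URL. Below is a minimal sketch of a hypothetical helper that sets it while preserving any existing query parameters. It assumes the 0 to 4 range described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_latency(url: str, level: int) -> str:
    """Return `url` with optimize_streaming_latency set to `level` (0-4)."""
    if not isinstance(level, int) or not 0 <= level <= 4:
        raise ValueError("optimize_streaming_latency must be an integer from 0 to 4")
    parts = urlsplit(url)
    # Keep any existing query parameters and overwrite the latency setting.
    query = dict(parse_qsl(parts.query))
    query["optimize_streaming_latency"] = str(level)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With requests, passing `params={"optimize_streaming_latency": 3}` to requests.post achieves the same result without building the string yourself.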
Common errors and troubleshooting tips:
- Adjust similarity_boost to fine-tune voice output.

Integrating ElevenLabs' streaming text-to-speech service provides low-latency audio generation for apps and services. Whether you're building an interactive assistant, a chatbot, or content production tools, the API's flexibility and performance can help you deliver high-quality audio in real time.
For more detailed examples, visit the ElevenLabs docs.