February 3, 2025
PlayAI’s Dialog Text-to-Speech model is now in general availability, bringing multilingual capabilities and exceptional performance to applications requiring emotive, human-like speech. In recent third-party benchmark tests, Dialog was preferred 10:1 over ElevenLabs v2.5 Turbo, and by over 3:1 over ElevenLabs Multilingual v2.0.
Play the video below to find out what it sounds like, or visit our AI Voiceover Studio to try it for yourself.
Many voice AI applications depend on low latency, which is why we tested Dialog against ElevenLabs’ v2.5 Turbo model. Both products have similar Time-to-First-Audio (TTFA) and are suitable for low-latency applications like voice agents, contact centers, gaming and entertainment. Dialog’s fluid and emotionally coherent speech led people to prefer it 10:1 over v2.5 Turbo, indicating that frontier voice AI models are solving the problem of balancing output quality with output speed.
Comparing Dialog to ElevenLabs’ Multilingual v2.0 (which has longer latency and would be more suited to applications like dubbing), we tested 60 male and female voice generations using identical text with a panel of 100 respondents. In these tests, Dialog was preferred 76% of the time, or over 3 to 1 vs. ElevenLabs.
In both benchmarking analyses, respondents highlighted accurate expressiveness and pacing as key reasons for their preference.
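For readers who want to check the arithmetic behind the figures above, here is a minimal sketch of how a preference share maps to an "X to 1" ratio and back. It assumes every vote went to one of the two models (no ties or "no preference" answers), which the published summary does not state; the function names are illustrative only.

```python
def share_to_ratio(preferred_share: float) -> float:
    """Convert a preference share (0 < share < 1) into an X:1 preference ratio.

    Assumes a two-way choice, so the other model's share is 1 - preferred_share.
    """
    if not 0.0 < preferred_share < 1.0:
        raise ValueError("preferred_share must be strictly between 0 and 1")
    return preferred_share / (1.0 - preferred_share)


def ratio_to_share(ratio: float) -> float:
    """Convert an X:1 preference ratio back into a preference share."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (ratio + 1.0)


# 76% preferred corresponds to roughly 3.17:1, i.e. "over 3 to 1".
print(f"{share_to_ratio(0.76):.2f}:1")

# A 10:1 preference corresponds to roughly 90.9% of votes.
print(f"{ratio_to_share(10):.1%}")
```

Under that two-way assumption, 76% against 24% works out to about 3.17 to 1, and a 10:1 preference means roughly 91 of every 100 votes went to Dialog.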
Customers love it too: “NextKast built a fully automated AI DJ for our radio station customers using PlayAI Dialog voices. We love how expressive, emotional, and natural the voices sound, and didn’t find anything else close in the market. In radio, keeping your audience engaged is the whole game, and Play’s voices do that.” – Winston Potgieter, Founder, Axis Entertainment
Figure 1: Human preference comparison between PlayDialog and ElevenLabs Multilingual v2.0 across 60 samples.
We’re releasing the raw test data to the public if you want to learn more, and for each sample you can see the text prompt and hear the raw audio:
Many thanks to our partners at Podonos, who conducted the independent testing. Podonos is a third-party AI model evaluation service that uses human evaluation to assess the quality of AI models, including voice models.
Not only do PlayAI’s voice models sound more human, but they are also efficient, with lower TTFA latency than most other models on the market today. That opens up use cases like voice agents, call center software solutions, and in-game audio, where low latency is essential.
PlayDialog is now multilingual. In addition to English, we’ve added support for Chinese, French, German, Hindi, Japanese, Korean, Portuguese and Urdu.
An additional 23 languages are experimental: Afrikaans, Arabic, Bengali, Bulgarian, Croatian, Czech, Danish, Dutch, Greek, Hebrew, Hungarian, Indonesian, Italian, Malay, Polish, Russian, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian, and Xhosa.
All these languages are available through our API and in our AI Voiceover Studio.
Building accurate, human-sounding voice AI models is not trivial. The benchmarks above show how far we’ve come, but don’t take our word for it: try it in our AI Voiceover Studio, or sign up for a free API key and experiment with our low-latency API for yourself.