PlayAI Dialog now generally available; beats industry-leading model 3 to 1 in human preference testing

February 3, 2025. PlayAI’s Dialog Text-to-Speech model is now generally available, bringing multilingual capabilities and exceptional performance to applications that require emotive, human-like speech. In recent third-party benchmark tests, listeners preferred Dialog 10:1 over ElevenLabs v2.5 Turbo and by more than 3:1 over ElevenLabs Multilingual v2.0.

Play the video below to hear what it sounds like, or visit our AI Voiceover Studio to try it for yourself.

PlayDialog has the most human-sounding voices for business and narration

Many voice AI applications depend on low latency, which is why we tested Dialog against ElevenLabs’ v2.5 Turbo model. Both products have a similar Time-to-First-Audio (TTFA) and are suitable for low-latency applications such as voice agents, contact centers, gaming, and entertainment. Dialog’s fluid and emotionally coherent speech led listeners to prefer it 10:1 over v2.5 Turbo, which indicates that frontier voice AI models are solving the problem of balancing output quality with output speed.

To compare Dialog with ElevenLabs’ Multilingual v2.0 (which has higher latency and is better suited to applications like dubbing), we had a panel of 100 respondents evaluate 60 male and female voice generations produced from identical text. In these tests, Dialog was preferred 76% of the time, a ratio of more than 3 to 1 over ElevenLabs.
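Converting a preference share into an "N to 1" ratio is simple arithmetic: a win share of p percent corresponds to a ratio of p / (100 - p), so 76% works out to 76 / 24, or roughly 3.17 to 1. The short Python sketch below makes that explicit and also runs the conversion in reverse, to show what share a 10:1 preference would imply if read the same way. It assumes every comparison produced a preference (no ties or "no preference" votes); the function names are our own, chosen for illustration.

```python
def preference_ratio(share_percent: float) -> float:
    """Turn a head-to-head win share (in percent) into an 'N to 1' ratio.

    Assumes every comparison produced a preference (no ties).
    """
    if not 0 < share_percent < 100:
        raise ValueError("share_percent must be strictly between 0 and 100")
    return share_percent / (100 - share_percent)


def share_from_ratio(ratio: float) -> float:
    """Inverse conversion: an 'N to 1' ratio back into a win share in percent."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100 * ratio / (ratio + 1)


# 76% vs. Multilingual v2.0: 76 / 24 is roughly 3.17, i.e. "over 3 to 1"
print(f"{preference_ratio(76):.2f} to 1")
# Read the same way, a 10:1 preference corresponds to a share of about 90.9%
print(f"{share_from_ratio(10):.1f}%")
```

Running it prints "3.17 to 1" and "90.9%".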

In both benchmarking analyses, respondents highlighted accurate expressiveness and pacing as key reasons for their preference.

Customers love it too: “NextKast built a fully automated AI DJ for our radio station customers using PlayAI Dialog voices. We love how expressive, emotional, and natural the voices sound, and didn’t find anything else close in the market. In radio, keeping your audience engaged is the whole game, and Play’s voices do that.” – Winston Potgieter, Founder, Axis Entertainment


Figure 1: Human preference comparison between PlayDialog and ElevenLabs Multilingual v2.0 across 60 samples.


Figure 2: Reasons given for the expressed preference.

If you want to learn more, we’re releasing the raw test data to the public. For each sample, you can see the text prompt and hear the raw audio:

  • PlayAI Dialog vs. ElevenLabs Multilingual v2.0 – link
  • PlayAI Dialog vs. ElevenLabs v2.5 Turbo – link

Many thanks to our partners at Podonos, who conducted the independent testing.  Podonos is a third-party AI model evaluation service that uses human evaluation to assess the quality of AI models, including voice models.

PlayDialog is fast, too

Not only do PlayAI’s voice models sound more human, they are also efficient, with lower TTFA than most other models on the market today. That opens up use cases where low latency is essential, such as voice agents, call center software, and in-game audio.
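TTFA is the time between sending a synthesis request and receiving the first audio. As a rough illustration of how such a number can be measured, the Python sketch below times any streaming response exposed as an iterator of byte chunks, starting the clock before the stream is opened so that connection setup counts toward the result. The `fake_tts_stream` function is a hypothetical stand-in with a simulated 50 ms delay, not the PlayAI API; to measure a real service you would pass in a function that opens that service's streaming response instead.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def time_to_first_audio(open_stream: Callable[[], Iterable[bytes]]) -> Optional[float]:
    """Seconds from issuing the request until the first non-empty audio chunk.

    Returns None if the stream ends without delivering any audio bytes.
    """
    start = time.perf_counter()  # clock starts before the request is opened
    for chunk in open_stream():
        if chunk:  # skip empty frames (e.g. keep-alives) that carry no audio
            return time.perf_counter() - start
    return None


def fake_tts_stream() -> Iterator[bytes]:
    """Hypothetical streaming response used only for illustration."""
    time.sleep(0.05)      # simulated network + model latency before first audio
    yield b""             # an empty frame that should not count as audio
    yield b"\x00" * 1024  # first real audio chunk
    yield b"\x00" * 1024


ttfa = time_to_first_audio(fake_tts_stream)
print(f"TTFA: {ttfa * 1000:.0f} ms")
```

In practice you would repeat the measurement many times and report a median or percentile, since a single request is dominated by network noise.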

PlayDialog is now multilingual

PlayDialog now goes beyond English. We’ve added support for Chinese, French, German, Hindi, Japanese, Korean, Portuguese, and Urdu.

An additional 23 languages are experimental:  Afrikaans, Arabic, Bengali, Bulgarian, Croatian, Czech, Danish, Dutch, Greek, Hebrew, Hungarian, Indonesian, Italian, Malay, Polish, Russian, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian, and Xhosa.  

All these languages are available through our API and in our AI Voiceover Studio.

We’re proud of what we’ve achieved

Building accurate, human-sounding voice AI models is not trivial. The benchmarks above show how far we’ve come, but don’t take our word for it: try it in our AI Voiceover Studio, or sign up for a free API key and experiment with our low-latency API for yourself.
