Introducing PlayHT 2.0 Turbo ⚡️ – The Fastest Generative AI Text-to-Speech API

TL;DR

We are thrilled to announce the release of the fastest Voice LLM to date! Experience real-time speech streaming from text in 300ms or less. Dive in and test it using our Playground, the available SDKs, or these Replit demos for Node.js and Python, plus a ChatGPT integration.

Introduction

At PlayHT, our vision is to redefine how humans interact with AI agents. Whether it’s customer support, sales calls, AI tutors, or bringing gaming NPCs to life, our goal is to revolutionize the way humans communicate with generative AI agents.

Today we announce our latest milestone on the road to fulfilling that vision: the launch of PlayHT 2.0 Turbo, a new version of our conversational voice model, PlayHT 2.0, that generates speech in under 300ms over the network and in under 100ms for on-premise solutions (coming soon).

Input Text Streaming

PlayHT 2.0 Turbo supports input text streaming. This feature integrates seamlessly with LLMs such as ChatGPT. Simply feed the LLM's output stream of tokens or words to the SDK, and it will group them in a way that balances expressive, contextual speech against a low TTFB (time to first byte).
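To make the trade-off concrete, here is a minimal sketch of one way to group streamed tokens into sentence-sized chunks before sending them to a speech engine. It is an illustration only, not the SDK's actual algorithm: the function name `chunk_sentences`, the boundary characters, and the `max_chars` limit are all assumptions made for this example.

```python
from typing import Iterable, Iterator

# Characters treated as sentence boundaries (an assumption for this sketch).
SENTENCE_END = (".", "!", "?")


def chunk_sentences(tokens: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks.

    Hypothetical helper: whole sentences give a speech model more context,
    while the max_chars cap keeps the wait for the first chunk short.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        # Flush at a sentence boundary, or when the buffer grows too long.
        if stripped.endswith(SENTENCE_END) or len(buffer) >= max_chars:
            if stripped:
                yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the token stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_tokens = ["Hello", ",", " world", ".", " How", " are", " you", "?"]
    for chunk in chunk_sentences(llm_tokens):
        print(chunk)
```

A smaller `max_chars` sends text to the speech engine sooner but gives it less context per chunk. This naive boundary check will also split on tokens such as "3." in a decimal number, so a production version would need a more careful rule.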

Output Speech Streaming

Once Turbo receives text, it starts streaming audio in approximately 70ms. However, due to inevitable network costs, users typically receive the audio stream within a 200ms to 400ms window.
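Because the figure users experience depends on their own network path, it is worth measuring from the client side. The sketch below times how long the first audio chunk takes to arrive from any iterator of byte chunks. The helper name `measure_ttfb` and the injectable `clock` parameter are assumptions for illustration; the iterator stands in for whatever audio stream your client exposes.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_ttfb(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (time to first chunk in ms, full audio bytes).

    Hypothetical helper: `chunks` is any iterator of audio byte chunks.
    The timer starts when this function is called, so create the stream
    lazily (for example with a generator) to include request time.
    """
    start = clock()
    iterator = iter(chunks)
    first = next(iterator, None)
    if first is None:
        raise ValueError("audio stream produced no data")
    ttfb_ms = (clock() - start) * 1000.0
    # Drain the remaining chunks so the caller still gets the whole clip.
    audio = first + b"".join(iterator)
    return ttfb_ms, audio


if __name__ == "__main__":
    def fake_stream():
        time.sleep(0.05)  # simulated network and model delay
        yield b"\x00\x01"
        yield b"\x02\x03"

    ttfb, audio = measure_ttfb(fake_stream())
    print(f"first chunk after {ttfb:.0f} ms, {len(audio)} bytes total")
```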

Check out our demo showcasing the ChatGPT integration with both input and output streaming:

Conversationalize Your Input

PlayHT 2.0 isn’t just any voice model. It was designed for conversations, and trained on over a million hours of conversational speech. This ensures almost any voice has an authentically human-like talking style. 

But wait, there’s more! We’re introducing an additional feature to elevate this experience: you can now pass any text to the model, and the model will try its best to rewrite the input so that it sounds more human-like. Check out these examples:

Prompt: “Hello, play support speaking? Please hold on a sec, Let me just pull up your details real quick. Can you tell me your account email or your phone number? Okay, there you are. So, what are you actually looking for in the upgrade? Any specific features or stuff that you’ve got your eye on?”

Without Conversationalize:

With Conversationalize:

Notice how the second generation is more human-like and conversational. We are enabling this beta feature for all users soon, and it will be configurable through the API.

A New Playground

We have built a playground where you can test the API and all its features from one interface without writing any code. Here is a quick run-through of the playground's main controls and functionality:

Voice Cloning: Instantly clone any voice or accent from a mere 30-second speech sample.

Model Selection: Choose between our High-Quality 2.0 model (latency < 1 second) or the Turbo model (300ms latency).

Voice Library: Select from an array of pre-built voices suitable for diverse use cases.

Emotion & Style Guidance: Add an emotional layer such as Anger, Happiness, Sadness, etc. Adjust emotion intensity using the Style Guidance slider.

Output Format: Our models support multiple formats: mp3, wav, pcm, mulaw, flac, and ogg.

Temperature: Regulate variance. Lower temperatures yield predictable results, while higher ones introduce more variability.

Voice Guidance: Control voice uniqueness. Lower numbers make your voice sound more generic, while higher values amplify its distinctiveness.
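The controls above map naturally onto a small settings object. The sketch below is one hypothetical way to collect and validate them in Python before building a request. The class name, field names, and model identifiers are assumptions for illustration and are not the API's real parameter names; only the list of output formats comes from this post.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Output formats listed in this post.
OUTPUT_FORMATS = {"mp3", "wav", "pcm", "mulaw", "flac", "ogg"}
# Hypothetical identifiers for the two models described above.
MODELS = {"high-quality", "turbo"}


@dataclass(frozen=True)
class SpeechSettings:
    """Illustrative container for the playground controls (names assumed)."""

    voice: str
    model: str = "turbo"
    output_format: str = "mp3"
    emotion: Optional[str] = None
    style_guidance: Optional[float] = None
    temperature: Optional[float] = None
    voice_guidance: Optional[float] = None

    def __post_init__(self) -> None:
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")

    def as_request(self) -> dict:
        """Return only the options that were explicitly set."""
        return {key: value for key, value in asdict(self).items() if value is not None}


if __name__ == "__main__":
    settings = SpeechSettings(voice="my-cloned-voice", output_format="wav", temperature=0.5)
    print(settings.as_request())
```

Numeric ranges for temperature, style guidance, and voice guidance are deliberately not validated here, since this post does not state them.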

New SDKs

We’re introducing two new SDKs, for Node.js and Python, that make integrating PlayHT 2.0 Turbo into your products a breeze:

– Node.js SDK: GitHub Repository | Demo

– Python SDK: GitHub Repository | Demo

For those who don’t use Node.js or Python, our HTTP API remains at your disposal. However, for the lowest latency we recommend our SDKs, as they use the gRPC API.

Create Delightful Conversations

Ready to redefine human-AI communication? Want to build the next AI therapist, AI tutor, gaming NPC, or personal assistant that actually sounds human? We built this API for you. Get started now for free, join our Discord, and show us what you are building!
