August 7, 2023

Introducing Cross-Language Voice Cloning while preserving Speaker Accent

Today we’re announcing a new feature that lets non-English speakers clone their voices and create English-speaking versions of them. The cloned voices retain the speaker’s original accent while speaking English. To use this feature, upload a few seconds of non-English audio to the ‘Instant Cloning’ feature, which creates the clone.

Introduction

Cross-language Voice Cloning lets users clone a voice from another language into English while retaining the nuances of the original accent and language. For instance, a fluent Spanish speaker can use PlayHT’s voice cloning service to upload a 30-second audio clip of themselves speaking Spanish. Our voice model then clones the voice and language, allowing the Spanish speaker to speak English. The model synthesizes speech while preserving the original speaker’s accent and speaking style.
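If you want to check the length of a sample before uploading it, here is a minimal sketch using only Python’s standard library. It assumes the sample is a WAV file. The helper name `clip_duration_seconds` is hypothetical and is not part of any PlayHT API. The 30-second figure comes from the example above and is not a documented requirement.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds.

    Hypothetical helper for illustration only; it is not part of any PlayHT API.
    """
    with wave.open(path, "rb") as wav:
        frame_count = wav.getnframes()
        frame_rate = wav.getframerate()
    return frame_count / float(frame_rate)


def is_about_target_length(path: str, target_seconds: float = 30.0,
                           tolerance_seconds: float = 5.0) -> bool:
    """Check whether a clip is within a tolerance of a target length.

    Both the target and the tolerance are assumptions chosen for this sketch.
    """
    return abs(clip_duration_seconds(path) - target_seconds) <= tolerance_seconds
```

Calling `is_about_target_length("sample.wav")` on a local recording (the file name is a placeholder) returns `True` when the clip runs between 25 and 35 seconds under these assumed defaults.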

The possibilities and use cases for this technology are vast, including dubbing, language learning, language localization, and more. This new feature reaffirms our dedication to pushing the boundaries of what is possible with AI-generated voices.


Multilingual Text-to-Speech Synthesis and Cross-Language Voice Cloning

Cross-language cloning has been attempted before, but until now it has required hours of fine-tuning, clean audio that is very hard to source, transcription inputs, and hours of manual work to get satisfactory results.

It is possible to clone a voice without a transcript and with only a small amount of data using conventional TTS models like Tacotron, but we always felt that the results could be better. That’s why our model doesn’t require large amounts of data or transcripts as the input representation, yet the outcome is more than satisfactory.

Our Generative Voice model can capture the intonation and nuances of the original audio’s language and carry them over to the cloned language without the need for interpretation. This allows for seamless cross-language cloning, making it a powerful tool for multilingual text-to-speech applications.

What’s next in Multilingual Synthesis and Cross-Language Cloning?

With Multilingual Synthesis and Cross-Language Cloning, we’ve reached a significant milestone in our AI voice cloning. With the ability to synthesize and clone voices in multiple languages, we are opening up new possibilities for businesses and individuals worldwide. Our market-based approach ensures that we are always working to meet the needs of our customers and the broader market, and we will continue to add new languages to our service as demand arises.

To learn more about PlayHT and our AI voice cloning service, sign up for free today or connect with us on our socials to stay up to date on our latest developments. We’re truly excited to see what Cross-Language AI voices bring to content creation, and we look forward to seeing what you create!
