Play.ht Launches Multilingual Synthesis and Cross-Language Voice Cloning

By Hammad Syed in TTS

April 27, 2023 5 min read

Play.ht, the leading provider of artificially generated voices, is announcing the launch of its latest machine learning model, which supports multilingual synthesis and cross-language voice cloning.

This groundbreaking technology allows users to clone a voice recorded in another language and have it speak English while retaining the nuances of the original accent and language. For instance, a fluent Spanish speaker can use Play.ht’s voice cloning service to upload a 30-minute recording of themselves speaking Spanish. The model then clones the voice, allowing the Spanish speaker to speak English through Play.ht’s TTS software. The software reads the text aloud in English in the voice from the original recording, preserving the Spanish accent and nuances.

Here are a few examples of supported languages:

English
British English
German
Spanish
French
Italian
Japanese
Korean
Portuguese
Turkish
Russian
Hindi

The possibilities and use cases for this technology are vast, including dubbing, language learning, language localization, and more. With a global team across multiple continents, Play.ht is committed to diversity and innovation. The new model reaffirms our dedication to pushing the boundaries of what is possible with AI-generated voices.

Discovering Multilingual Text-to-Speech Synthesis and Cross-Language Voice Cloning

Cross-language cloning has been attempted in the past but, until now, it has required hours of fine-tuning, clean audio that is very hard to source, transcription inputs, and hours of manual work to get satisfactory results.

It is possible to clone a voice without a transcript and with only a small amount of data using conventional TTS models like Tacotron, but we always felt that the results could be better. That’s why our model doesn’t require large amounts of data and doesn’t need transcripts as the input representation. Yet the outcome is more than satisfactory. Listen to the results below:

Our latest text-to-speech model, Parrot, supports cross-language cloning. It was designed to be an improved version of Peregrine with enhanced pitch, motion, and pause control, as well as zero-shot cloning capabilities.

During the development process, we discovered an exciting feature of our generative model. Parrot can capture the intonation and nuances of the language in the original audio and carry them over to the cloned language without the need for interpretation. This allows for seamless cross-language cloning, making Parrot a powerful tool for multilingual text-to-speech applications.

How to Utilize Cross-Language Cloning

The question on everyone’s mind is how to utilize AI voice cloning. We’ve made the process as simple as possible.

Sign up for a Play.ht account and head to the Voice Cloning section on the left sidebar.
Upload a recording of yourself speaking (in any language) in the app, using either our high-fidelity or our instant clone option. Depending on your selection, cloning can take a couple of hours or be completed instantly.
Next, open up our Ultra-Realistic editor by clicking Create Audio back on the dashboard.
Select your recently cloned voice in the voice selection pop-up.
Finally, enter some text in our Rich-Text editor and you’ll be using your cloned voice in a couple of seconds.

AI Voice Cloning Service: Breaking Down Language Barriers

An AI voice cloning service has the potential to revolutionize the way content creators, artists, and educators communicate with their audiences. With this technology, creators can clone their voices and create voiceovers in different languages, breaking down language barriers and reaching wider audiences.

Cross-Language Cloning and the Creator Economy

The dominance of English as the primary language of communication has left many innovative and exciting content creators unable to share their work with the rest of the world due to language barriers. However, with AI voice cloning, creators can easily create content in multiple languages and reach a global audience.

Cross-Language Cloning and the Arts

In the arts, dubbing has long been a contentious topic. However, with AI voice cloning, actors’ original voices can be cloned in their native language and then cross-cloned to create audio in different languages. Dubbed movies could be entirely powered by synthetic voices.

Cross-Language Cloning and Education

In education, an AI voice cloning service could extend the technology into EdTech. Many university programs worldwide offer courses taught in English even though English isn’t the country’s official language (or the lingua franca). Lecturers and professors with limited English proficiency could record audio in their native language, clone their voice, and then use speech synthesis to speak English, making it easier to teach courses in English to students who may not speak the lecturer’s native language fluently.

Overall, an AI voice cloning service has the potential to break down language barriers and enable creators, artists, and educators to share their work with a global audience.

What’s Next in Multilingual Synthesis and Cross-Language Cloning?

With Multilingual Synthesis and Cross-Language Cloning, we’ve reached a significant milestone in our AI voice cloning. With the ability to synthesize and clone voices in multiple languages, we are opening up new possibilities for businesses and individuals worldwide. Our market-based approach ensures that we are always working to meet the needs of our customers and the broader market, and we will continue to add new languages to our service as demand arises. To learn more about Play.ht and our AI voice cloning service, sign up for free today or connect with us on our socials to stay up-to-date on our latest developments. We’re truly excited to see what Cross-Languag