Microsoft Text to Speech: Now create Natural sounding Audio

Microsoft's Neural Voices, now on Play.ht

Anyone looking to create natural Text to Speech audio wants the best and most realistic voices. That is why we have integrated all of Microsoft's text to speech voices into Play.ht, so you have access to the best voices and features from Microsoft for creating natural sounding audio.

Play.ht also leverages the best text to speech voices from Google Wavenet, Amazon Polly, and IBM Watson, but today, it’s all about the new Text to Speech voices from Microsoft.

Create your first audio using Microsoft’s voices here – Play.ht Dashboard

Some of the Standard and Neural voices offered by Microsoft, on Play.ht

Firstly, Microsoft offers a comprehensive family of natural Text to Speech voices as part of its Azure Cognitive Services.

Secondly, the voices are powered by Microsoft's Machine Learning algorithms, which make them sound realistic, fluid and almost indistinguishable from real human voices.

Above all, these voices closely mimic the patterns and intonation of human speech. This makes them sound very natural, so they are useful in a wide range of applications, from creating voice-overs for videos and narrating blog posts to converting educational material into audio.

Adding more than 140 voices and over 45 languages

In total, Microsoft provides 140 Text to Speech voices in over 45 languages, but not all of them sound natural. Some of the voices do sound robotic, but the new ones, known as Neural voices, sound extremely natural and are the ones of interest here.

So what are Neural Voices?

Neural text-to-speech voices, or Neural TTS, are created using a new type of speech synthesis powered by deep neural networks. As a result, the synthesized speech is nearly indistinguishable from human recordings.

Moreover, neural voices can be used to:

  • Make interactions with chatbots and voice assistants more natural and engaging.
  • Convert digital texts such as e-books into audiobooks.
  • Enhance in-car navigation systems.

With their human-like natural prosody and clear articulation of words, neural voices significantly reduce listening fatigue when users interact with AI systems.

New Neural Text to Speech vs The Standard Traditional Text to Speech

Here’s a small preview of all the neural voices saying “Thank you” in 49 different languages/locales.

List of Neural Voices

This is a list of all the Microsoft Neural Text to Speech voices currently available, with their audio samples.

You can try any of these voices using our online AI Voice Generator.

Language | Gender | Voice name
Arabic (Egypt) | Female | ar-EG-SalmaNeural
Arabic (Saudi Arabia) | Female | ar-SA-ZariyahNeural
Catalan (Spain) | Female | ca-ES-AlbaNeural
Danish (Denmark) | Female | da-DK-ChristelNeural
German (Germany) | Female | de-DE-KatjaNeural
English (Australia) | Female | en-AU-NatashaNeural
English (Canada) | Female | en-CA-ClaraNeural
English (United Kingdom) | Female | en-GB-LibbyNeural
English (United Kingdom) | Female | en-GB-MiaNeural
English (India) | Female | en-IN-NeerjaNeural
English (United States) | Female | en-US-AriaNeural
English (United States) | Male | en-US-GuyNeural
Spanish (Spain) | Female | es-ES-ElviraNeural
Spanish (Mexico) | Female | es-MX-DaliaNeural
Finnish (Finland) | Female | fi-FI-NooraNeural
French (Canada) | Female | fr-CA-SylvieNeural
French (France) | Female | fr-FR-DeniseNeural
Hindi (India) | Female | hi-IN-SwaraNeural
Italian (Italy) | Female | it-IT-ElsaNeural
Japanese (Japan) | Female | ja-JP-NanamiNeural
Korean (Korea) | Female | ko-KR-SunHiNeural
Norwegian, Bokmål (Norway) | Female | nb-NO-IselinNeural
Dutch (Netherlands) | Female | nl-NL-ColetteNeural
Polish (Poland) | Female | pl-PL-ZofiaNeural
Portuguese (Brazil) | Female | pt-BR-FranciscaNeural
Portuguese (Portugal) | Female | pt-PT-FernandaNeural
Russian (Russia) | Female | ru-RU-DariyaNeural
Swedish (Sweden) | Female | sv-SE-HilleviNeural
Thai (Thailand) | Female | th-TH-AcharaNeural
Turkish (Turkey) | Female | tr-TR-EmelNeural
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoxiaoNeural
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoyouNeural
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyangNeural
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyeNeural
Cantonese (Traditional Chinese, Hong Kong) | Female | zh-HK-HiuGaaiNeural
Mandarin (Traditional Chinese, Taiwan) | Female | zh-TW-HsiaoYuNeural
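
The voice names in the list above appear to follow a pattern of language code, region code, and a persona name ending in "Neural" (for example, en-US-AriaNeural). If you are picking voices in a script, a small helper like the one below can split a name into its parts and group voices by locale. This is only a sketch that assumes the naming pattern seen in the list; the `Voice` type and both function names are made up for illustration and are not part of any Microsoft or Play.ht API.

```python
from collections import defaultdict
from typing import NamedTuple


class Voice(NamedTuple):
    locale: str   # e.g. "en-US"
    name: str     # persona name without the "Neural" suffix
    neural: bool  # True when the voice name ends in "Neural"


def parse_voice_name(voice_id: str) -> Voice:
    """Split a voice name such as 'en-US-AriaNeural' into its parts.

    Assumes the '<language>-<REGION>-<Name>Neural' pattern from the list.
    """
    language, region, name = voice_id.split("-", 2)
    neural = name.endswith("Neural")
    if neural:
        name = name[: -len("Neural")]
    return Voice(f"{language}-{region}", name, neural)


def group_by_locale(voice_ids):
    """Map each locale to the persona names available for it."""
    grouped = defaultdict(list)
    for voice_id in voice_ids:
        voice = parse_voice_name(voice_id)
        grouped[voice.locale].append(voice.name)
    return dict(grouped)


grouped = group_by_locale(
    ["en-US-AriaNeural", "en-US-GuyNeural", "zh-CN-XiaoxiaoNeural"]
)
print(grouped)
```

Grouping by locale makes it easy to see at a glance which languages offer more than one voice.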

 

Microsoft’s Text to Speech — Change the style of your voice.

By default, text to speech synthesizes text using a neutral speaking style. However, with neural voices, you can adjust the speaking style to express different emotions.

Toggle emotions like cheerfulness, empathy, and calm, or optimize the voice for the scenario that fits your needs, such as customer service, newscasting, or a voice assistant.

en-US Jenny

The new English (US) voice, Jenny, was created with a friendly, warm and comforting persona focused on conversational scenarios. For this voice, Microsoft's text to speech provides additional speaking styles, including chat and customer service.

You can hear the different speaking styles in Jenny’s voice below:

Style | Style description | Sample
General | Expresses a neutral tone and available for general use | Valentino Lazaro scored a late winner for Austria to deny Northern Ireland a first Nations League point.
Chat | Expresses a casual and relaxed tone in conversation | Oh, well, that’s quite a change from California to Utah.
Customer service | Expresses a friendly and helpful tone for customer support | Okay, great. In the meantime, see if you can reach out to Verizon and let them know your issue. And Randy should be calling you back shortly.

Similarly, a new speaking style is also available for the en-US male voice, Guy. Guy's newscast style can be a great choice when you need a male voice to read professional and news-related content.
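
If you call Microsoft's service directly rather than through the Play.ht dashboard, speaking styles are requested in SSML by wrapping the text in an `mstts:express-as` element. The sketch below only builds the SSML string and does not call any API. The function name is made up for illustration, and the style identifiers (`chat`, `newscast`) and the `mstts` namespace URL are assumptions based on Microsoft's SSML documentation, so check them against the current documentation before relying on them.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # assumed namespace URL


def build_styled_ssml(voice: str, style: str, text: str, lang: str = "en-US") -> str:
    """Return an SSML document that asks `voice` to speak `text` in `style`."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, < and > so the document stays well-formed
        f"</mstts:express-as></voice></speak>"
    )


# Jenny in the chat style, using one of the sample sentences from the table.
jenny_chat = build_styled_ssml(
    "en-US-JennyNeural",
    "chat",
    "Oh, well, that's quite a change from California to Utah.",
)

# Guy in the newscast style.
guy_news = build_styled_ssml(
    "en-US-GuyNeural",
    "newscast",
    "Valentino Lazaro scored a late winner for Austria.",
)

print(jenny_chat)
print(guy_news)
```

Leaving out the `mstts:express-as` wrapper should give you the default neutral style.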

zh-CN Xiaoxiao

In addition, 10 new speaking styles are available with the zh-CN voice, Xiaoxiao. These new styles are optimized for audio content creators and intelligent bot developers who want to create more engaging, interactive audio that expresses rich emotions.

You can hear the new speaking styles in Xiaoxiao’s voice below:

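Style | Sample text | English translation
Calm | 那,那我再问你,你之前有养过宠物嘛 | Then, let me ask you again: have you ever kept a pet before?
Affectionate | 老公,把灯打开好吗,好黑呀,我很怕。 | Honey, could you turn on the light? It's so dark, I'm scared.
Angry | 没想到,我们八年的感情真的完了! | I never thought our eight-year relationship would really be over!
Disgruntled | 这你都不明白吗?真是个榆木脑袋。 | You don't even understand this? What a blockhead.
Fearful | 先生,你没事吧?要不要我叫医生过来? | Sir, are you all right? Should I call a doctor?
Gentle | 我今天运气特别好,如果没有遇到您,还不知道会怎么样呢! | I was really lucky today. If I hadn't met you, who knows what would have happened!
Cheerful | 太好了,恭喜你顺利通过考核。 | Wonderful, congratulations on passing the assessment.
Serious | 不要恋战,等待时机,随时准备突围。 | Don't get drawn into the fight. Wait for the right moment and be ready to break out at any time.
Sad | 没想到,你居然是这么一个无情无义的人! | I never thought you would turn out to be such a heartless person!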

For the Chinese voice Xiaoxiao, the intensity (‘style degree’) of the speaking style can be further adjusted to better fit your use case. You can specify a stronger or softer style with ‘style degree’ to make the speech more expressive or more subdued.
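
In SSML, the style degree is typically set with a `styledegree` attribute next to the style itself. The sketch below builds such a document for Xiaoxiao without calling any API. The function name is made up for illustration, and the accepted range of 0.01 to 2 (with 1 as the default intensity), the `sad` style identifier, and the namespace URL are assumptions taken from Microsoft's SSML documentation rather than from this article.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # assumed namespace URL


def build_style_degree_ssml(
    text: str,
    style: str,
    degree: float = 1.0,
    voice: str = "zh-CN-XiaoxiaoNeural",
    lang: str = "zh-CN",
) -> str:
    """Return SSML that requests `style` at the given intensity."""
    # Assumed valid range: 0.01 (barely noticeable) to 2 (strongest).
    if not 0.01 <= degree <= 2.0:
        raise ValueError("degree must be between 0.01 and 2")
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f'<mstts:express-as style={quoteattr(style)} styledegree="{degree:g}">'
        f"{escape(text)}"
        f"</mstts:express-as></voice></speak>"
    )


sample = "没想到,你居然是这么一个无情无义的人!"
strong = build_style_degree_ssml(sample, "sad", degree=2.0)  # more expressive
soft = build_style_degree_ssml(sample, "sad", degree=0.5)    # more subdued

print(strong)
print(soft)
```

Rejecting out-of-range values early keeps a typo from turning into a confusing error from the speech service later on.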

Microsoft’s Text to Speech — In a Nutshell

In conclusion, Microsoft's text to speech, with its new and improved neural voices, offers a wide range of options to transform your text into audio that sounds as human as possible. Combined with Play.ht, it takes your audio to a whole new level.

Play.ht offers not just a simple Text to Speech conversion, but a stage for your content to stand out from the crowd, and speak for itself.

Well, what are you waiting for?

Sign Up Now!