Anyone creating natural-sounding Text to Speech audio wants the most realistic voices available. That’s why we have integrated all of Microsoft’s Text to Speech voices into Play.ht, giving you access to Microsoft’s best voices and features for creating natural-sounding audio.
Play.ht also offers Text to Speech voices from Google WaveNet, Amazon Polly, and IBM Watson, but today it’s all about the new Text to Speech voices from Microsoft.
Create your first audio with Microsoft’s voices in the Play.ht Dashboard.
Firstly, Microsoft offers a comprehensive family of natural Text to Speech voices as part of its Azure Cognitive Services.
Secondly, the voices are powered by Microsoft’s machine learning algorithms, which make them sound realistic, fluid, and almost indistinguishable from real human voices.
Above all, these voices closely mimic the patterns and intonation of human speech. This makes them sound very natural and useful in a wide range of applications, from creating voice-overs for videos to narrating blog posts and converting educational material into audio.
Adding more than 140 voices in over 45 languages
In total, Microsoft provides more than 140 Text to Speech voices in over 45 languages. Not all of them sound natural, though. Some of the voices sound robotic, but the new ones, known as Neural voices, sound extremely natural, and they are the focus here.
So what are Neural Voices?
Neural text-to-speech voices, or Neural TTS, are created using a new type of speech synthesis powered by deep neural networks. As a result, the synthesized speech is nearly indistinguishable from human recordings.
Moreover, neural voices can be used to:
- Make interactions with chatbots and voice assistants more natural and engaging.
- Convert digital texts such as e-books into audiobooks.
- Enhance in-car navigation systems.
With human-like prosody and clear articulation, neural voices significantly reduce listening fatigue when users interact with AI systems.
Here’s a small preview of all the neural voices saying “Thank you” in 49 different languages/locales.
List of Neural Voices
This is a list of all the Microsoft Neural Text to Speech voices currently available, with their audio samples.
You can try any of these voices using our online AI Voice Generator.
Language | Gender | Voice name | Sample Audio |
Arabic (Egypt) | Female | ar-EG-SalmaNeural | |
Arabic (Saudi Arabia) | Female | ar-SA-ZariyahNeural | |
Catalan (Spain) | Female | ca-ES-AlbaNeural | |
Danish (Denmark) | Female | da-DK-ChristelNeural | |
German (Germany) | Female | de-DE-KatjaNeural | |
English (Australia) | Female | en-AU-NatashaNeural | |
English (Canada) | Female | en-CA-ClaraNeural | |
English (United Kingdom) | Female | en-GB-LibbyNeural | |
English (United Kingdom) | Female | en-GB-MiaNeural | |
English (India) | Female | en-IN-NeerjaNeural | |
English (United States) | Female | en-US-AriaNeural | |
English (United States) | Male | en-US-GuyNeural | |
Spanish (Spain) | Female | es-ES-ElviraNeural | |
Spanish (Mexico) | Female | es-MX-DaliaNeural | |
Finnish (Finland) | Female | fi-FI-NooraNeural | |
French (Canada) | Female | fr-CA-SylvieNeural | |
French (France) | Female | fr-FR-DeniseNeural | |
Hindi (India) | Female | hi-IN-SwaraNeural | |
Italian (Italy) | Female | it-IT-ElsaNeural | |
Japanese (Japan) | Female | ja-JP-NanamiNeural | |
Korean (Korea) | Female | ko-KR-SunHiNeural | |
Norwegian, Bokmål (Norway) | Female | nb-NO-IselinNeural | |
Dutch (Netherlands) | Female | nl-NL-ColetteNeural | |
Polish (Poland) | Female | pl-PL-ZofiaNeural | |
Portuguese (Brazil) | Female | pt-BR-FranciscaNeural | |
Portuguese (Portugal) | Female | pt-PT-FernandaNeural | |
Russian (Russia) | Female | ru-RU-DariyaNeural | |
Swedish (Sweden) | Female | sv-SE-HilleviNeural | |
Thai (Thailand) | Female | th-TH-AcharaNeural | |
Turkish (Turkey) | Female | tr-TR-EmelNeural | |
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoxiaoNeural | |
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoyouNeural | |
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyangNeural | |
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyeNeural | |
Cantonese (Traditional Chinese, Hong Kong) | Female | zh-HK-HiuGaaiNeural | |
Mandarin (Traditional Chinese, Taiwan) | Female | zh-TW-HsiaoYuNeural | |
Microsoft’s Text to Speech: Change the Style of Your Voice
By default, Text to Speech synthesizes text in a neutral speaking style. With neural voices, however, you can adjust the speaking style to express different emotions.
You can toggle emotions such as cheerfulness, empathy, and calm, or optimize the voice for scenarios such as customer service, newscasting, and voice assistants to fit your needs.
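In Azure, these speaking styles are selected through SSML markup rather than by switching to a different voice. The sketch below assumes Azure’s `mstts:express-as` SSML element and the `https://www.w3.org/2001/mstts` namespace as given in Microsoft’s SSML documentation; `build_styled_ssml` is a hypothetical helper, and Play.ht’s editor may expose styles through its own interface instead. It only builds the SSML string; sending it to a synthesis service is out of scope.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SYNTHESIS_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # Microsoft's SSML extension namespace


def build_styled_ssml(text: str, voice: str, style: Optional[str] = None,
                      lang: str = "en-US") -> str:
    """Wrap text in an SSML document for one neural voice (hypothetical helper).

    If a style is given, the text is wrapped in Azure's mstts:express-as
    element; otherwise the voice keeps its default, neutral style.
    """
    body = escape(text)  # escape &, < and > so user text cannot break the XML
    if style is not None:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        f'<speak version="1.0" xmlns="{SYNTHESIS_NS}" xmlns:mstts="{MSTTS_NS}" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Jenny's customer service style (voice name follows the <locale>-<Name>Neural pattern)
    print(build_styled_ssml(
        "Okay, great. And Randy should be calling you back shortly.",
        voice="en-US-JennyNeural",
        style="customerservice",
    ))
```

Leaving `style` as `None` produces plain SSML, which matches the default neutral speaking style described above.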
en-US Jenny
Jenny, the new English (US) voice, was created with a friendly, warm, and comforting voice persona focused on conversational scenarios. For Jenny, Microsoft’s Text to Speech provides additional speaking styles, including chat and customer service.
You can hear the different speaking styles in Jenny’s voice below:
Style | Style description | Sample |
General | Expresses a neutral tone, suitable for general use | Valentino Lazaro scored a late winner for Austria to deny Northern Ireland a first Nations League point. |
Chat | Expresses a casual and relaxed tone in conversation | Oh, well, that’s quite a change from California to Utah. |
Customer service | Expresses a friendly and helpful tone for customer support | Okay, great. In the meantime, see if you can reach out to Verizon and let them know your issue. And Randy should be calling you back shortly. |
Similarly, a new speaking style is available for the en-US male voice, Guy. Guy’s newscast style is a great choice for reading professional and news-related content.
zh-CN Xiaoxiao
In addition, 10 new speaking styles are available for the zh-CN voice Xiaoxiao. These styles are optimized to help audio content creators and intelligent bot developers produce more engaging, interactive audio that expresses rich emotions.
You can hear the new speaking styles in Xiaoxiao’s voice below:
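Style | Sample | English translation |
Calm | 那，那我再问你，你之前有养过宠物嘛 | Well, then let me ask you: have you ever had a pet before? |
Affectionate | 老公，把灯打开好吗？好黑呀，我很怕。 | Honey, could you turn on the light? It’s so dark, I’m scared. |
Angry | 没想到，我们八年的感情真的完了！ | I can’t believe our eight-year relationship is really over! |
Disgruntled | 这你都不明白吗？真是个榆木脑袋。 | You don’t even understand this? What a blockhead. |
Fearful | 先生，你没事吧？要不要我叫医生过来？ | Sir, are you all right? Should I call a doctor? |
Gentle | 我今天运气特别好，如果没有遇到您，还不知道会怎么样呢！ | I’m so lucky today. If I hadn’t met you, who knows what would have happened! |
Cheerful | 太好了，恭喜你顺利通过考核。 | Great! Congratulations on passing the assessment. |
Serious | 不要恋战，等待时机，随时准备突围。 | Don’t get bogged down in the fight. Wait for the right moment and be ready to break out at any time. |
Sad | 没想到，你居然是这么一个无情无义的人！ | I never thought you’d be such a heartless person! |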
For Xiaoxiao, the intensity of the speaking style (its “style degree”) can be further adjusted to fit your use case: specify a stronger or softer style degree to make the speech more expressive or more subdued.
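In Azure’s SSML, this intensity corresponds to the `styledegree` attribute of the `mstts:express-as` element, which Microsoft documents as accepting values from 0.01 to 2, with 1 as the default intensity. The sketch below assumes that attribute and range; `express_with_degree` is a hypothetical helper that returns only the inner element, intended to sit inside a `<voice>` element for a voice such as zh-CN-XiaoxiaoNeural.

```python
from xml.sax.saxutils import escape, quoteattr

# Documented bounds for Azure's styledegree attribute; 1 is the default intensity.
MIN_DEGREE, MAX_DEGREE = 0.01, 2.0


def express_with_degree(text: str, style: str, degree: float = 1.0) -> str:
    """Return an mstts:express-as element with an explicit style degree.

    Hypothetical helper: values below 1 soften the style, values above 1
    strengthen it. Out-of-range degrees are rejected rather than clamped.
    """
    if not MIN_DEGREE <= degree <= MAX_DEGREE:
        raise ValueError(
            f"styledegree must be in [{MIN_DEGREE}, {MAX_DEGREE}], got {degree}"
        )
    return (
        f"<mstts:express-as style={quoteattr(style)} "
        f'styledegree="{degree:g}">{escape(text)}</mstts:express-as>'
    )


if __name__ == "__main__":
    # A stronger-than-default "sad" reading for Xiaoxiao
    print(express_with_degree("没想到，你居然是这么一个无情无义的人！", "sad", 2))
```

Rejecting out-of-range values, instead of silently clamping them, makes a typo such as `20` fail loudly rather than quietly producing the maximum intensity.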
Microsoft’s Text to Speech: In a Nutshell
In conclusion, Microsoft’s Text to Speech, with its new and improved neural voices, offers a wide range of options for transforming your text into audio that sounds as human as possible. Combined with Play.ht, it takes your audio to a whole new level.
Play.ht offers not just a simple Text to Speech conversion, but a stage for your content to stand out from the crowd and speak for itself.
Well, what are you waiting for?
Sign Up Now!