Microsoft Text to Speech: Now Create Natural-Sounding Audio

Microsoft's Neural Voices, now on

Anyone looking to create natural text to speech audio wants the most realistic, natural-sounding voices available. That’s why we have integrated all of Microsoft’s text to speech voices, so you have access to the best voices and features from Microsoft and can create natural-sounding audio. We also offer the best text to speech voices from Google WaveNet, Amazon Polly, and IBM Watson, but today, it’s all about the new text to speech voices from Microsoft.

Create your first audio using Microsoft’s voices here: Dashboard

Some of the Standard and Neural Voices of Microsoft Text to Speech, on

Firstly, Microsoft offers a comprehensive family of natural text to speech voices as part of its Azure Cognitive Services.

Secondly, these voices are powered by Microsoft’s machine learning algorithms, which make them sound realistic, fluid, and almost indistinguishable from real human voices.

Above all, these voices closely mimic the patterns and intonation of human speech. This makes them sound very natural and useful in a wide range of applications, from voice-overs for videos and narrated blog posts to educational material converted into audio.

140 voices in over 45 languages

In total, Microsoft provides 140 text to speech voices in over 45 languages, but not all of them sound natural. Some of the voices sound robotic; the new ones, known as Neural voices, sound extremely natural, and they are the ones of interest here.

So what are Neural Voices?

Neural text-to-speech voices, or Neural TTS, are created using a new type of speech synthesis powered by deep neural networks. As a result, the synthesized speech is nearly indistinguishable from human recordings.

Moreover, neural voices can make interactions with chatbots and voice assistants more natural and engaging. They can also be used to:

  • Convert digital texts such as e-books into audiobooks.
  • Enhance in-car navigation systems.

With human-like prosody and clear articulation, neural voices significantly reduce listening fatigue when users interact with AI systems.

New Neural Text to Speech vs. Standard Text to Speech

Here’s a small preview of all the neural voices saying “Thank you” in 49 different languages/locales.

List of Neural Voices

This is a list of all the Microsoft Neural Text to Speech voices currently available, with their audio samples.

You can try any of these voices using our online AI Voice Generator.

Language | Gender | Voice name
Arabic (Egypt) | Female | ar-EG-SalmaNeural
Arabic (Saudi Arabia) | Female | ar-SA-ZariyahNeural
Catalan (Spain) | Female | ca-ES-AlbaNeural
Danish (Denmark) | Female | da-DK-ChristelNeural
German (Germany) | Female | de-DE-KatjaNeural
English (Australia) | Female | en-AU-NatashaNeural
English (Canada) | Female | en-CA-ClaraNeural
English (United Kingdom) | Female | en-GB-LibbyNeural
English (United Kingdom) | Female | en-GB-MiaNeural
English (India) | Female | en-IN-NeerjaNeural
English (United States) | Female | en-US-AriaNeural
English (United States) | Male | en-US-GuyNeural
Spanish (Spain) | Female | es-ES-ElviraNeural
Spanish (Mexico) | Female | es-MX-DaliaNeural
Finnish (Finland) | Female | fi-FI-NooraNeural
French (Canada) | Female | fr-CA-SylvieNeural
French (France) | Female | fr-FR-DeniseNeural
Hindi (India) | Female | hi-IN-SwaraNeural
Italian (Italy) | Female | it-IT-ElsaNeural
Japanese (Japan) | Female | ja-JP-NanamiNeural
Korean (Korea) | Female | ko-KR-SunHiNeural
Norwegian, Bokmål (Norway) | Female | nb-NO-IselinNeural
Dutch (Netherlands) | Female | nl-NL-ColetteNeural
Polish (Poland) | Female | pl-PL-ZofiaNeural
Portuguese (Brazil) | Female | pt-BR-FranciscaNeural
Portuguese (Portugal) | Female | pt-PT-FernandaNeural
Russian (Russia) | Female | ru-RU-DariyaNeural
Swedish (Sweden) | Female | sv-SE-HilleviNeural
Thai (Thailand) | Female | th-TH-AcharaNeural
Turkish (Turkey) | Female | tr-TR-EmelNeural
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoxiaoNeural
Mandarin (Simplified Chinese, China) | Female | zh-CN-XiaoyouNeural
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyangNeural
Mandarin (Simplified Chinese, China) | Male | zh-CN-YunyeNeural
Cantonese (Traditional Chinese, Hong Kong) | Female | zh-HK-HiuGaaiNeural
Mandarin (Traditional Chinese, Taiwan) | Female | zh-TW-HsiaoYuNeural
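Every voice name in the table above follows the same pattern: a locale (language code plus region code), then the speaker’s name, then a Neural suffix. The short Python sketch below splits a name into those parts, which can help when you want to pick voices by locale. It assumes the pattern holds for every name, and `parse_voice_name` is a hypothetical helper written for illustration, not part of any Microsoft SDK.

```python
import re
from typing import NamedTuple


class VoiceName(NamedTuple):
    locale: str   # language code + region code, e.g. "en-US"
    speaker: str  # speaker name, e.g. "Aria"
    neural: bool  # True when the name ends with the "Neural" suffix


# Assumed naming pattern: <lang>-<REGION>-<Speaker>[Neural]
_VOICE_RE = re.compile(r"^([a-z]{2,3}-[A-Z]{2})-([A-Za-z]+?)(Neural)?$")


def parse_voice_name(name: str) -> VoiceName:
    """Split a voice name such as 'en-US-AriaNeural' into its parts (hypothetical helper)."""
    match = _VOICE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized voice name: {name!r}")
    locale, speaker, suffix = match.groups()
    return VoiceName(locale, speaker, suffix is not None)


voices = ["en-US-AriaNeural", "en-US-GuyNeural", "en-GB-LibbyNeural", "zh-CN-XiaoxiaoNeural"]
# Keep only the voices whose locale is US English.
us_voices = [v for v in voices if parse_voice_name(v).locale == "en-US"]
print(us_voices)  # → ['en-US-AriaNeural', 'en-US-GuyNeural']
```

The lazy `+?` quantifier lets the optional `Neural` group claim the suffix, so the speaker name comes out without it.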


Microsoft’s Text to Speech: Change the Style of Your Voice

By default, Microsoft’s text to speech synthesizes text in a neutral speaking style. With neural voices, however, you can adjust the speaking style to express different emotions.

You can choose emotions such as cheerfulness, empathy, and calm, or optimize the voice for scenarios such as customer service, newscasting, and voice assistants, depending on your needs.

en-US Jenny

Jenny, a new English (US) voice, was created with a friendly, warm, and comforting persona focused on conversational scenarios. For Jenny, Microsoft’s text to speech provides additional speaking styles, including chatbot and customer service styles.

You can hear the different speaking styles in Jenny’s voice below:

Style | Style description | Sample
General | Expresses a neutral tone and is suitable for general use | “Valentino Lazaro scored a late winner for Austria to deny Northern Ireland a first Nations League point.”
Chat | Expresses a casual and relaxed tone in conversation | “Oh, well, that’s quite a change from California to Utah.”
Customer service | Expresses a friendly and helpful tone for customer support | “Okay, great. In the meantime, see if you can reach out to Verizon and let them know your issue. And Randy should be calling you back shortly.”

Similarly, a new speaking style is available for the en-US male voice, Guy. Guy’s newscast style is a good choice when you need a male voice to read professional or news-related content.
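With Microsoft’s voices, a speaking style is typically requested through SSML (Speech Synthesis Markup Language) using Microsoft’s `mstts:express-as` element. The Python sketch below only builds the SSML string; it does not call any service. It assumes the style identifiers `customerservice` and `newscast` correspond to the Jenny and Guy styles described above (check Microsoft’s current style list for each voice), and `build_ssml` is a hypothetical helper. A string like this could then be passed to a synthesizer, for example `SpeechSynthesizer.speak_ssml_async` in the Azure Speech SDK for Python.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # Microsoft's SSML extension namespace


def build_ssml(text: str, voice: str, style: Optional[str] = None) -> str:
    """Wrap text in SSML for one voice, optionally with an mstts:express-as style (hypothetical helper)."""
    locale = "-".join(voice.split("-")[:2])  # "en-US-JennyNeural" -> "en-US"
    body = escape(text)  # escape &, < and > so the text cannot break the XML
    if style is not None:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f'xml:lang="{locale}">'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


ssml = build_ssml(
    "Okay, great. Randy should be calling you back shortly.",
    voice="en-US-JennyNeural",
    style="customerservice",  # assumed style identifier for Jenny's customer service style
)
ET.fromstring(ssml)  # sanity check: raises ParseError if the SSML is not well-formed XML
print(ssml)
```

For Guy’s newscast style, the same helper would be called with `voice="en-US-GuyNeural"` and `style="newscast"` (again, an assumed identifier).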

zh-CN Xiaoxiao

In addition, 10 new speaking styles are available for the zh-CN voice Xiaoxiao. These styles are optimized for audio content creators and intelligent bot developers who want to create more engaging, interactive audio that expresses rich emotions.

You can hear the new speaking styles in Xiaoxiao’s voice below:

  • Calm
  • Affectionate
  • Angry
  • Disgruntled
  • Fearful
  • Gentle
  • Cheerful
  • Serious
  • Sad
For the Chinese voice Xiaoxiao, you can further adjust the intensity of the speaking style, known as the “style degree”, to better fit your use case. A stronger style degree makes the speech more expressive; a softer one makes it more subdued.
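In SSML, the style degree is set with the `styledegree` attribute on `mstts:express-as`. The Python sketch below assumes an accepted range of 0.01 to 2, with 1 as the default intensity, which matches Microsoft’s documentation as we understand it; please verify the bounds against the current docs. `express_as` is a hypothetical helper that clamps the requested degree into that range. The fragment it returns belongs inside a `<voice>` element of an SSML document that declares the `mstts` namespace.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed styledegree bounds; 1 is the default intensity.
MIN_STYLE_DEGREE = 0.01
MAX_STYLE_DEGREE = 2.0


def express_as(text: str, style: str, degree: float = 1.0) -> str:
    """Return an mstts:express-as fragment with the style degree clamped to the assumed range."""
    clamped = min(max(degree, MIN_STYLE_DEGREE), MAX_STYLE_DEGREE)
    return (
        f'<mstts:express-as style={quoteattr(style)} styledegree="{clamped:g}">'
        f"{escape(text)}</mstts:express-as>"
    )


print(express_as("你好", "cheerful", 2.0))
# → <mstts:express-as style="cheerful" styledegree="2">你好</mstts:express-as>
print(express_as("你好", "sad", 0.5))   # softer, more subdued delivery
print(express_as("你好", "angry", 5))   # out of range, clamped to 2
```

Clamping on the client side keeps a typo such as `5` from producing a request outside the assumed range; rejecting the value with an error would be an equally reasonable design.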

Microsoft’s Text to Speech: In a Nutshell

In conclusion, Microsoft’s text to speech, with its new and improved neural voices, offers a wide range of options for transforming your text into audio that sounds as human as possible. Combined with, it takes your audio to a whole new level. offers not just a simple text to speech conversion, but a stage for your content to stand out from the crowd and speak for itself.

Well, what are you waiting for?

Sign Up Now!