Azure Speech - Neural HD Text to Speech: Recent Voice Updates

Azure Speech Neural Voices Update (March 2026)

Neural HD 2.5 update to Latest in Production: Enhanced Quality, Styles, and Paralinguistic tags

Neural HD 2.5 delivers notable enhancements to existing HD voices, with an emphasis on more natural prosody, greater expressiveness, and better consistency, particularly on lengthy or complex material. The update supports a range of speaking styles for English content and lets you insert paralinguistic elements, which makes conversational output sound more authentic. Enhanced style and metadata tags make it easier to evaluate each voice's capabilities and to pick the most appropriate voice for applications such as virtual agents, narration, or expressive content creation.

In addition to SSML input, styles and paralinguistic tags can now be applied in plain text input as well. Please refer to the examples below.

Voice Test Results in English (US)

| Rating | Female | Male |
|---|---|---|
| Microsoft Neural HD | 3.99 | 3.94 |
| Service A | 3.75 | 3.99 |
| Service B | 3.66 | 3.67 |
| Service C | 3.59 | 3.89 |

A MOS (mean opinion score) evaluation was conducted with a panel of human judges across several domains, including Knowledge Sharing, Assistant, Customer Service, and Entertainment. As the table shows, Microsoft Neural HD received consistently high and balanced scores for both female and male voices, indicating dependable, high-quality performance. Some alternatives are strong in only one gender category (Service A, for example, scores 3.99 for its male voice but 3.75 for its female voice), whereas Microsoft Neural HD provides a reliable and uniform listening experience, making it an appropriate choice for production use across varied real-world applications.

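To make the bracket-tag text input mentioned above more concrete, here is a minimal Python sketch that turns `[tag]`-annotated text into the SSML forms used in this post's samples. It is an illustration only, not how the service parses input: the helper names `tagged_text_to_ssml_body` and `_wrap` are hypothetical, the rule that a style covers only the text segment right after it is an assumption, and any bracketed tag that is not in the paralinguistic tag list given later in this post is simply treated as a style.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Paralinguistic tags as listed in this post. Any other bracketed tag is
# treated as a speaking style (an assumption made for this sketch).
PARALINGUISTIC_TAGS = {
    "laughter", "coughing", "throat_clearing",
    "breathing", "sighing", "yawning",
}

# Matches tags such as [whispering] or [throat_clearing].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def _wrap(segment, style):
    """Escape a text segment and wrap it in express-as when a style is set."""
    body = escape(segment)
    if style is None:
        return body
    return f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"


def tagged_text_to_ssml_body(text):
    """Convert '[style] text [sound]' input into SSML fragments (hypothetical helper)."""
    parts = []
    style = None  # style waiting to be applied to the next text segment
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        segment = text[pos:match.start()].strip()
        if segment:
            parts.append(_wrap(segment, style))
            style = None  # assumption: a style covers one segment only
        tag = match.group(1)
        if tag in PARALINGUISTIC_TAGS:
            parts.append(f"<mstts:paralinguistic type={quoteattr(tag)}/>")
        else:
            style = tag
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(_wrap(tail, style))
    return "\n".join(parts)


sample = (
    "[throat_clearing] Alright, let's get started. "
    "[shouting] Welcome everyone! [laughter] Okay, maybe I got carried away."
)
print(tagged_text_to_ssml_body(sample))
```

The fragments would still need to be placed inside `<speak>` and `<voice>` elements like the ones in the SSML samples. Since the service accepts the bracketed text directly, a helper like this is mainly useful for validating tags or for mixing them with other SSML.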
List of supported styles: `amazed`, `amused`, `angry`, `annoyed`, `anxious`, `appreciative`, `calm`, `cautious`, `concerned`, `confident`, `confused`, `curious`, `defeated`, `defensive`, `defiant`, `determined`, `disappointed`, `disgusted`, `doubtful`, `ecstatic`, `encouraging`, `excited`, `fast`, `fearful`, `frustrated`, `happy`, `hesitant`, `hurt`, `impatient`, `impressed`, `intrigued`, `joking`, `laughing`, `optimistic`, `painful`, `panicked`, `panting`, `pleading`, `proud`, `quiet`, `reassuring`, `reflective`, `relieved`, `remorseful`, `resigned`, `sad`, `sarcastic`, `secretive`, `serious`, `shocked`, `shouting`, `shy`, `skeptical`, `slow`, `struggling`, `surprised`, `suspicious`, `sympathetic`, `terrified`, `upset`, `urgent`, `whispering`

Note: Styles and paralinguistic tags are available on all HDLatestNeural voices except `en-IN-Arjun:DragonHDLatestNeural`, `en-IN-Aarti:DragonHDLatestNeural`, and `en-IN-Meera:DragonHDLatestNeural`.

SSML samples

SSML input (express-as tag)

    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-us-Ava:DragonHDLatestNeural'>
        Hello, I'm Ava, I've just been updated, and now I can perform in many expressive styles, not only that, I can add paralinguistic elements to sound more natural, I'll demo some for you, first, listen closely…
        <mstts:express-as style="whispering"> Don't tell anyone… it's our secret, </mstts:express-as>
        <mstts:express-as style="breathing"> breathing </mstts:express-as>
        <mstts:express-as style="confident"> Now, let's swing to the other extreme… </mstts:express-as>
        <mstts:express-as style="shouting"> ENOUGH, I WON'T BE SILENCED ANY LONGER! </mstts:express-as>
        <mstts:express-as style="throat_clearing"> throat clearing </mstts:express-as>
        <mstts:express-as style="fearful"> No… please… stay away, I can't handle this… </mstts:express-as>
        <mstts:express-as style="sighing"> sigh </mstts:express-as>
        <mstts:express-as style="ecstatic"> Yes! This is amazing! I feel alive, unstoppable! </mstts:express-as>
        <mstts:express-as style="laughter"> laughter </mstts:express-as>
        <mstts:express-as style="resigned"> Fine… whatever happens, happens, I'll just let it go, </mstts:express-as>
        <mstts:express-as style="yawning"> yawn </mstts:express-as>
        <mstts:express-as style="confident"> And here's the best part: I don't just fade away, I can finish strong, full of energy, ready to bring words to life in any style you need, I'm Ava, and this is only the beginning! </mstts:express-as>
      </voice>
    </speak>

SSML input (using square-bracket tags "[]")

    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-us-Ava:DragonHDLatestNeural'>
        Hello, I'm Ava, I've just been updated, and now I can perform in many expressive styles, not only that, I can add paralinguistic elements to sound more natural, I'll demo some for you, first, listen closely…
        [whispering] Don't tell anyone… it's our secret, [breathing]
        [confident] Now, let's swing to the other extreme…
        [shouting] ENOUGH, I WON'T BE SILENCED ANY LONGER! [throat_clearing]
        [fearful] No… please… stay away, I can't handle this… [sighing]
        [ecstatic] Yes! This is amazing! I feel alive, unstoppable! [laughter]
        [resigned] Fine… whatever happens, happens, I'll just let it go, [yawning]
        [confident] And here's the best part: I don't just fade away, I can finish strong, full of energy, ready to bring words to life in any style you need, I'm Ava, and this is only the beginning!
      </voice>
    </speak>

Text input

    Hello, I'm Ava, I've just been updated, and now I can perform in many expressive styles, not only that, I can add paralinguistic elements to sound more natural, I'll demo some for you, first, listen closely…
    [whispering] Don't tell anyone… it's our secret. [breathing]
    [confident] Now, let's swing to the other extreme…
    [shouting] ENOUGH, I WON'T BE SILENCED ANY LONGER!! [throat_clearing]
    [panicked] No… please… stay away, I can't handle this… [sighing]
    [ecstatic] Yes! [ecstatic] This is amazing! [ecstatic] I feel alive, unstoppable! [laughter]
    [resigned] Fine… whatever happens, happens, I'll just let it go. [yawning]
    [confident] And here's the best part: I don't just fade away, I can finish strong, full of energy, ready to bring words to life in any style you need, I'm Ava, and this is just the beginning!

List of supported paralinguistic tags: `laughter`, `coughing`, `throat_clearing`, `breathing`, `sighing`, `yawning`

Neural HD Omni: Enhanced Quality, Styles, and Paralinguistic tags

We are also updating Neural HD Omni, which we announced a few weeks ago, with improved overall quality and with support for styles and paralinguistic tags on all HD Omni voices.

List of supported styles: `amazed`, `amused`, `angry`, `annoyed`, `anxious`, `appreciative`, `calm`, `cautious`, `concerned`, `confident`, `confused`, `curious`, `defeated`, `defensive`, `defiant`, `determined`, `disappointed`, `disgusted`, `doubtful`, `ecstatic`, `encouraging`, `excited`, `fast`, `fearful`, `frustrated`, `happy`, `hesitant`, `hurt`, `impatient`, `impressed`, `intrigued`, `joking`, `laughing`, `optimistic`, `painful`, `panicked`, `panting`, `pleading`, `proud`, `quiet`, `reassuring`, `reflective`, `relieved`, `remorseful`, `resigned`, `sad`, `sarcastic`, `secretive`, `serious`, `shocked`, `shouting`, `shy`, `skeptical`, `slow`, `struggling`, `surprised`, `suspicious`, `sympathetic`, `terrified`, `upset`, `urgent`

SSML samples

SSML input

    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-US-Brian:DragonHDOmniLatestNeural'>
        <mstts:paralinguistic type="throat_clearing"/>
        "Alright, let's get started."
        <mstts:express-as style="shouting"> "Welcome everyone! I'm really excited to show you what this voice can do!" </mstts:express-as>
        <mstts:paralinguistic type="laughter"/>
        "Okay, maybe I got a little carried away there."
        <mstts:express-as style="serious"> "But seriously, we are determined to make this demo absolutely amazing!" </mstts:express-as>
      </voice>
    </speak>

Text input

    [throat_clearing] Alright, let's get started.
    [shouting] Welcome everyone! I'm really excited to show you what this voice can do!
    [laughter] Okay, maybe I got a little carried away there.
    [angry] But seriously, we are determined to make this demo absolutely amazing!

List of supported paralinguistic tags: `laughter`, `coughing`, `throat_clearing`, `breathing`, `sighing`, `yawning`

Neural HD Multi-Talker Voices: Expanded Language Support and Speakers

Neural HD Multi-Talker voices produce multi-speaker output from a single voice family, so you can create dynamic, immersive audio without managing many separate voices. This is well suited to dialogue creation, podcast production, role-based narration, and storytelling, where clear speaker distinction and a smooth conversational flow are essential. Multi-Talker voices are designed to keep audio quality high across speaker transitions, which removes much of the complexity of coordinating output from multiple voices.

Previously, `en-US-MultiTalker-Ava-Andrew:DragonHDLatestNeural` and `en-US-MultiTalker-Ava-Steffan:DragonHDLatestNeural` were available in preview, with a fixed set of speakers and support for en-US only. The recent update broadens input text language support beyond en-US to include fr-FR, es-ES, de-DE, it-IT, pt-BR, ko-KR, ja-JP, and zh-CN.

Additionally, a new group of speakers is available under the model `en-MultiTalker-1:DragonHDLatestNeural`:

| Gender | Speaker name |
|---|---|
| Female | "Ada", "Ava", "Emma", "Jane" |
| Male | "Andrew", "Brian", "Davis", "Steffan" |

Sample SSML

    <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
      <voice name="en-MultiTalker-1:DragonHDLatestNeural">
        <mstts:dialog>
          <mstts:turn speaker="emma">Andrew, before we get into today's chat, I have to ask: did you do anything fun over the weekend?</mstts:turn>
          <mstts:turn speaker="andrew">Actually, yes. I spent most of Saturday at a local market, no big plans, just wandering around and trying way too much street food.</mstts:turn>
          <mstts:turn speaker="emma">That already sounds like a perfect weekend. Markets have the best energy. What ended up being your favorite find?</mstts:turn>
          <mstts:turn speaker="andrew">The fresh pastries, without a doubt. Nothing fancy, just warm, simple, and really comforting, one of those moments where you slow down and just enjoy it.</mstts:turn>
        </mstts:dialog>
      </voice>
    </speak>

Neural HD Flash Voices: Low-Latency version of HD

Neural HD Flash introduces a new class of HD voices optimized for speed and responsiveness, which matters most in scenarios where low latency is essential. We are introducing a few more voices whose primary locale is US English (en-US); they also support bilingual use in en-US and zh-CN. These HD Flash voices are engineered to deliver fast synthesis while maintaining the core Neural HD qualities of clear pronunciation and natural-sounding prosody. They are well suited to use cases such as voice assistants, call center automation, and real-time speech-to-speech experiences, where responsiveness is crucial to the user experience.

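Returning briefly to the multi-talker sample SSML shown earlier: a dialog like that one can also be assembled programmatically. The Python sketch below is a minimal illustration under stated assumptions. The helper `build_dialog_ssml` is hypothetical, the speaker names come from the speaker table in this post, and lower-casing them simply mirrors the sample (`speaker="emma"`); whether the service requires that casing is not stated here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Speaker names from the table in this post, lower-cased as in the sample SSML.
SPEAKERS = {"ada", "ava", "emma", "jane", "andrew", "brian", "davis", "steffan"}


def build_dialog_ssml(turns, voice="en-MultiTalker-1:DragonHDLatestNeural", lang="en-US"):
    """Build multi-talker SSML from (speaker, text) pairs (hypothetical helper)."""
    lines = [
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'version="1.0" xml:lang={quoteattr(lang)}>',
        f"  <voice name={quoteattr(voice)}>",
        "    <mstts:dialog>",
    ]
    for speaker, text in turns:
        key = speaker.lower()
        if key not in SPEAKERS:
            raise ValueError(f"unknown speaker: {speaker!r}")
        # escape() protects characters such as & and < inside the spoken text.
        lines.append(
            f"      <mstts:turn speaker={quoteattr(key)}>{escape(text)}</mstts:turn>"
        )
    lines += ["    </mstts:dialog>", "  </voice>", "</speak>"]
    return "\n".join(lines)


ssml = build_dialog_ssml([
    ("Emma", "Did you do anything fun over the weekend?"),
    ("Andrew", "Actually, yes. I spent most of Saturday at a local market."),
])
ET.fromstring(ssml)  # raises ParseError if the document is not well-formed XML
print(ssml)
```

Parsing the result with `xml.etree.ElementTree` only checks that the XML is well formed. It does not validate the document against what the speech service accepts.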
With HD Flash, developers can now choose between maximizing expressiveness with Neural HD and Neural HD Omni and prioritizing faster response times, depending on their application's requirements.

HD Flash Voices & Styles List

| Voice name | Supported styles |
|---|---|
| zh-CN-Xiaoxiao:DragonHDFlashLatestNeural | angry, chat, cheerful, customer-service, excited, fearful, sad, voice-assistant |
| zh-CN-Xiaoxiao2:DragonHDFlashLatestNeural | affectionate, angry, anxious, cheerful, curious, disappointed, empathetic, encouraging, excited, fearful, guilty, lonely, poetry-reading, sad, sentimental, sorry, story, surprised, tired, whispering |
| zh-CN-Xiaochen:DragonHDFlashLatestNeural | cheerful, debating, empathetic, live-commercial, poetry-reading, sad, sorry |
| zh-CN-Xiaoyi:DragonHDFlashLatestNeural | angry, complaining, cute, gentle, nervous, sad, shy, strict |
| zh-CN-Xiaoyu:DragonHDFlashLatestNeural | angry, debating, cheerful, comforting, sad, sorry |
| zh-CN-Xiaohan:DragonHDFlashLatestNeural | affectionate, angry, cheerful, complaining, fearful, gentle, sad, shy, strict |
| zh-CN-Xiaoshuang:DragonHDFlashLatestNeural | chat |
| zh-CN-Xiaoyou:DragonHDFlashLatestNeural | chat, angry, cheerful, poetry-reading, sad, story, cute |
| zh-CN-Yunxi:DragonHDFlashLatestNeural | angry, chat, cheerful, complaining, depressed, fearful, news, sad, shy, strict, voice-assistant |
| zh-CN-Yunyi:DragonHDFlashLatestNeural | assassin, captain, cavalier, prince, game-narrator, geomancer, poet |
| zh-CN-Yunxiao:DragonHDFlashLatestNeural | (none listed) |
| zh-CN-Yunhan:DragonHDFlashLatestNeural | angry, cheerful, curious, empathetic, encouraging, excited, guilty, lonely, sad, serious, sorry, whispering, surprised, tired |
| zh-CN-Yunxia:DragonHDFlashLatestNeural | affectionate, angry, cheerful, comforting, encouraging, excited, fearful, sad, surprised |
| zh-CN-Yunye:DragonHDFlashLatestNeural | (none listed) |
| en-US-Tiana:DragonHDFlashLatestNeural | (none listed) |
| en-US-Tyler:DragonHDFlashLatestNeural | (none listed) |
| en-US-Jimmie:DragonHDFlashLatestNeural | (none listed) |

Note: For the HD Flash model, style support is defined per voice.

Neural HD Regions Expansion

Starting in March 2026, Neural HD voices are rolling out to more regions. Previously available in `East US`, `West Europe`, and `Southeast Asia`, these enhanced voices are now also available in `West US 2`, `East US 2`, `Central India`, `Canada Central`, `France Central`, and `Sweden Central`. Please refer to "Supported Regions for Azure Speech - Foundry Tools | Microsoft Learn" for the latest information.

Neural HD Pricing Update

Starting in March 2026, Neural HD voices are offered at a new rate of $22 per 1 million characters, reduced from the previous price of $30 per 1 million characters. This adjustment makes Neural HD a more accessible and economical option for users integrating it into their solutions. Please refer to "Pricing - Azure Speech in Foundry Tools | Microsoft Azure" for the latest information.

Getting Started with Neural HD Voices

Begin exploring the latest Neural HD voices in Azure Speech to find the right mix of quality, performance, and expressiveness for your applications.

As part of our ongoing commitment to advancing multilingual text-to-speech (TTS) technology, we strive to deliver adaptive voices that can seamlessly switch languages based on text input. These voices offer natural-sounding speech with precise pronunciation and prosody, making them invaluable for applications such as language learning, travel guidance, and international business communication.

Microsoft's extensive portfolio features over 600 neural voices covering more than 150 languages and locales. These TTS voices let you quickly add read-aloud features for accessible app design or give chatbots a voice to enhance conversational experiences. Through the Custom Neural Voice capability, businesses can also develop unique and distinctive brand voices with ease.

With these innovations, we continue to push the boundaries of TTS technology, ensuring users have access to the most flexible and high-quality voices available.

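As a quick worked example of the pricing update described above, the Python sketch below compares the cost of a hypothetical monthly volume at the previous rate and at the new one. The function name `neural_hd_cost` and the 2.5 million character volume are made up for illustration, and the calculation ignores any free tier, discounts, or billing rules about how characters are counted.

```python
from decimal import Decimal

OLD_RATE = Decimal("30")  # USD per 1 million characters (previous price)
NEW_RATE = Decimal("22")  # USD per 1 million characters (from March 2026)


def neural_hd_cost(characters, rate_per_million):
    """Return the cost in USD, rounded to cents (hypothetical helper)."""
    cost = Decimal(characters) * rate_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


monthly_characters = 2_500_000  # illustrative volume, not taken from the post
old_cost = neural_hd_cost(monthly_characters, OLD_RATE)
new_cost = neural_hd_cost(monthly_characters, NEW_RATE)
reduction = (OLD_RATE - NEW_RATE) / OLD_RATE * 100

print(f"previous: ${old_cost}, new: ${new_cost}, saved: ${old_cost - new_cost}")
print(f"rate reduction: {reduction:.1f}%")
```

Using `Decimal` instead of floats keeps the currency arithmetic exact. Going from $30 to $22 lowers the per-character rate by about 26.7%.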
Additional Resources and Next Steps

- Try our demo to listen to existing neural voices
- Add text-to-speech to your apps today
- Apply for access to Custom Neural Voice
- Join Discord to collaborate and share feedback
- Contact us at [email protected]

https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/azure-speech-neural-hd-text-to-speech-recent-voice-updates/ba-p/4505380