Text2Speech Blog

NeoSpeech: Text-to-Speech Solutions.

Fine Tune Text-to-Speech Pronunciation

Whether you work in education, healthcare, or transportation, your text-to-speech software needs to sound natural and pronounce words correctly. A text-to-speech engine may not know how to say industry-specific terms or molecular names, so a great engine must let you customize pronunciation. Ours do: you can modify the pronunciation of our text-to-speech voices with VTML tags and the User Dictionary. In this article, we’ll explain how to use our part of speech and phoneme VTML tags.


Part of Speech

Some words are spelled the same but are pronounced differently and have different meanings. How is a text-to-speech engine supposed to know whether you’re using “record” as a verb or a noun? Our text-to-speech engines are pretty smart. They won’t understand what your text means (they’re not AIs), but they do use the surrounding context. That’s why our voice Julie will pronounce “record” correctly according to its use in a full sentence.

Here’s an example sentence: Did you record the guitar on this record?

We didn’t have to use any VTML tags, because Julie already understood that “record” was being used first as a verb and then as a noun.

When our text-to-speech voices don’t have enough context to determine how to pronounce a word, you can use the partofsp VTML tag, which stands for “part of speech.” The part of speech tag is an easy way to change how a word is pronounced based on whether it is a noun, verb, adjective, and so on, without having to look at phonetic symbol charts.

For our example sentence, here’s how we would use the part of speech tag:

     Did you <vtml_partofsp part="verb">record</vtml_partofsp> the guitar on this <vtml_partofsp part="noun">record</vtml_partofsp>?

The part attribute accepts these values: unknown, noun, verb, modifier, function, and interjection.
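
If you generate VTML from code, a small helper can build the tag and catch unsupported values before the text reaches the engine. The following is a minimal Python sketch, not part of any NeoSpeech API: the function name partofsp and the constant ALLOWED_PARTS are hypothetical, and the helper assumes the word contains no characters that need escaping.

```python
# Part-of-speech values listed in this article.
ALLOWED_PARTS = {"unknown", "noun", "verb", "modifier", "function", "interjection"}


def partofsp(word: str, part: str) -> str:
    """Wrap `word` in a vtml_partofsp tag (hypothetical helper)."""
    if part not in ALLOWED_PARTS:
        raise ValueError(f"unsupported part of speech: {part!r}")
    return f'<vtml_partofsp part="{part}">{word}</vtml_partofsp>'


# Rebuild the example sentence from the article.
sentence = (
    f"Did you {partofsp('record', 'verb')} the guitar "
    f"on this {partofsp('record', 'noun')}?"
)
print(sentence)
```

A value that is not in the list, such as "adjective", raises a ValueError instead of silently producing a tag the engine may not accept.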

What do you do if the two words are the same part of speech? One example is the word “bass.” Since both the musical instrument and the fish are nouns, we can’t use the part of speech tag to determine pronunciation. That’s where our phoneme VTML tag comes in.


Phoneme

The phoneme VTML tag gives you full control over pronunciation. A phoneme is the smallest unit of sound that distinguishes one word from another. As long as the phoneme is part of the text-to-speech engine’s phoneme set, you can use it.

For the conundrum we ran into with the word “bass,” here’s how we can use the phoneme tag to modify pronunciation.

     He caught a <vtml_phoneme alphabet="x-cmu" ph="B AE1 S">bass</vtml_phoneme>.
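
The tag above selects the fish. To show both readings side by side, here is a minimal Python sketch that builds the tag for each one. The helper name phoneme is hypothetical and not a NeoSpeech API; the instrument transcription "B EY1 S" is the CMUdict-style counterpart of the "B AE1 S" used above.

```python
def phoneme(word: str, ph: str, alphabet: str = "x-cmu") -> str:
    """Wrap `word` in a vtml_phoneme tag (hypothetical helper)."""
    return f'<vtml_phoneme alphabet="{alphabet}" ph="{ph}">{word}</vtml_phoneme>'


# CMUdict-style transcriptions for the two readings of "bass".
BASS_FISH = "B AE1 S"        # rhymes with "pass"
BASS_INSTRUMENT = "B EY1 S"  # rhymes with "face"

fish = f"He caught a {phoneme('bass', BASS_FISH)}."
music = f"He plays the {phoneme('bass', BASS_INSTRUMENT)}."
print(fish)
print(music)
```

The visible text stays “bass” in both sentences; only the ph attribute changes what the voice says.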


Our English text-to-speech voices support five phoneme alphabets: IPA, Extended SAMPA, Worldbet, CMUdict, and SAPI Phoneme Representation. The “bass” example above uses CMUdict symbols (alphabet="x-cmu"). Learn how to use each phoneme alphabet in our “Phonetic Transcription Resources for Speech Technology” guide.

Our phoneme tags are useful for specifying the pronunciation of industry-specific jargon, chemical names, and medication names.
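
When the same terms recur across many documents, you may prefer to tag them automatically rather than by hand. The sketch below is one possible preprocessing step in Python, separate from the User Dictionary feature: the function apply_lexicon and the lexicon format (term mapped to a CMUdict-style transcription) are assumptions made for illustration, and the sample entry reuses the musical reading of “bass” rather than real medical or chemical transcriptions.

```python
import re


def apply_lexicon(text: str, lexicon: dict[str, str], alphabet: str = "x-cmu") -> str:
    """Wrap each whole-word lexicon match in a vtml_phoneme tag.

    Hypothetical helper, not a NeoSpeech API. Matching is case-insensitive,
    and the original spelling of each match is kept inside the tag.
    """
    if not lexicon:
        return text
    by_lower = {term.lower(): ph for term, ph in lexicon.items()}
    # Try longer terms first so a multi-word entry wins over a shorter one.
    terms = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match: re.Match) -> str:
        ph = by_lower[match.group(0).lower()]
        return f'<vtml_phoneme alphabet="{alphabet}" ph="{ph}">{match.group(0)}</vtml_phoneme>'

    return pattern.sub(wrap, text)


# Example lexicon for a music application, where "bass" is always the instrument.
music_lexicon = {"bass": "B EY1 S"}
tagged = apply_lexicon("The bass player tuned her Bass.", music_lexicon)
print(tagged)
```

Because the pattern matches whole words only, a word such as “bassist” is left untouched. A lexicon like this only makes sense when a term has a single intended pronunciation throughout the text.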

To learn more about all of our VTML tags and how to use them, check out our VTML manual.

Learn More about NeoSpeech’s Text-to-Speech

Want to learn more about all the ways Text-to-Speech can be used? Visit our Text-to-Speech Areas of Application page. And check out our Text-to-Speech Products page to find the right package for any device or application.

If you’re interested in integrating Text-to-Speech technology into your product, please fill out our short Sales Inquiry form and we’ll get you all the information and tools you need.

