Text2Speech Blog

NeoSpeech: Text-to-Speech Solutions.

What is a VTML tag?

VTML stands for Voice Text Markup Language and it’s a language specific to NeoSpeech’s text-to-speech software. VTML lets you modify how our text-to-speech voices read your text. You can edit the prosody (speech rhythms like speed and pitch) of our text-to-speech voices to make them sound more natural.

When we talk, we’re conveying more than just the literal meaning of the words we’re saying. We’re communicating another level of meaning with the way we speak. VTML lets you customize how our text-to-speech voices speak so you can communicate more effectively.

If you’ve worked with another markup language before, such as HTML, VTML will look familiar. Both are languages for telling software what to do with text: HTML tells a browser how to format it, and VTML tells our text-to-speech software how to read it. If you’ve never used a markup language, we’ll break down what makes up a VTML tag.

Components of a VTML tag

Every VTML tag starts and ends with angle brackets (< >). These angle brackets let the software know that the text inside is a command for it to complete. The first angle bracket is always < and the last angle bracket is always >. Think of the angle brackets as hands that are holding the text inside together.

First, we start with an opening angle bracket.

     <

Next, we let the software know which language we are using. In this case, VTML.

     <vtml

Note that when talking about VTML we capitalize it because we are using it as an acronym. When writing a tag, all text should be lowercase.

An underscore tells the software a command is coming next.

     <vtml_

For a command, we’ll use speed.

     <vtml_speed

Next, we have a space and specify a property of the command. In this case for speed, it will be value.

     <vtml_speed value

Then we add an equals sign and a numeric value in quotation marks, and finish with the closing angle bracket. For the speed property, you can choose a value from 50 to 400, where 100 is normal speed and higher values are faster.

     <vtml_speed value="150">

Now that you’ve written an opening tag, let’s look at when a command needs one or two tags.
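If you generate VTML from a script rather than typing it by hand, the opening tag built step by step above can be assembled in code. The following is a minimal Python sketch and not part of NeoSpeech’s tooling. The function name speed_open_tag is made up for illustration, and the range check simply mirrors the 50 to 400 limits given above.

```python
def speed_open_tag(value: int) -> str:
    """Build an opening VTML speed tag such as <vtml_speed value="150">."""
    # The article gives 50 to 400 as the valid range, with 100 as normal speed.
    if not 50 <= value <= 400:
        raise ValueError(f"speed value must be between 50 and 400, got {value}")
    # Angle bracket, language, underscore, command, property, quoted value.
    return f'<vtml_speed value="{value}">'


print(speed_open_tag(150))
```

Rejecting out-of-range values early is a design choice for this sketch; how the text-to-speech software itself handles such values is not covered here.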

Opening and closing tags

In the example above, we told our text-to-speech software to read at a faster speed, but how will our text-to-speech software know which words you want to be spoken quickly? That’s why the speed command has two tags, an opening tag and a closing tag.

We just went through what makes up an opening tag. The closing tag is the easy part: it lets our text-to-speech software know that you want it to stop applying the command. Inside the angle brackets, the closing tag is a forward slash (/) followed by the name of the markup language, an underscore, and the command.

For speed, the closing tag looks like this:

     </vtml_speed>

Let’s see the speed tag in action. Say we have a sentence where we want the word “slow” to be read slowly. We place the opening tag before the word “slow” and the closing tag after the word “slow.”

     The mouse was <vtml_speed value="50">slow</vtml_speed> compared to the snake.

The speed command has an opening tag and a closing tag so you can select the areas where you want the software to implement the command. If you’re using a command with an opening and closing tag, such as speed, pitch, and volume, don’t forget to include a closing tag or our text-to-speech software won’t execute the command.
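When the marked-up text is produced by a program, wrapping a word in both tags at once makes it hard to forget the closing tag. Here is a small Python sketch of that idea; wrap_speed is a hypothetical helper name, not something provided by our software.

```python
def wrap_speed(text: str, value: int) -> str:
    """Surround text with an opening and a closing VTML speed tag."""
    return f'<vtml_speed value="{value}">{text}</vtml_speed>'


# Rebuild the example sentence with only the word "slow" read slowly.
sentence = f"The mouse was {wrap_speed('slow', 50)} compared to the snake."
print(sentence)
```

The printed sentence has the opening tag directly before “slow” and the closing tag directly after it, so only that word is affected.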

How do you know when a command needs an opening and closing tag and when it doesn’t? Ask yourself: do you need to tell our text-to-speech software which words to apply the command to, or does the command stand alone? If you need to tell the software which words to apply the command to, as we did with speed, then the command needs an opening and closing tag. If you’re not sure, you can always check our VTML Manual.
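Since a missing closing tag is an easy mistake to make, a quick script can flag it before the text is sent to the voice. The sketch below is illustrative only. It assumes that just the three paired commands named above (speed, pitch, and volume) need checking and that tags are closed in the reverse order they were opened; the VTML Manual remains the authority on both points. The name unclosed_tags is hypothetical.

```python
import re

# Paired commands named in the article; the VTML Manual may list more.
PAIRED = ("speed", "pitch", "volume")
TAG_RE = re.compile(r"<(/?)vtml_(%s)\b[^>]*>" % "|".join(PAIRED))


def unclosed_tags(text: str) -> list:
    """Return the paired commands that are opened but never closed."""
    stack = []
    for match in TAG_RE.finditer(text):
        is_closing = match.group(1) == "/"
        command = match.group(2)
        if not is_closing:
            stack.append(command)   # remember the opening tag
        elif stack and stack[-1] == command:
            stack.pop()             # matching closing tag found
    return stack


good = 'The mouse was <vtml_speed value="50">slow</vtml_speed> compared to the snake.'
bad = 'The mouse was <vtml_speed value="50">slow compared to the snake.'
print(unclosed_tags(good))
print(unclosed_tags(bad))
```

An empty list means every opening tag found its closing tag; any names left in the list are commands that still need one.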

Standalone tag

Let’s look at a command that is a single tag. For example, the break command tells our text-to-speech software to take a breath, so to speak, and is one tag. The break doesn’t change how a word is read, but determines how long the software waits before continuing to read.

Here’s the break tag.

     <vtml_break level="3"/>

Notice the forward slash (/) at the end of the tag? It tells our text-to-speech software that the command is complete, so no separate closing tag is needed. You can’t use the break command as an opening and closing tag since it doesn’t make sense for a break to span across words.
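A standalone tag can be generated in code the same way as a paired one, except that nothing is wrapped. The following Python sketch uses a made-up helper name, break_tag. Because the accepted break levels are not listed in this post, the sketch checks only that the level is an integer.

```python
def break_tag(level: int) -> str:
    """Build a standalone VTML break tag such as <vtml_break level="3"/>."""
    # The valid levels are not listed in this post, so only the type is
    # checked here; consult the VTML Manual for the accepted range.
    if not isinstance(level, int):
        raise TypeError("level must be an integer")
    # The slash before the closing bracket marks the tag as complete.
    return f'<vtml_break level="{level}"/>'


# Drop a break between two sentences.
text = f"Wait for it. {break_tag(3)} Here it comes."
print(text)
```

The break sits between the sentences rather than around any words, which matches how a standalone tag is meant to be used.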

VTML tags in VT Editor

If you’re using our VT Editor application, VTML tags are even easier to use. Simply select options in the menu and VT Editor inserts the tags for you.

For instance, go to Edit. Choose Break. Select a value. VT Editor will insert the tag for you in blue.


For a command that has two tags, like the speed command, you will first have to select the text you want modified. VT Editor will input the opening tag at the beginning of the selection and the closing tag at the end of the selection.


How easy is that? Now you can modify the prosody of our text-to-speech voices to your liking!

Learn More about NeoSpeech’s Text-to-Speech

Want to learn more about all the ways Text-to-Speech can be used? Visit our Text-to-Speech Areas of Application page. And check out our Text-to-Speech Products page to find the right package for any device or application.

If you’re interested in integrating Text-to-Speech technology into your product, please fill out our short Sales Inquiry form and we’ll get you all the information and tools you need.

