Whether you’re doing research in speech technology or adding a text-to-speech function to an application, you need a consistent way to represent sounds. If you’re not a linguist, the many phonetic representation methods and their charts can be overwhelming.

We put together this guide to explain five phonetic representation methods: IPA, Extended SAMPA, Worldbet, CMUdict, and SAPI Phoneme Representation. For some of these methods, we’ve included links to phonetic converters and charts. You can use all of these phonetic representation methods with our phoneme VTML tag to fine-tune the pronunciation of our text-to-speech voices.

All of the phonetic representation methods we’ll go over in this article can be used with our English text-to-speech voices. You can use j-tag with our Japanese text-to-speech voices, Pinyin with our Chinese and Taiwanese text-to-speech voices, and Jyutping with our Cantonese text-to-speech voices. Check your VTML manual for language-specific uses of the phoneme tag.

5 Phoneme Representation Method VTML Tags

1. International Phonetic Alphabet (IPA)

In 1886, the International Phonetic Association was established in Paris. This association created the International Phonetic Alphabet (IPA) with the goal of making a standard system for the phonetic description of languages. Many phonetic transcription methods are based on IPA.

One of the challenges of using IPA in electronic communications is that most IPA symbols are not on a standard keyboard, and the symbols can be rendered differently across devices. Nowadays, to easily use IPA symbols on the computer, we can use an IPA typing tool or an online converter. If you copy the IPA symbols to another text processor, make sure to use a Unicode font, such as Arial, so the IPA symbols display correctly.

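To use IPA with the phoneme VTML tag, write the decimal Unicode code point of each IPA symbol, with a semicolon after each one. For instance, 116 is the code point for “t” and 601 is the code point for “ə”.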

For example, let’s have Julie pronounce “tomato” with a UK accent.

     <vtml_phoneme alphabet="ipa" ph="116;601;712;109;230;116;111;650;">tomato</vtml_phoneme>
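
If you already have an IPA transcription, the decimal list can be generated rather than looked up by hand. The Python sketch below is only an illustration: the helper names `ipa_to_decimal`, `decimal_to_ipa`, and `ipa_phoneme_tag` are hypothetical and not part of VTML, and the trailing semicolon is an assumption copied from the example above.

```python
def ipa_to_decimal(ipa: str) -> str:
    """Return the decimal Unicode code point of each character, each followed by ';'."""
    return "".join(f"{ord(ch)};" for ch in ipa)


def decimal_to_ipa(ph: str) -> str:
    """Turn a string like '116;601;' back into IPA characters."""
    return "".join(chr(int(code)) for code in ph.split(";") if code)


def ipa_phoneme_tag(word: str, ipa: str) -> str:
    """Wrap a word in a phoneme tag shaped like the example above (hypothetical helper)."""
    return f'<vtml_phoneme alphabet="ipa" ph="{ipa_to_decimal(ipa)}">{word}</vtml_phoneme>'


# The transcription is written with escapes so the file stays ASCII:
# t, schwa, primary stress mark, m, ash, t, o, upsilon.
TOMATO_IPA = "t\u0259\u02c8m\u00e6to\u028a"

print(ipa_phoneme_tag("tomato", TOMATO_IPA))
```

Running this prints the same tag as the example above. Note that each Unicode character becomes one number, so a symbol typed as a base letter plus a combining diacritic produces two entries.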


2. Extended Speech Assessment Methods Phonetic Alphabet (Extended SAMPA)

With the rise of computers, linguists and speech technology researchers needed a phonetic representation method that computers could understand. While IPA was a standard phonetic system, it was only seen as graphical symbols by computers.

The Speech Assessment Methods Phonetic Alphabet (SAMPA) was created in the late 1980s under the European Strategic Program on Research in Information Technology (ESPRIT). SAMPA is a machine-readable phonetic alphabet, based on IPA, that uses only ASCII characters. ASCII stands for American Standard Code for Information Interchange, a character encoding standard for electronic communication created in the 1960s. SAMPA focused on European languages such as French, German, and Italian. Extended SAMPA was designed to include all IPA symbols and to cover more languages. You can use an online converter to go between Extended SAMPA and IPA.

To use Extended SAMPA, here’s what the phoneme VTML tag looks like:

     <vtml_phoneme alphabet="x-sampa" ph="t@'meit@U">tomato</vtml_phoneme>

3. Worldbet

Like Extended SAMPA, Worldbet is a machine-readable ASCII phonetic alphabet based on IPA. Worldbet was designed for African, Asian, European, and Indian languages. The goal was to have a unique symbol for each sound, since a single phonetic representation system with unique symbols makes it easier to study multiple languages.

When using Worldbet, here’s what the phoneme VTML tag looks like:

     <vtml_phoneme alphabet="x-worldbet" ph="t&'meitoU">tomato</vtml_phoneme>

4. Carnegie Mellon University Pronouncing Dictionary (CMUdict)

Carnegie Mellon University (CMU) is a leader in computer science research and provides many resources for speech technology. One of our founders studied at CMU. The Speech Group at CMU created the CMU Pronouncing Dictionary (CMUdict) for speech recognition research. The CMUdict online translator shows the pronunciation of American English words.

CMUdict gives “T AH0 M EY1 T OW0” as the phonetic representation of tomato, but we can edit it to give James a UK accent.

     <vtml_phoneme alphabet="x-cmu" ph="T AH0 M AE1 T OW0">tomato</vtml_phoneme>
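
The edit above is a single symbol swap, which is easy to script when you need to adjust many dictionary entries. The following Python sketch assumes the pronunciation is a space-separated string of CMUdict symbols with stress digits attached, as shown above. The helper names `swap_phoneme` and `cmu_phoneme_tag` are hypothetical and not part of CMUdict or VTML.

```python
def swap_phoneme(pron: str, old: str, new: str) -> str:
    """Replace every exact occurrence of one symbol (stress digit included)."""
    return " ".join(new if symbol == old else symbol for symbol in pron.split())


def cmu_phoneme_tag(word: str, pron: str) -> str:
    """Wrap a word in a phoneme tag shaped like the example above (hypothetical helper)."""
    return f'<vtml_phoneme alphabet="x-cmu" ph="{pron}">{word}</vtml_phoneme>'


# Pronunciation as given by CMUdict for "tomato".
american = "T AH0 M EY1 T OW0"

# Swap the stressed vowel, as in the hand-edited example.
edited = swap_phoneme(american, "EY1", "AE1")

print(cmu_phoneme_tag("tomato", edited))
```

Running this prints the same tag as the hand-edited example. Matching whole symbols, rather than substrings, keeps a swap of one vowel from touching another symbol that happens to share its letters.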


5. SAPI Phoneme Representation

Microsoft’s SAPI Phoneme Representation is designed to be an easy phonetic representation method for application developers to use. It is not meant for fine-tuned control of pronunciation or for linguistic study. Microsoft provides a simple chart for American English pronunciations.

Here’s what it looks like with the phoneme VTML tag.

     <vtml_phoneme alphabet="x-sapi" ph="h eh - l ow 1">hello</vtml_phoneme>

More Resources

With all of these phonetic representation methods, how do you choose? Each method has advantages and disadvantages, so the right choice depends on your project. Extended SAMPA and Worldbet are the most popular phonetic representation methods for speech technology and electronic communication, and both cover a range of languages. However, if you’re only working with American English, CMUdict is sufficient. Or, if you’re using SAPI text-to-speech voices, keep it simple with the SAPI Phoneme Representation method.

For a full chart comparing all five phonetic representation methods, look at Appendix A of our VTML manual.

If you’re interested in learning more about phonetic representation methods, check out these academic papers.

“Speech Assessment Methods Phonetic Alphabet (SAMPA): Analysis of Urdu” by Hasan Kabir and Abdul Mannan Saleem

“Computer-coding the IPA: a proposed extension of SAMPA” by J. C. Wells

“ASCII Phonetic Symbols for the World’s Languages: Worldbet” by James L. Hieronymus

