
Top 5 Open Source Speech Recognition Toolkits

As we mentioned in our last blog post, the speech recognition market is forecast to grow from about $3.7 billion a year to about $10 billion a year by 2022. Why? Because the technology has gotten better. Speech recognition engines have become more accurate at understanding what we say, the technology has become more useful, and developers are integrating speech recognition into more of their applications.

Speech recognition is half of the equation if you want to create an application that uses a natural language user interface, meaning it is controlled entirely by voice. Speech recognition (or speech-to-text) is what makes the app understand what is being said. Text-to-speech is how the app communicates back to the user.
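To picture how the two halves fit together, here's a minimal sketch of that round trip in Python. The function names are hypothetical placeholders rather than any particular engine's API; you'd swap in whichever speech-to-text and text-to-speech products you choose.

# Toy voice-interface loop: speech in, text out, speech back.
# recognize_speech() and synthesize_speech() are hypothetical placeholders
# for whatever speech-to-text and text-to-speech engines you plug in.

def recognize_speech(audio: bytes) -> str:
    """Placeholder: send audio to your speech recognition engine, get text back."""
    raise NotImplementedError

def synthesize_speech(text: str) -> bytes:
    """Placeholder: send text to your text-to-speech engine, get audio back."""
    raise NotImplementedError

def handle_command(text: str) -> str:
    """Placeholder: your application logic decides how to respond."""
    return "You said: " + text

def voice_loop(audio: bytes) -> bytes:
    text = recognize_speech(audio)    # the speech-to-text half
    reply = handle_command(text)      # your application's logic
    return synthesize_speech(reply)   # the text-to-speech half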

If you’re a developer, you want this process to feel as natural as possible for your user. You want the text-to-speech voice to sound natural and realistic, with a tone that fits your application. Head to the demo section of our website and listen to any of NeoSpeech’s voices to get a good idea of what that sounds like.

As for the speech recognition side, you want it to be as accurate as possible. Nothing will drive away customers faster than a speech recognition engine that continually misinterprets what’s being said.

Developers know that building a speech recognition engine is an incredibly difficult task. But fear not, there are quite a few speech recognition toolkits available today. These toolkits are meant to be the foundation on which to build a speech recognition engine.

The best part is that there are several free ones that are very high quality. You just need to find the right one for you. If you’re looking for the best open source speech recognition toolkit, consider this as your resource page.

Kaldi


This is one of the newer speech recognition toolkits, but it has made a name for itself fast! Development began in 2009 at a workshop at Johns Hopkins University called “Low Development Cost, High Quality Speech Recognition for New Languages and Domains”.

After working on the project for a couple of years, the code for Kaldi was released on May 14, 2011. Kaldi quickly gained a reputation for being easy to work with.

Daniel Povey, who was one of the original developers, still maintains and updates Kaldi, so don’t expect this toolkit to go stale anytime soon. Here are all the resources you’ll need for Kaldi:

CMUSphinx


CMUSphinx, or Sphinx for short, is actually a group of speech recognition systems developed at Carnegie Mellon University. There are several packages, each designed for different tasks and applications.

One of these is Pocketsphinx, a lightweight version of Sphinx that can be used in embedded systems. Take a look at the resources below for everything you need to know regarding Sphinx.
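To give you a feel for how little code it takes to get started, here's a minimal sketch using the pocketsphinx Python bindings and the LiveSpeech helper they provide, with the default bundled English model. The exact API varies between versions of the bindings, so treat this as illustrative and check the CMUSphinx documentation for the release you install.

# Minimal sketch of continuous recognition with the pocketsphinx
# Python bindings (pip install pocketsphinx). Assumes the bindings'
# LiveSpeech helper and the default bundled English model; the exact
# API differs between versions, so treat this as illustrative only.
from pocketsphinx import LiveSpeech

# LiveSpeech listens on the default microphone and yields one
# hypothesis per detected utterance.
for phrase in LiveSpeech():
    print(phrase)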

HTK


The Hidden Markov Model Toolkit (HTK) is a toolkit for building and manipulating hidden Markov models (HMMs), the statistical models at the heart of traditional speech recognition (and of HMM-based statistical parametric speech synthesis). While HTK is mainly used for speech recognition, it has also been used for text-to-speech research and even for DNA sequencing.
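If you've never worked with HMMs, here's a toy illustration of the kind of computation a toolkit like HTK performs: a Viterbi decode that finds the most likely sequence of hidden states behind a handful of observations. The states, probabilities, and "frames" below are made up purely for illustration; real systems work with thousands of states and acoustic models scoring genuine audio features.

# Toy Viterbi decode over two hidden states. All numbers are invented
# for illustration; real toolkits like HTK do this at scale, with
# acoustic models scoring real audio frames.
states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}                 # initial probabilities
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},    # transition probabilities
           "speech":  {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"quiet": 0.9, "loud": 0.1},         # emission probabilities
          "speech":  {"quiet": 0.3, "loud": 0.7}}
observations = ["quiet", "loud", "loud"]                  # stand-ins for audio frames

# Track the best-scoring state sequence ending in each state.
best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
for obs in observations[1:]:
    best = {
        s: max(
            ((prob * trans_p[prev][s] * emit_p[s][obs], path + [s])
             for prev, (prob, path) in best.items()),
            key=lambda x: x[0],
        )
        for s in states
    }

prob, path = max(best.values(), key=lambda x: x[0])
print(path, prob)   # most likely hidden state sequence and its probability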

HTK was developed at the Machine Intelligence Laboratory in the Cambridge University Engineering Department. Today, Microsoft holds the copyright to the original HTK code, but it encourages users to make changes to the source code.

New versions of HTK are released regularly, with the latest release arriving in December 2015.

Simon


Simon is a speech recognition toolkit built around an easy-to-use interface. Its simple structure and friendly front end are some of its biggest strengths. Under the hood, Simon actually uses CMUSphinx, HTK, and Julius (mentioned below) as its foundation.

Simon is known as a popular speech recognition tool for Linux, although it can also work with Windows.

Julius


Julius is a two-pass large vocabulary continuous speech recognition (LVCSR) engine. Born in 1997, Julius continues to be developed by the Interactive Speech Technology Consortium.

Currently, Japanese is the only language for which models are fully available with Julius. A sample English acoustic model is available, but it cannot be used for commercial purposes. The VoxForge project is working on creating an English acoustic model for Julius.
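If you want to try Julius, it's typically driven from the command line with a .jconf configuration file that points at your acoustic and language models. Here's a rough sketch of wrapping that invocation from Python; the file name is a placeholder, and the flags and output format follow the standard Julius documentation, so double-check them against the version you install.

# Sketch of driving the Julius command-line engine from Python.
# Assumes Julius is installed and that main.jconf (a placeholder name)
# points at your acoustic and language models.
import subprocess

proc = subprocess.Popen(
    ["julius", "-C", "main.jconf", "-input", "mic"],
    stdout=subprocess.PIPE,
    text=True,
)

# Julius prints results to stdout; lines beginning with "sentence1:"
# carry the best hypothesis for each utterance.
for line in proc.stdout:
    if line.startswith("sentence1:"):
        print(line.strip())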

 

What do you think?

Have you used any of these toolkits? Did we miss any that should be listed? Which ones are your favorites? Let us know in the comments!

Learn More about NeoSpeech’s Text-to-Speech

To learn more about the different areas in which Text-to-Speech technology can be used, visit our Text-to-Speech Areas of Application page. And to learn more about the products we offer, visit our Text-to-Speech Products page.

If you’re interested in adding Text-to-Speech software to your application or would like to learn more about TTS, please fill out our Sales Inquiry form and one of our friendly team members will be happy to help.

Related Articles

Speech Market Projected To See Triple Growth Over Next 6 Years

The Impact Of Voice Search On SEO

HTS vs. USS: Which Speech Synthesis Technique is Better?
