NeoSpeech’s Text-to-Speech Server SDK Overview
Want to build a speech-enabled application that can service multiple requests at once from anywhere in the world?
This post is part 2 of a 7-part blog series highlighting each of NeoSpeech’s text-to-speech solutions.
Building your own text-to-speech solution or integrating text-to-speech into your existing product shouldn’t be difficult.
With NeoSpeech, it’s actually pretty easy. We send you the complete TTS engine, optimized to your specifications, along with the SDKs and APIs you need to integrate it into your custom product or application.
Which NeoSpeech product you need to make that happen depends on what you’re trying to build and how you need to use text-to-speech. Here are all of NeoSpeech’s solutions:
- VoiceText TTS Engine SDK
- VoiceText TTS Server SDK
- VoiceText Embedded SDK
- VoiceText Editor (VT Editor)
- VoiceText SAPI
- Web Service (cloud-based solution)
- TTS On Demand (cloud-based solution)
In this post, we’re focusing on our VoiceText TTS Server SDK. The key word here is “server”. By hosting your TTS engine on your own server, you let multiple end-users access it at any time from anywhere. This NeoSpeech product is popular with IVR/CTI call center solutions, transportation solutions such as navigation apps, and dozens of other types of applications.
Let’s go over what our VoiceText TTS Server SDK is, who it’s for, how we can optimize it to fit your needs, and how it works.
What is it?
This product package includes our text-to-speech engine and a server SDK. Here’s what each one is:
If you read the first blog in this series, then you already know what our TTS engine is. Our TTS engine is our core technology that is able to convert text into speech. The TTS engine contains a database of all the speech recordings from the original voice actor. When the engine receives a text input, it analyzes the text and then generates the speech output.
A Software Development Kit (SDK) is a set of tools that allows you to integrate NeoSpeech’s text-to-speech technology into your own product. In this package, the SDK is what lets your custom application and devices communicate with the TTS server: they use it to send text-to-speech requests and receive the synthesized audio back.
Who is this solution for?
In the previous blog post in this series, we talked about how the TTS Engine SDK was for those wanting to build stand-alone applications. On the flip side, our TTS Server SDK is for those wanting to build server-based applications.
A stand-alone application works on its own. The TTS engine is located within the application and is able to make the text-to-speech conversions locally.
In a server-based text-to-speech application, the TTS engine is located on a server that can connect to multiple end-users. The end-user devices or applications send TTS requests to the server over a network connection, and the server sends the synthesized speech back to the end-user.
Let’s look at a common example of a server-based text-to-speech application: navigation apps. These are apps on your smartphone, or built into your car, that give you directions to your destination. They often use text-to-speech so users can listen to directions while keeping their eyes on the road.
If you’re making this type of application, you can host your TTS engine on your own server. Everyone who downloads your navigation app onto their smartphone is an end-user. Whenever an end-user is using your app and it needs to give a direction such as “Turn left onto First Street”, the app sends a request to the server. The server converts the text into a speech file and sends that file back to the app, which then plays the speech out loud.
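To make that request/response round trip concrete, here’s a minimal Python sketch using only the standard library. It is not NeoSpeech’s SDK: the wire format (one line of UTF-8 text in, a 4-byte length prefix followed by audio bytes out) and the `fake_synthesize` placeholder are hypothetical, made up for illustration, and the “server” runs on localhost in place of your real TTS server.

```python
import socket
import socketserver
import struct
import threading


def fake_synthesize(text: str) -> bytes:
    # Hypothetical stand-in for the TTS engine: 0.1 s of 16-bit silence at
    # 8 kHz per word (800 samples * 2 bytes = 1600 bytes per word).
    return b"\x00\x00" * 800 * len(text.split())


class TTSRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed protocol: the client sends one line of UTF-8 text.
        text = self.rfile.readline().decode("utf-8").strip()
        audio = fake_synthesize(text)
        # Reply with a 4-byte big-endian length, then the audio bytes.
        self.wfile.write(struct.pack("!I", len(audio)) + audio)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than asked for, so loop until we have n.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def request_speech(host: str, port: int, text: str) -> bytes:
    # Client side: what the navigation app would do for each direction.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(text.encode("utf-8") + b"\n")
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length)


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for this local demo.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), TTSRequestHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    audio = request_speech(host, port, "Turn left onto First Street")
    print(len(audio), "bytes of audio received")
    server.shutdown()
    server.server_close()
```

In a real deployment, the server side is your TTS engine and the client side is the app on each end-user’s phone; the threading server simply shows how one server can answer many clients.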
If you were instead making a stand-alone TTS navigation app, the TTS engine would be located within the app, and each end-user would have to download the TTS engine along with the app onto their smartphone.
Call centers and IVR systems frequently host TTS engines on a server as well. Multiple phone lines can be connected to the server, which sends the appropriate text-to-speech output to each line as it receives a call.
There are a few reasons why you may prefer to host your TTS engine on your server as opposed to making a stand-alone product.
First off, TTS engines require a lot of storage space. If you want a high-quality, natural-sounding voice reading out directions in your navigation app, it might not be feasible to put the TTS engine inside the app, because it would make the app too large for many end-users to store on their smartphones.
Also, by hosting the TTS engine on your server, you have complete control over it. You control all changes and updates, can monitor its usage, and can even edit the customizable dictionary that determines how certain words or phrases are pronounced.
If our VoiceText TTS Server SDK sounds like the solution you need, keep reading! If you’re still unsure which one is right for you, you can contact our sales team or read our free eBook on the matter!
How does NeoSpeech optimize it for my needs?
There are several ways we can customize the server SDK product package so it can fit your needs.
To start, our server engine runs on both Windows and Linux, and we’ll make sure to send you the version you need.
We also customize the engine so its audio output is at your desired sampling rate. Typically, we offer 8 kHz, 16 kHz, and 44 kHz. Higher sampling rates produce higher voice quality. 8 kHz is usually the best bet for IVR systems and emergency notifications; standard telephone audio is sampled at 8 kHz, so a higher rate brings little benefit over a phone line.
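Higher rates also mean more data per second, which matters for storage and for the bandwidth your server uses when it sends audio to end-users. The Python sketch below does that arithmetic. It assumes uncompressed mono 16-bit PCM output, which this post doesn’t specify (telephony systems often use 8-bit µ-law at 8 kHz instead), and it treats “44 kHz” as the standard 44,100 Hz rate.

```python
# Assumptions (not stated in the post): uncompressed, mono, 16-bit linear PCM,
# and "44 kHz" meaning the standard 44,100 Hz rate.
BYTES_PER_SAMPLE = 2
CHANNELS = 1

SAMPLE_RATES_HZ = {"8 kHz": 8_000, "16 kHz": 16_000, "44 kHz": 44_100}


def pcm_bytes(sample_rate_hz: int, seconds: float) -> int:
    """Size in bytes of `seconds` of audio under the assumptions above."""
    return int(sample_rate_hz * seconds * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    for label, rate in SAMPLE_RATES_HZ.items():
        per_second = pcm_bytes(rate, 1)
        prompt_kib = pcm_bytes(rate, 3) / 1024  # e.g. a 3-second spoken prompt
        print(f"{label}: {per_second:,} bytes/s, {prompt_kib:.1f} KiB per 3 s prompt")
```

Under these assumptions, one second of audio is 16,000 bytes at 8 kHz and 88,200 bytes at 44.1 kHz, roughly 5.5 times as much, which is one reason 8 kHz is the practical choice when the audio ends up on a phone line anyway.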
In addition to the TTS engine, we’ll send you the SDK that lets your end-users communicate with the TTS engine on your server. Our SDKs support all the major programming environments, including C, Java, COM, and .NET.
Our TTS engine can also run on servers that use the MRCP v1 and v2 protocols. MRCP (Media Resource Control Protocol) is a commonly used protocol for controlling speech servers, and our TTS server engine and SDK can interact with all kinds of MRCP applications.
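For a feel of what MRCP traffic looks like, here’s a Python sketch that builds an MRCPv2 SPEAK request (the message format defined in RFC 6787) carrying a short SSML document. It only builds the message text; session setup over SIP/SDP and the connection to the server are out of scope, and the channel identifier and request ID used in the demo are placeholders, not values from NeoSpeech’s product. MRCP v1 uses a different, RTSP-based framing that this sketch does not cover.

```python
from xml.sax.saxutils import escape


def build_speak_request(request_id: int, channel_id: str, text: str) -> bytes:
    """Build an MRCPv2 SPEAK request (RFC 6787 framing) with a minimal SSML body."""
    # SSML body; escape() protects &, < and > in the text to be spoken.
    body = (
        '<?xml version="1.0"?>\n'
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">' + escape(text) + "</speak>"
    ).encode("utf-8")
    headers = (
        f"Channel-Identifier:{channel_id}\r\n"
        "Content-Type:application/ssml+xml\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"  # blank line separates the headers from the body
    ).encode("ascii")
    rest = headers + body

    # message-length counts every octet of the message, start-line included,
    # so it depends on its own digit count: iterate until the value is stable.
    length = len(rest)
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


if __name__ == "__main__":
    # Placeholder channel ID; a real one is assigned by the server during
    # SIP/SDP session setup.
    msg = build_speak_request(543257, "32AECB23433802@speechsynth",
                              "Turn left onto First Street")
    print(msg.decode("utf-8"))
```

The loop always terminates: each pass can only grow the length, and it stops as soon as the number written in the start line matches the message’s actual size.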
Finally, we set up your TTS engine with the number of ports you need. Think of a port as a gateway between your server and your end-users. If the TTS engine on your server has 8 ports, it can handle up to 8 text-to-speech requests at once.
Make sure you have enough ports to support the number of TTS requests you expect to receive at any given moment. If your TTS server doesn’t have enough ports for all the requests arriving at once, it won’t be able to fulfill every request.
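The relationship between ports and simultaneous requests behaves like a counting semaphore. This Python sketch models it with `threading.BoundedSemaphore`; the `PortPool` class and its refuse-when-busy policy are hypothetical illustrations, not how the VoiceText server is actually implemented.

```python
import threading
from contextlib import ExitStack, contextmanager


class PortPool:
    """Hypothetical model of a TTS server licensed for a fixed number of ports."""

    def __init__(self, ports: int):
        self.ports = ports
        self._slots = threading.BoundedSemaphore(ports)

    @contextmanager
    def port(self):
        # Non-blocking acquire: if every port is busy, refuse the request at once.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"all {self.ports} TTS ports are busy")
        try:
            yield  # synthesis for this request would happen here
        finally:
            self._slots.release()  # free the port when the request finishes


if __name__ == "__main__":
    pool = PortPool(8)
    with ExitStack() as stack:
        for _ in range(8):
            stack.enter_context(pool.port())  # 8 requests in progress at once
        try:
            with pool.port():  # a 9th simultaneous request
                pass
        except RuntimeError as err:
            print("9th request refused:", err)
    with pool.port():  # the first 8 have finished, so a port is free again
        print("new request served")
```

Refusing immediately mirrors the “can’t fulfill the request” case above. A server could instead make callers wait by calling `acquire(timeout=...)`, trading rejections for delay; either way, sizing the port count to your peak load is what keeps requests from piling up.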
How does it work?
Now that you know what our VoiceText TTS Server SDK package includes, let’s go over what you need to do to get your TTS server up and running.
Once you’ve decided that you want to purchase our TTS Server SDK, you just need to submit a Sales Inquiry form to our friendly sales team. One of our team members will promptly reach out to you and discuss all the details about what you need so we can customize the package to meet your requirements.
After you make your purchase, we’ll send you the download files for the TTS engine and the SDK package you need.
You’ll need to install the TTS engine and server SDK package onto your server. Here’s a look at the server directory that’ll be installed if you’re using a Windows server:
[Figure: directory structure installed by the VoiceText TTS Server SDK on a Windows server]
We know, that’s a lot of folders. The User Guide we send you explains what each of them contains. For now, the only ones you need to know about are the “document” and “server-api” folders, which we’ll cover below.
Along with the download files, we also send you a license key for verifying the software, which gives you full access; without verification, the TTS engine will only recite a demo message. We’ll send you instructions on how to do this as well.
After you’ve installed and verified the TTS engine on your server, you need to make sure all of your end-users and/or units can communicate with it.
In the list of folders above, the “document” folder contains the instructions for doing that.
Basically, you import the files in the “server-api” folder into each end-user unit and make sure it has the server’s IP address and port number, so it knows which server to communicate with. Then you write the code that serves as the rulebook for how and when your application sends TTS requests to the server.
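As a small illustration of that configuration step, here’s a hypothetical Python helper that turns an “IP:port” setting into the address a client would connect to. The function name and setting format are assumptions (the User Guide describes the real configuration), and “port” here means the server’s network port, not the licensed TTS ports described earlier. Hostnames are deliberately rejected, since the post talks about the server’s IP address.

```python
import ipaddress
from typing import NamedTuple


class TTSServerAddress(NamedTuple):
    host: str
    port: int


def parse_server_address(value: str) -> TTSServerAddress:
    """Parse a hypothetical 'IP:port' setting such as '192.168.0.10:7000'."""
    host, sep, port_text = value.rpartition(":")
    if not sep:
        raise ValueError(f"expected 'IP:port', got {value!r}")
    host = host.strip("[]")  # accept bracketed IPv6, e.g. '[::1]:7000'
    ipaddress.ip_address(host)  # raises ValueError if not a valid IP literal
    port = int(port_text)  # raises ValueError if not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return TTSServerAddress(host, port)


if __name__ == "__main__":
    addr = parse_server_address("192.168.0.10:7000")  # hypothetical values
    print(f"connecting to TTS server at {addr.host}, port {addr.port}")
```

Validating the address once at startup means a typo in the configuration shows up immediately, rather than as a failed TTS request the first time the app needs to speak.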
That’s the basic overview of our VoiceText TTS Server SDK! We hope this post helped you understand what this product is about and whether it’s right for you. If it is, get in touch with us now!
Learn more about NeoSpeech’s Text-to-Speech
Want to learn more about all the ways Text-to-Speech can be used? Visit our Text-to-Speech Areas of Application page. And check out our Text-to-Speech Products page to find the right package for any device or application.
If you’re interested in integrating Text-to-Speech technology into your product, please fill out our short Sales Inquiry form and we’ll get you all the information and tools you need.