Saturday, January 23, 2010

Using SAPI 4/5 voices with text to speech in Asterisk

For a while now I have been working on TTS (text to speech) on Asterisk for a startup I am running with two friends.
One of my biggest problems was that the available voices for Greek on Linux are very limited, in some cases sounding like robots with laryngitis.
And from what I understood (or actually heard), with the exception of English most other languages have similar problems.
Windows has a large number of commercial "voices" that use the SAPI interface, like the AT&T Natural Voices, Nuance, and Loquendo, but I could not find any solution that would allow me to use them with Asterisk.
Plus I wanted something that could easily scale to hundreds of channels and be low cost, or free as in beer.

So I used pyTTS and web2py to create, in essence, a web service that, given a string, returns a file with the spoken text.
This service runs on a Windows machine and provides a very simple API for selecting the language and voice, plus any fine tuning that might be required.
Once the service is invoked, it returns a wav file with the spoken text.

A simple Python script on the Asterisk side provides a library, called ast-SAPI, that handles the communication with the web service and returns a wav file.
It pretty much works the same way as flite when it is told to create wav files.
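To give an idea of what the client side might look like, here is a minimal sketch and not the actual ast-SAPI code. The service URL, the parameter names (`text`, `language`, `voice`, `rate`) and the function names are all assumptions made up for illustration; the real API may look quite different.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: the real service URL is not given in this post.
SERVICE_URL = "http://tts-host.example:8000/tts/default/speak"


def build_request_url(text, language="el", voice=None, rate=None,
                      base_url=SERVICE_URL):
    """Encode the text and the optional settings as a query string.

    The parameter names are assumptions, not the real API.
    """
    params = {"text": text, "language": language}
    if voice is not None:
        params["voice"] = voice
    if rate is not None:
        params["rate"] = str(rate)
    # urlencode percent-encodes non-ASCII text (e.g. Greek) as UTF-8.
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_wav(text, out_path, timeout=10, **settings):
    """Ask the service to speak `text` and save the reply as a wav file."""
    url = build_request_url(text, **settings)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(data)
    return out_path
```

With something like this in place, a caller would only need `fetch_wav("some text", "/tmp/out.wav", language="el")` and could then hand the resulting file to Asterisk.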

At the moment ast-SAPI gets called by a Python AGI script.
The Python AGI receives the text and settings from the Asterisk dialplan, calls the library and then streams the resulting wav.
I also have an Asterisk app, built using flite's Asterisk app as a template, that does the same from the dialplan and avoids the AGI. It works, but there are some issues that need to be addressed.
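The AGI flow described above can be sketched roughly as follows. This is a simplified illustration, not my actual script: it relies only on the standard AGI conventions (Asterisk sends `agi_name: value` header lines ending with a blank line, dialplan arguments arrive as `agi_arg_1`, `agi_arg_2`, and so on, and `STREAM FILE` takes the file name without its extension). The `synthesize` callable is a hypothetical stand-in for the library call that returns the path of the wav file.

```python
import sys


def read_agi_env(inp):
    """Read the 'agi_name: value' header lines Asterisk sends at startup."""
    env = {}
    while True:
        line = inp.readline().strip()
        if not line:  # a blank line (or EOF) ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def stream_file(path_without_ext, out, inp):
    """Send STREAM FILE to Asterisk and return its one-line reply."""
    out.write('STREAM FILE "%s" ""\n' % path_without_ext)
    out.flush()
    return inp.readline().strip()


def main(inp, out, synthesize):
    """Take the text from the dialplan, get a wav for it, and play it.

    `synthesize` is a hypothetical callable: text in, wav path out.
    """
    env = read_agi_env(inp)
    text = env.get("agi_arg_1", "")
    wav_path = synthesize(text)
    # STREAM FILE expects the name without the ".wav" extension.
    if wav_path.endswith(".wav"):
        wav_path = wav_path[:-4]
    return stream_file(wav_path, out, inp)
```

In a real AGI script this would be run as `main(sys.stdin, sys.stdout, synthesize)`, with `synthesize` wrapping the call to the web service.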

This solution has two main benefits.

I get good quality voices to use with Asterisk, and since it works like a web service it is easy to scale and to spread the load across many machines.

It's still at the prototype stage and needs more work, and I have already spotted some places where things need to be changed, but overall it seems to work.
