Full Circle Magazine FR

HELLO WORLD! I hate using that phrase when introducing someone to a new programming language or concept; so much in fact, I refuse to use it. I change it to something like “Hello from Python” or something equally close but equally different.

You might notice above that this is article #98 in my Python programming series. If everything goes according to plan, my 100th article will be in December's Full Circle Magazine.

Now let's start with this month's article… the reason you are here…

Text to Speech. Something that has been around for many years, but when it comes to Linux, is fairly limited, especially when it comes to free software. Add a requirement of Python usage to that and the list gets shorter, so let's explore what's out there. Another requirement is that it needs to be something that is somewhat regularly maintained, and it needs to have some documentation that a normal person can really understand.

As we go through this, remember the old saying “You get what you pay for”. In this instance, it's true to some extent.

The best that I could find that fits all those requirements is a package called eSpeak (https://sourceforge.net/projects/espeak/). While it appears that there hasn't been any forward progress since the end of 2017, there is a fork of this project that is currently being worked on called 'eSpeak NG' (https://github.com/espeak-ng/espeak-ng). The eSpeak projects have support for over 100 languages and accents. This having been said, the voice quality is very robotic, to say the least. Nothing like what you get with Google Assistant, Alexa, Cortana or Siri. However, with the proper manipulations, it can sound understandable, at least in English. I always say, I know only two languages, English and BAD English, so I'm at the mercy of those who can speak other languages to determine the actual usability.

How to use it…

Luckily, installing eSpeak NG on Ubuntu is pretty easy.

:~$ sudo apt-get install espeak-ng-espeak

To test it, while you are in the terminal, try this…

:~$ espeak-ng "Welcome to free and open source Text to Speech processing."

Now you can hear what I'm talking about. It's pretty much robotic and somewhat reminds you of listening to the voice of Stephen Hawking. If you listen carefully, it can be mostly understood.

There are many command-line arguments that you can use to change things around and to provide other options. A quick documentation page is at https://github.com/espeak-ng/espeak-ng/blob/master/src/espeak-ng.1.ronn. I'll try to distill them down, like a fine scotch whiskey, for you. Let's take a quick look at some of them.

If you want to see the various languages that are available, just type:

:~$ espeak-ng --voices

You will receive the output shown on the next page (top right).

I've cut that list down considerably to save space here in the article. And to be brutally honest, I wouldn't begin to know if some of these were even close to reality or not.

To use a specific voice, such as Spanish, you can use:

:~$ espeak-ng -ves "Buenos días. ¿Cómo estás?"

We can also change the speed of the vocal output by using the -s <integer> option:

:~$ espeak-ng -ves -s 125 "Buenos días. ¿Cómo estás?"
:~$ espeak-ng -ves -s 90 "Buenos días. ¿Cómo estás?"

Another thing that we can do is to change the pitch using the -p <integer> option:

:~$ espeak-ng -ves -s 125 -p 75 "Buenos días. ¿Cómo estás?"
:~$ espeak-ng -ves -s 125 -p 35 "Buenos días. ¿Cómo estás?"

That's fine for the command-line, but what we really want to do is create the speech from a Python program. No problem.

We need a library to interface with eSpeak NG. Luckily, there is a pretty nice one that can be installed via pip. It's called py-espeak-ng, and it works on both Python 2.x and 3.x. The homepage is https://github.com/gooofy/py-espeak-ng.

pip install py-espeak-ng

or

pip3 install py-espeak-ng

Once py-espeak-ng is installed, fire up your favorite version of Python. The documentation shows a slightly different sequence of commands, but they don't work on my system. This sequence does… The first thing we have to do is import the library…

from espeakng import ESpeakNG

Next, we need to instantiate the engine:

esng = ESpeakNG()

Next, we need to assign a voice…

esng.voice = 'en'

Now we can finally have the engine speak to us…

esng.say('Hello from Python. Welcome to text to speech from Python.')

Now, let's change the voice, this time to French:

esng.voice = 'fr'
esng.say('Bonjour. Comment vas-tu?')

Now, let's change the pitch as we did before. The syntax is a bit different, but still simple:

esng.pitch = 32
esng.say('Bonjour. Comment vas-tu?')

What if we want to find out the current speed or pitch? It's just this simple…

p = esng.pitch
print(p)

32

sp = esng.speed
print(sp)

175

Even finding out the current voice is simple:

print(esng.voice)

fr
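Since the settings can be read as well as written, one phrase can be spoken with different settings and the old values put back afterwards. The following is only a minimal sketch: temporary_settings is a hypothetical helper, not part of py-espeak-ng, and it assumes that voice, pitch, and speed can all be assigned in the same way that voice and pitch were above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_settings(engine, **settings):
    """Apply settings such as voice or pitch, then restore the old values."""
    # Remember the current value of every attribute we are about to change.
    previous = {name: getattr(engine, name) for name in settings}
    try:
        for name, value in settings.items():
            setattr(engine, name, value)
        yield engine
    finally:
        # Put the original values back, even if speaking raised an error.
        for name, value in previous.items():
            setattr(engine, name, value)


def demo():
    # Imported here so the helper above does not depend on the library.
    from espeakng import ESpeakNG

    esng = ESpeakNG()
    esng.voice = 'en'
    with temporary_settings(esng, voice='fr', pitch=32):
        esng.say('Bonjour. Comment vas-tu?')
    # Outside the with block, the English voice is selected again.
    esng.say('Hello from Python.')
```

Calling demo() needs py-espeak-ng and eSpeak NG to be installed; the helper itself works with any object that has matching attributes.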

To get the list of voices:

print(esng.voices)

(output is below)

Many more options are available, and you can pretty much use everything shown above to figure out how to carry on from here.
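If the wrapper library gives you trouble, another route is to call the espeak-ng program itself from Python, using the same -v, -s and -p options shown earlier on the command-line. This is a rough sketch rather than a recommended method; build_espeak_command and speak are names I made up for illustration, and the code assumes that espeak-ng is on your PATH.

```python
import shutil
import subprocess


def build_espeak_command(text, voice=None, speed=None, pitch=None):
    """Return the espeak-ng argument list for the options used earlier."""
    command = ['espeak-ng']
    if voice is not None:
        command.append('-v' + voice)      # for example -ves for Spanish
    if speed is not None:
        command += ['-s', str(speed)]     # speaking speed
    if pitch is not None:
        command += ['-p', str(pitch)]     # pitch
    command.append(text)
    return command


def speak(text, **options):
    """Speak text through the espeak-ng program, if it is installed."""
    if shutil.which('espeak-ng') is None:
        print('espeak-ng was not found on the PATH')
        return False
    # Passing a list (not a single string) avoids shell quoting problems.
    subprocess.run(build_espeak_command(text, **options), check=True)
    return True


# Example call (commented out so that nothing is spoken on import):
# speak('Buenos días. ¿Cómo estás?', voice='es', speed=125, pitch=75)
```

Keeping the command building separate from the actual call makes it easy to print the argument list and compare it with what you would type in the terminal.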

Now there is one other Text to Speech option that we have available to us. The reason I haven't mentioned it until now is that it isn't quite free. It's the Google Translate TTS API. You need Python 3.4 or later to start, so if you are still hanging on to Python 2.x, you are out of luck for this one. You also need to add a few files. For Ubuntu and other Debian-based distributions, in a terminal type:

:~$ sudo apt-get install sox libsox-fmt-mp3

Next, install the google_speech library using pip:

:~$ pip3 install google_speech

Once we have that done, let's try it on the command-line.

:~$ google_speech -l en "Hello $USER, it is $(date)"

For some reason I get 'sox WARN alsa: can't encode 0-bit Unknown or not applicable', but that's ok.

There is a small amount of documentation available at https://github.com/desbma/GoogleSpeech that you can also try.

You can even try the code shown above.

Now, let's look at google_speech in Python.

from google_speech import Speech
text = 'Hello user from the google speech api'
lang = 'en'
speech = Speech(text, lang)
speech.play()

And now for something completely different…

lang = 'nb'
text = 'God morgen. Hvordan har du det?'  # Norwegian for "Good morning. How are you?"
speech = Speech(text, lang)
speech.play()
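The two snippets above follow the same pattern, so they can be folded into a small loop. This is only a sketch: speak_phrases and its speech_factory parameter are hypothetical names of my own, and the only google_speech calls used are Speech(text, lang) and play(), exactly as shown above.

```python
def speak_phrases(phrases, speech_factory=None):
    """Play each (language, text) pair in order; return how many were played."""
    if speech_factory is None:
        # Imported lazily so the function can be tried without the library.
        from google_speech import Speech
        speech_factory = Speech
    count = 0
    for lang, text in phrases:
        speech_factory(text, lang).play()
        count += 1
    return count


PHRASES = [
    ('en', 'Hello user from the google speech api'),
    ('nb', 'God morgen. Hvordan har du det?'),
]

# Example call (needs google_speech, sox, and an internet connection):
# speak_phrases(PHRASES)
```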

You can certainly hear that the speech is much better and more understandable. Why not stick with this? One of the requirements I stated earlier was that it needed to be free. That applies not only to the software that we use, but also to the engine service and to being able to work without an internet connection. If these last two don't bother you, then this is for you. You do, however, need to be aware of the cost of using the Google API for this. According to https://cloud.google.com/text-to-speech/pricing, for the “Standard (non-WaveNet voices)” service, there is a monthly free tier that (the way I read it) is from 0 to 4 million characters. Anything over that amount per month would be charged at $4.00 USD per million characters. If you look at their example near the top of the page…

<speak>

<say-as interpret-as="cardinal">12345</say-as> and one more

</speak>

would count as 79 characters. So be careful when you attempt to calculate your usage. There is also the possibility that if you send too much data too quickly, the system might block you for a while if you don't have an account.
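To get a feel for the numbers, the arithmetic can be sketched in a few lines. The figures below are simply my reading of the pricing page as described above (4 million free characters per month, then $4.00 USD per million); they are assumptions that may change, and the function name is made up for illustration. How Google rounds partial amounts is not covered here.

```python
FREE_CHARACTERS_PER_MONTH = 4_000_000   # free tier, as I read the pricing page
USD_PER_MILLION_CHARACTERS = 4.00       # Standard (non-WaveNet) voices


def estimate_monthly_cost(characters):
    """Estimate the monthly charge in USD for a given number of characters."""
    # Only the characters above the free tier are charged.
    billable = max(0, characters - FREE_CHARACTERS_PER_MONTH)
    return billable / 1_000_000 * USD_PER_MILLION_CHARACTERS


print(estimate_monthly_cost(3_000_000))   # inside the free tier: 0.0
print(estimate_monthly_cost(6_500_000))   # 2.5 million over: 10.0
```

Remember that the character count you feed in has to include any markup, as in the example above, not just the words that are spoken.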

Well, that’s about it for this month. Until next time, keep coding!