
The other morning, while I was getting my shower, my mind went to a fairly dark place, as it often does. If most people fall in the shower, or outside, or going down the stairs, they will end up with a bruise and be achy for a day or two. If I fall, however, there is a very high probability that I will be paralysed, or worse. And, as usual, when I get into that place, I wonder how I'd be able to continue my writing, and programming, and cooking, if that happened. I'm sure that I'm not the only one who thinks about that kind of thing. Luckily, today we have Siri, Amazon Alexa, Google Assistant, and more. Almost every smartphone has some kind of speech recognition. There are many pre-made packages out there for Linux and other operating systems. But I wanted to see what could be done via Python.

First, I want to hit the pause button and share a little history of speech recognition with you. Back when I was a child, when rainbows were in black and white and we had to watch TV by candlelight because there wasn't any electricity (not really, but it confuses children to no end), computers were just getting started. In 1952, Bell Labs created the Audrey system, which was able to understand a single speaker speaking numbers. Fast-forward ten years, and IBM created a system called “Shoebox”, which could understand and respond to a whopping 16 words (see https://sonix.ai/history-of-speech-recognition). Enough of ancient history. Push the play button!

After a little web browsing, I found a library for Python called, surprisingly enough, SpeechRecognition. It can be installed via pip:

pip install SpeechRecognition

All the source code can be found at https://github.com/Uberi/speech_recognition#readme . I went ahead and installed it via pip, and then went and downloaded the source from the GitHub repository. I’ve “borrowed” the following snippet from the repository site… “… with support for several engines and APIs, online and offline.

Speech recognition engine/API support:
- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Azure Speech
- Microsoft Bing Voice Recognition (Deprecated)
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)”

Now, there are some things that need to be said here. Most of these online engines require you to register as a user and obtain keys to be able to access them, and/or may incur costs. The only offline engines that are currently supported are CMU Sphinx (I'll talk about that in a little bit) and Snowboy. If you wish to see exactly what is required for each engine, download the source code from the GitHub repository and look at the __init__.py file located in the speech_recognition folder of the distribution.
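Rather than memorizing that list, you can ask your installed copy of the library which engines it actually supports, since each one is exposed as a recognize_* method on the Recognizer class. A quick sketch (the helper function name is mine, not part of the library):

```python
def available_engines(recognizer_cls):
    """Return the engine names behind a class's recognize_* methods."""
    prefix = "recognize_"
    return sorted(name[len(prefix):] for name in dir(recognizer_cls)
                  if name.startswith(prefix))

if __name__ == "__main__":
    import speech_recognition as sr  # requires: pip install SpeechRecognition
    print(available_engines(sr.Recognizer))
```

On my reading of the source, this should print names like google, sphinx, wit, and so on, matching the quoted list above.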

Once I saw the line “To quickly try it out, run python -m speech_recognition after installing.”, I couldn't resist. But I did, at least long enough to see what other requirements there might be, and I'm glad I did. A little further down, it says that if you want to use a microphone (which, of course, I do), you need to install PyAudio. Ok. That makes sense. So I read a little further on, and I saw this: “On Debian-derived Linux distributions (like Ubuntu and Mint), install PyAudio using APT: execute sudo apt-get install python-pyaudio python3-pyaudio in a terminal.” I immediately copied the apt-get line from the bullet point and ran it in a terminal. I noticed that I could use pip to install the actual library. HOWEVER, because I was being stupid, I didn't notice the caveat below that.

“If the version in the repositories is too old, install the latest release using Pip: execute sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio (replace pip with pip3 if using Python 3).”

When I tried to use pip, I got a tonne of error messages. That's because I had not installed portaudio19-dev with the rest of it. I did another apt install, and ran the pip install command. It worked! So, to get it all on your system, here’s what you want to do:

$ sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
$ pip3 install pyaudio
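Once PyAudio is installed, you can confirm it actually sees your microphone before going any further. A quick sketch that lists the input-capable audio devices (the function name is mine; the device-query calls are standard PyAudio):

```python
def list_input_devices():
    """Return the names of audio devices PyAudio reports as having input channels."""
    import pyaudio  # deferred, so the function can be defined even without PyAudio
    pa = pyaudio.PyAudio()
    try:
        names = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            if info.get("maxInputChannels", 0) > 0:
                names.append(info["name"])
        return names
    finally:
        pa.terminate()  # always release the underlying PortAudio resources

if __name__ == "__main__":
    print(list_input_devices())
```

If your microphone doesn't show up in that list, SpeechRecognition won't be able to use it either.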

Now, you can give the program a try:

$ python -m speech_recognition
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
A moment of silence, please…
Set minimum energy threshold to 1071.7441188823814
Say something!
Got it! Now to recognize it…
You said alright the time has come
Say something!
Got it! Now to recognize it…
You said alright the time has come for all good men to come to the aid of the party
Say something!

Ok. Color me impressed. The warning messages didn’t worry me; they actually piqued my interest about all the possibilities. I wasn’t really happy with having to hit <Ctrl><C> to get the program to quit, though. Now, I wanted to know more. Digging into the distribution folder, I found main.py, which I quickly modified to put my own unique spin on it:

import speech_recognition as sr

r = sr.Recognizer()
m = sr.Microphone()

First, we import the library and create instances of the Recognizer and Microphone objects. Next, we use a loop to continually check and adjust the microphone level…
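That adjustment loop can be sketched with the library's adjust_for_ambient_noise() call, which samples background noise and raises or lowers the Recognizer's energy threshold. The pass count and duration here are my own choices, not necessarily what main.py uses:

```python
def calibrate(recognizer, microphone, passes=3, duration=1):
    """Sample ambient noise several times so the energy threshold settles,
    then return the threshold the Recognizer arrived at."""
    with microphone as source:
        for _ in range(passes):
            recognizer.adjust_for_ambient_noise(source, duration=duration)
    return recognizer.energy_threshold

if __name__ == "__main__":
    import speech_recognition as sr  # needs PyAudio and a working microphone
    r = sr.Recognizer()
    m = sr.Microphone()
    print("Set minimum energy threshold to", calibrate(r, m))
```

This mirrors the “Set minimum energy threshold to …” line you saw in the demo output earlier.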

Notice that, in this step, the Recognizer uses the Google Speech Recognition system. The line “if str is bytes:” checks to see if this is running under Python 2.x, to properly print any unicode characters. Now we have printed out what the Recognizer THINKS was said. Next, we can check that against either the phrase “please quit” or “please stop” to programmatically end the program. I tried using a single word, but that never triggered. I’m guessing that the system just figured that the input was noise. A side note here. What if you don’t want to use English as the language that you speak to the program with? What about Spanish, or Norwegian, or some other language? It’s covered! Change the line:

value = r.recognize_google(audio)

to:

value = r.recognize_google(audio, language="en-GB")
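The “please quit”/“please stop” check is just string matching on the recognized text. A minimal version (the function name is mine) looks like this:

```python
STOP_PHRASES = ("please quit", "please stop")

def should_stop(value):
    """Return True when the recognized text is one of the stop phrases.
    Google's recognizer tends to return lowercase text, but we normalise anyway."""
    return value.strip().lower() in STOP_PHRASES
```

In the main loop, you would call should_stop(value) on each result and break out of the loop when it returns True, instead of reaching for <Ctrl><C>.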

for British English, “no-NO” for Norwegian, or “es-AR” for Spanish (Argentina). You can check this link (https://stackoverflow.com/questions/14257598/what-are-language-codes-in-chromes-implementation-of-the-html5-speech-recogniti/14302134#14302134 ) to see many language settings. Moving on… Finally, we check for exceptions (bottom right). So, the bottom line is that we get a string (value) back from the Recognizer. What we do with that information, right now, is open-ended. A friend suggested that it might be good to use the Text-to-Speech API with espeak-ng that I talked about back in Full Circle Magazine #150. We might revisit that in a future article. At this point, before I forget it, I promised a while ago to talk about PocketSphinx. There are a lot of people who find that it is not very reliable. I tried to get it to install, and I have to admit, there were issues.
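The exception check wraps the recognize call in a try/except for the two errors the library raises: UnknownValueError when the audio couldn't be interpreted, and RequestError when the online API couldn't be reached. A sketch, with the wrapper name and the messages being my own:

```python
def recognize_safely(recognizer, audio, language="en-US"):
    """Run Google recognition and turn the two usual failures into messages."""
    import speech_recognition as sr  # deferred import; sketch only
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return "Oops! Didn't catch that."
    except sr.RequestError as e:
        return "Couldn't request results from the service; {0}".format(e)
```

Anything this returns is just a string, so the rest of the program can treat successes and failures uniformly.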

First, I suggest that if you want to try PocketSphinx, you go to https://pypi.org/project/pocketsphinx/ and follow the instructions there. The GitHub repository is at https://github.com/bambocher/pocketsphinx-python . There is an example program provided with the source distribution that, at least for me, would not run. I kept getting an error starting with the line decoder = Decoder(config). I did a search and found a number of people having the same issue, but not much in the way of an answer. After digging for much longer than I should have, I found a reference to the MODELDIR config settings. After looking into my Python library folders, I found the site package for PocketSphinx. I realized that the MODELDIR and DATADIR statements were not being set properly in the example. They were:

MODELDIR = "pocketsphinx/model"
DATADIR = "pocketsphinx/test/data"

but, for me, they needed to be…

MODELDIR = "/home/greg/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pocketsphinx/model"
DATADIR = "/home/greg/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pocketsphinx/data"

The package needs absolute path statements to where pip installed PocketSphinx. This can be a major problem if you are using something like pyenv and have multiple instances of Python, or if you wish to distribute an app you wrote using PocketSphinx. The next problem showed up on the line:

config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))

This was incorrect based on the installation. The files are located in a folder directly off of the model folder. It should have been:

config.set_string('-hmm', path.join(MODELDIR, 'en-us'))

After these changes were made and saved, the example program worked.
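Rather than hard-coding your own site-packages path, you can derive MODELDIR and DATADIR from wherever pip actually put the package, which sidesteps the pyenv and app-distribution problems. A sketch, assuming the model/ and data/ folders sit inside the installed pocketsphinx package as they did on my system:

```python
import os

def pocketsphinx_dirs():
    """Return (MODELDIR, DATADIR) relative to the installed pocketsphinx package."""
    import pocketsphinx  # deferred, so this file parses without the package installed
    base = os.path.dirname(os.path.abspath(pocketsphinx.__file__))
    return os.path.join(base, "model"), os.path.join(base, "data")

if __name__ == "__main__":
    MODELDIR, DATADIR = pocketsphinx_dirs()
    print(MODELDIR)
    print(DATADIR)
```

With this, the example program's config lines can use the returned paths no matter which Python installation it runs under.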

You might wonder at this point, “Ok, so how do we actually do something with the data we’ve received?” That is such an open-ended question that it’s really out of the scope of this article. HOWEVER, I can point you in an interesting direction. If you remember, near the top of the article, one of the engines that is supported by the SpeechRecognition library is Wit.ai. This is an interesting site. Basically, you provide speech or text to their API, and it tries to match that input to something you have told the system to expect the end user to enter. For example, let’s say that you want your end user to say things that would be along the lines of home automation, like turning a light on or off, asking what the temperature is outside, changing the thermostat, and so on. Check out https://wit.ai/. It takes a little bit of navigation, and you have to read a bunch of the site to understand it, but I think you’ll get the gist of things pretty quickly. We’ll explore it some more next time.
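The SpeechRecognition library already wraps the Wit.ai endpoint via recognize_wit(), which takes the server access token you get when you register an app on wit.ai. A sketch of hooking it up (the wrapper function name and the WIT_AI_KEY placeholder are mine):

```python
def recognize_with_wit(recognizer, audio, wit_ai_key):
    """Send captured audio to Wit.ai through SpeechRecognition and
    return the text the trained app matched it to."""
    return recognizer.recognize_wit(audio, key=wit_ai_key)

if __name__ == "__main__":
    import speech_recognition as sr
    WIT_AI_KEY = "YOUR-SERVER-ACCESS-TOKEN"  # from your app's settings on wit.ai
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    print("Wit.ai thinks you said:", recognize_with_wit(r, audio, WIT_AI_KEY))
```

From there, the returned text is what you would map onto your home-automation intents.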

One other thing. While I was digging around on the web to get info for this article, I found that Google Chrome can now support voice commands. I haven’t tried it yet, but it looks very interesting. The website says, “Use the magic of speech recognition to write emails and documents in Google Chrome. Dictation accurately transcribes your speech to text in real time. You can add paragraphs, punctuation marks, and even smileys using voice commands.” Check out the site at https://dictation.io/. The code is, as always, on Pastebin at: https://pastebin.com/pTJ6RcKL Until next time, keep coding!

issue153/python.1580659813.txt.gz · Last modified: 2020/02/02 17:10 by d52fr