Full Circle Magazine FR

The other morning, while I was getting my shower, my mind went to a fairly dark place as it often does. If most people fall in the shower, or outside, or going down the stairs, they will end up with a bruise and be achy for a day or two. If I fall however, there is a very high probability that I will be paralysed or worse. And, as usual, when I get into that place, I wonder how I'll be able to continue my writing, and programming, and cooking, if that happened. I'm sure that I'm not the only one who thinks about that kind of thing. Luckily, today we have Siri, Amazon Alexa, Google Assistant, and more. Almost every smartphone has some kind of speech recognition. There are many pre-made packages out there for Linux and other operating systems. But I wanted to see what could be done via Python.

First, I want to hit the pause button and share a little history of speech recognition with you. Back when I was a child, when rainbows were in black and white, and we had to watch TV by candlelight because there wasn’t any electricity (not really, but it confuses children to no end), computers were just getting started. In 1952, Bell Labs created the Audrey system, which was able to understand a single speaker speaking numbers. Fast forward 10 years and IBM created a system called “Shoebox”, which could understand and respond to a whopping 16 words. (See https://sonix.ai/history-of-speech-recognition.)

Enough of ancient history. Push the play button!

After a little web browsing, I found a library for Python called, surprisingly enough, SpeechRecognition. It can be installed via pip…

pip install SpeechRecognition

All the source code can be found at https://github.com/Uberi/speech_recognition#readme. I went ahead and installed via pip, and then downloaded the source from the GitHub repository.

I’ve “borrowed” the following snippet from the repository site…

“… with support for several engines and APIs, online and offline.

Speech recognition engine/API support:

CMU Sphinx (works offline)
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Azure Speech
Microsoft Bing Voice Recognition (Deprecated)
Houndify API
IBM Speech to Text
Snowboy Hotword Detection (works offline)”

Now, there are some things that need to be said here. Most of these online engines require you to register as a user and obtain keys to be able to access them, and/or may incur costs. The only offline services that are currently supported are CMU Sphinx (I'll talk about that in a little bit) and Snowboy. If you wish to see exactly what is required for each engine, download the source code from the GitHub repository and look at the __init__.py file located in the speech_recognition folder of the distribution.

Once I saw the line “To quickly try it out, run python -m speech_recognition after installing.”, I could hardly resist. But I did, at least long enough to see what other requirements there might be, and I'm glad I did. A little bit further down, it says that if you want to use a microphone, which of course I do, you need to install PyAudio. Ok. That makes sense. So I read a little further on. I saw this…

“On Debian-derived Linux distributions (like Ubuntu and Mint), install PyAudio using APT: execute sudo apt-get install python-pyaudio python3-pyaudio in a terminal.”

I immediately copied the apt-get line from the bullet point and ran it in a terminal. I noticed that I could use pip to install the actual library. HOWEVER, because I was being stupid, I didn't notice the caveat below that.

“If the version in the repositories is too old, install the latest release using Pip: execute sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio (replace pip with pip3 if using Python 3).”

When I tried to use pip, I got a tonne of error messages. That's because I had not installed portaudio19-dev with the rest of it.

I did another apt install, and ran the pip install command. It worked!

So to get it all on your system, here’s what you want to do…

$ sudo apt-get install portaudio19-dev python-all-dev python3-all-dev

$ pip3 install pyaudio
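Before going further, it can be worth confirming that both libraries actually landed in the Python you are running. The following is only a small sketch: the helper name missing_modules is made up for this example, and it assumes the import names speech_recognition and pyaudio for the two packages.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names (not pip package names) of SpeechRecognition and PyAudio.
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("Both libraries are importable.")
```

If you use several interpreters (pyenv, for instance), run this with the same python that you will use for the rest of the article.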

Now, you can give the program a try…

$ python -m speech_recognition

ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
A moment of silence, please…
Set minimum energy threshold to 1071.7441188823814
Say something!
Got it! Now to recognize it…
You said alright the time has come
Say something!
Got it! Now to recognize it…
You said alright the time has come for all good men to come to the aid of the party
Say something!

Ok. Color me impressed. The warning messages didn’t worry me; they actually piqued my interest about all the possibilities. I wasn’t really happy with having to hit <Ctrl><C> to get the program to quit though. Now, I wanted to know more.

Digging into the distribution folder, I found __main__.py, which I quickly modified to put my own unique spin on it…

import speech_recognition as sr

r = sr.Recognizer()
m = sr.Microphone()

First, we import the library and create instances of the Recognizer and the Microphone objects. Next, we use a loop to continually check and adjust the microphone level…
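The full listing lives on Pastebin, so what follows is only a sketch of how such a loop might look, modelled on the library's own demo rather than copied from my program. The names wants_to_quit, QUIT_PHRASES and listen_forever are invented for this example, and, as an assumption, the microphone is calibrated once before the loop instead of on every pass.

```python
QUIT_PHRASES = ("please quit", "please stop")


def wants_to_quit(text):
    """True if the recognized text contains one of the quit phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in QUIT_PHRASES)


def listen_forever(language="en-US"):
    """Listen, recognize and print until a quit phrase is heard."""
    # Imported here so wants_to_quit() can be tried without a microphone.
    import speech_recognition as sr

    r = sr.Recognizer()
    m = sr.Microphone()

    print("A moment of silence, please...")
    with m as source:
        r.adjust_for_ambient_noise(source)  # calibrate to background noise
    print("Set minimum energy threshold to {}".format(r.energy_threshold))

    while True:
        print("Say something!")
        with m as source:
            audio = r.listen(source)  # returns once speech starts, then stops
        print("Got it! Now to recognize it...")
        try:
            # Sends the audio to Google Speech Recognition (needs a network).
            value = r.recognize_google(audio, language=language)
        except sr.UnknownValueError:
            print("Oops! Didn't catch that")
            continue
        except sr.RequestError as e:
            print("Couldn't request results from Google: {}".format(e))
            continue
        print("You said {}".format(value))
        if wants_to_quit(value):
            break
```

Calling listen_forever() starts the loop; saying “please stop” or “please quit” ends it without needing <Ctrl><C>.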

Notice that in this step, the Recognizer uses the Google Speech Recognition system. The line “if str is bytes:” checks to see if this is running under Python 2.x to properly print any unicode characters. Now we have printed out what the Recognizer THINKS was said. Next, we can compare that with either the phrase “please quit” or “please stop” to programmatically end the program. I tried using a single word, but that never triggered. I’m guessing that the system just figured that the input was just noise.

A side note here. What if you don’t want to use English as the language that you speak to the program with? What about Spanish or Norwegian or some other language? It’s covered! Change the line:

value = r.recognize_google(audio)

to

value = r.recognize_google(audio, language="en-GB")

for British English, “no-NO” for Norwegian, or “es-AR” for Spanish (Argentina). You can check this link (https://stackoverflow.com/questions/14257598/what-are-language-codes-in-chromes-implementation-of-the-html5-speech-recogniti/14302134#14302134) to see many language settings. Moving on…

Finally, we check for exceptions (bottom right).

So the bottom line is that we get a string (value) back from the Recognizer. What we do with that information, right now, is open ended. A friend suggested that it might be good to use the Text to Speech API with espeak-ng that I talked about back in Full Circle Magazine #150. We might revisit that in a future article.

At this point, before I forget it, I promised a while ago to talk about PocketSphinx. There are a lot of people who find that it is not very reliable. I tried to get it to install and, I have to admit, there were issues.

First, I suggest that if you want to try PocketSphinx, you go to https://pypi.org/project/pocketsphinx/ and follow the instructions there. The GitHub repository is at https://github.com/bambocher/pocketsphinx-python.

There is an example program provided with the source distribution that, at least for me, would not run. I kept getting an error starting with the line decoder = Decoder(config). I did a search and found a number of people having the same issue, but not much in the way of an answer. After digging for much longer than I should have, I found a reference to the MODELDIR config settings. After looking into my Python library folders, I found the site package for PocketSphinx. I realized that the MODELDIR and DATADIR statements were not being set properly in the example. They were:

MODELDIR = "pocketsphinx/model"

DATADIR = "pocketsphinx/test/data"

but for me, they needed to be…

MODELDIR = "/home/greg/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pocketsphinx/model"

DATADIR = "/home/greg/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pocketsphinx/data"

The package needs absolute path statements to where pip installed PocketSphinx. This can be a major problem if you are using something like pyenv and have multiple instances of Python, or if you wish to distribute an app you wrote using PocketSphinx.
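One possible way to avoid hardcoding a path like mine is to ask Python where the package was installed and build the two folders from that. This is a sketch, not something taken from the PocketSphinx example: package_dir and pocketsphinx_paths are names invented here, and the “model” and “data” subfolders are assumed to match the layout I found on my own machine.

```python
import importlib.util
import os


def package_dir(package_name):
    """Return the install folder of a package, or None if it is not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def pocketsphinx_paths():
    """Build (MODELDIR, DATADIR) from wherever pip put PocketSphinx."""
    base = package_dir("pocketsphinx")
    if base is None:
        return None  # PocketSphinx is not installed for this interpreter
    # Assumed layout: <site-packages>/pocketsphinx/model and .../data
    return os.path.join(base, "model"), os.path.join(base, "data")
```

With something like this, MODELDIR and DATADIR follow whichever interpreter is running, which should help with both the pyenv case and distributing an app.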

The next problem showed up on the line:

config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))

This was incorrect based on the installation. The files are located in a folder directly off of the model folder. It should have been:

config.set_string('-hmm', path.join(MODELDIR, 'en-us'))

After these changes were made and saved, the example program worked.

You might wonder at this point, “Ok, so how do we actually do something with the data we’ve received?” That is such an open-ended question that it’s really out of the scope of this article. HOWEVER, I can point you in an interesting direction.

If you remember near the top of the article, one of the engines that is supported by the SpeechRecognition library is Wit.ai. This is an interesting site. Basically, you provide speech or text to their API and it tries to match that input to something you have told the system that you expect the end user to enter. For example, let’s say that you want your end user to say things along the lines of home automation, like turning a light on or off, asking what the temperature is outside, changing the thermostat, and so on.
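To make the idea of matching input to expected requests a bit more concrete, here is a deliberately naive, purely local sketch. It is not the Wit.ai API and does not talk to their service; the intent names and trigger words are hypothetical, picked only to mirror the home automation example.

```python
# Hypothetical intents, each with trigger words that must all be present.
INTENTS = {
    "light_on": ("turn on", "light"),
    "light_off": ("turn off", "light"),
    "outside_temperature": ("temperature", "outside"),
}


def match_intent(text):
    """Return the first intent whose trigger words all appear in the text."""
    lowered = text.lower()
    for intent, words in INTENTS.items():
        if all(word in lowered for word in words):
            return intent
    return None  # nothing we expected the user to say
```

The string coming back from the Recognizer could be passed straight to match_intent(). A service like Wit.ai aims to handle the many ways people phrase the same request, which simple substring checks like these cannot.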

Check out https://wit.ai/. It takes a little bit of navigation, and you have to read a bunch of the site to understand, but I think you’ll get the gist of things pretty quickly. We’ll explore it some more next time.

One other thing. While I was digging around on the web to get info for this article, I found that Google Chrome can now support voice commands. I haven’t tried it yet, but it looks very interesting. The website says: “Use the magic of speech recognition to write emails and documents in Google Chrome. Dictation accurately transcribes your speech to text in real time. You can add paragraphs, punctuation marks, and even smileys using voice commands.” Check out the site at https://dictation.io/

The code is, as always, on Pastebin at: https://pastebin.com/pTJ6RcKL

Until next time, keep coding!

Page 20, inset at top right, 2 black lines:

Now we create another loop to capture something that resembles speech from the microphone…

It keeps listening until (a) it hears speech and (b) the speech stops. Then it tries to process the audio of that speech…

Page 21, inset at bottom right, 1 black line:

That sounds very easy, but it isn't. Here is a sample of the program running (I removed the warning messages from this excerpt)…