YJL: Trying out the CMU Sphinx

Not this Sphinx; naming really is an important thing. Anyway, I tried to play with CMU Sphinx. Its pocketsphinx package provides a Python binding, though with no real documentation. There are two ways to do recognition: on the fly, or on a block of data. With the default models, on-the-fly recognition gives useless results. I don't know if it can do better after a bit of training, and I have no idea how to train it either. Decoding a block of data gives acceptable results.

What actually caught my attention was gnome-voice-control. It does work, but it also crashes. I checked out the repository (I couldn't compile version 0.3) and installed sphinxbase 0.4.1 and pocketsphinx 0.5.1.

Since it crashes every time, I wanted to write a simple, similar program in Python. Unfortunately, the results aren't good. Reducing the word bank, slicing the audio into words on our own, and decoding by block might improve the accuracy, but I think that's a lot of effort and I don't have much knowledge of speech recognition. I stopped here.

I still put together a simple script, which uses pyalsaaudio 0.4 to capture audio. It records until you press Ctrl+C, then runs recognition.
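The record-then-decode flow can be sketched roughly like this. It is not the original script, just a minimal outline under a few assumptions: the function names are made up for illustration, the capture settings (16 kHz, mono, 16-bit) are my guess at what the default models expect, the `set*` calls are the older pyalsaaudio style and may be deprecated in newer releases, and `decode_block` is a hypothetical placeholder for whatever block-decoding call your pocketsphinx binding offers.

```python
def record_until_interrupt(read_chunk):
    """Collect audio chunks from read_chunk() until Ctrl+C is pressed.

    read_chunk must return a (length, data) tuple, which is the shape
    that pyalsaaudio's PCM.read() returns.
    """
    chunks = []
    try:
        while True:
            length, data = read_chunk()
            if length > 0:  # skip empty or failed reads
                chunks.append(data)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the recording instead of killing the script
    return b"".join(chunks)


def open_alsa_capture(rate=16000):
    """Open the default ALSA capture device and return its read method.

    Assumption: 16 kHz, mono, 16-bit little-endian samples.
    """
    import alsaaudio  # imported lazily so the rest works without ALSA

    pcm = alsaaudio.PCM(alsaaudio.PCM_CAPTURE)
    pcm.setchannels(1)
    pcm.setrate(rate)
    pcm.setformat(alsaaudio.PCM_FORMAT_S16_LE)
    pcm.setperiodsize(160)
    return pcm.read


def main(decode_block):
    """Record until Ctrl+C, then hand the whole block to the recognizer.

    decode_block is hypothetical: any callable that takes raw audio
    bytes and returns the recognized text.
    """
    audio = record_until_interrupt(open_alsa_capture())
    print(decode_block(audio))
```

Keeping the capture and the recognizer as plain callables means the loop can be tried with a fake audio source first, before wiring in the real microphone and decoder.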

You can also try this Python script, which is a GUI and uses Sphinx's GStreamer plugin.
