Comments on YJL: Trying out the CMU Sphinx

We are trying to use pocket sphinx to place text i...

2009-07-21T16:22:43.724-07:00

We are trying to use PocketSphinx to place text into a text box on a website on a mobile phone when a person speaks a word. We're getting nowhere. Can PocketSphinx do this, and if so, how?

Thank u very much i am workin in that ,,,will get ...

2009-05-31T08:59:15.893-07:00

Thank you very much. I am working on that and will get back to you soon :)

Have you check out Sphinx' sample code? Do wha...

2009-05-28T10:15:03.046-07:00

Have you checked out Sphinx's sample code? Do what it does.

Hi I am Harsha,I am working on Pocket sphinx, I ha...

2009-05-28T06:54:37.921-07:00

Hi, I am Harsha.

I am working on PocketSphinx. I have installed an4, SphinxTrain, SphinxBase, and PocketSphinx on my Linux machine,
but I don't know what to do next. How do I get speech recognition going? Do I need to write drivers for the mic (audio)? Please help me.

Thanks in advance:)

The easiest thing you can do is to use the -cmnini...

2009-02-26T07:18:00.000-08:00

The easiest thing you can do is to use the -cmninit configuration parameter. This accepts a vector of numbers which is used to normalize the acoustic features. In batch mode this is estimated based on the entire utterance, which gives a pretty good estimate. But when recognizing on the fly it's based on past samples of audio, which means that it can start out quite inaccurate, and then converges to a good estimate over time.
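To make the difference concrete, here is a minimal sketch in plain Python. It is not PocketSphinx's actual implementation; the function names, the tiny two-dimensional frames, and the way the initial guess is weighted as a single pseudo-frame are all assumptions made for illustration. It only shows why a mean estimated from the whole utterance is good from the first frame, while a running mean depends on its starting value.

```python
def batch_cmn(frames):
    """Subtract the per-dimension mean computed over the whole utterance."""
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]


def live_cmn(frames, init_mean):
    """Subtract a running mean that starts at init_mean and is updated
    only from frames that have already been seen."""
    dims = len(init_mean)
    total = list(init_mean)  # assumption: initial guess counts as one pseudo-frame
    count = 1
    normalized = []
    for f in frames:
        mean = [total[d] / count for d in range(dims)]
        normalized.append([f[d] - mean[d] for d in range(dims)])
        # Only now does the current frame contribute to the estimate.
        for d in range(dims):
            total[d] += f[d]
        count += 1
    return normalized


# Hypothetical feature frames, two dimensions each.
frames = [[50.0, 4.0], [48.0, 5.0], [49.0, 3.0]]

print(batch_cmn(frames))               # normalized with the true utterance mean
print(live_cmn(frames, [0.0, 0.0]))    # poor starting guess: early frames are far off
print(live_cmn(frames, [49.0, 4.0]))   # good starting guess: close from the first frame
```

With a poor starting value the first frames are barely normalized at all, which matches the behaviour described above; a starting value close to the true mean removes most of that warm-up period.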

In C, what you can do (and I think gnome-voice-control does this) is to save the estimated normalization vector from previous sessions and use it to initialize the decoder. This isn't available in the Python interface since it's buried in a few layers of API, unfortunately.

However if you look at the logging information that PocketSphinx prints out you will see something like:

CMN: 48.92 4.3 -0.4 ...

You can use these numbers (comma-separated) as the -cmninit argument. It will only be good for your specific microphone and sound card though.
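As a rough illustration, a small Python helper can turn such a log line into the comma-separated form. The helper name is hypothetical, and it assumes the log line looks exactly like the example above (a "CMN:" prefix followed by space-separated numbers); the real log format may differ between PocketSphinx versions, so check your own output first.

```python
import re


def cmn_line_to_cmninit(log_line):
    """Convert a 'CMN: a b c' style log line into 'a,b,c'.

    Assumes the line format shown in the comment above; returns an
    empty string if the line has no 'CMN:' marker.
    """
    # Keep only the text after the "CMN:" marker.
    _, _, tail = log_line.partition("CMN:")
    # Pick out signed decimal numbers; anything else (such as "...") is ignored.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", tail)
    return ",".join(numbers)


print(cmn_line_to_cmninit("CMN: 48.92 4.3 -0.4"))
```

The resulting string is the value you would then supply for -cmninit.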

@dhd Is that mean if I let decoder keep running an...

2009-02-25T09:25:00.000-08:00

@dhd Does that mean that if I let the decoder keep running and don't call end_utt, recognition will get better?

How does it collect data, and what does it collect? While it is collecting, should we remain silent and let only background noise through the microphone?

If I call end_utt and then start_utt again, will the data collected in the previous session apply to the new session?

Could you describe what conditions or procedures would make on-the-fly recognition more accurate?

The problem people usually encounter with on-the-f...

2009-02-25T08:09:00.000-08:00

The problem people usually encounter with on-the-fly recognition is that the first few things you speak to it are frequently poorly recognized. This is because it needs to collect some data in order to normalize the audio input for your particular microphone.

This is actually a bit different from training or speaker adaptation, something we hope to make easier soon...