I have the following code to recognize speech in Spanish. The problem is that it does not recognize several words correctly.

    using System;
    using System.Speech.Recognition;

    namespace SpeechRecognitionApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                using (SpeechRecognitionEngine recognizer =
                    new SpeechRecognitionEngine(
                        new System.Globalization.CultureInfo("es-ES")))
                {
                    // Free-form dictation: any word may be recognized.
                    recognizer.LoadGrammar(new DictationGrammar());
                    recognizer.SpeechRecognized +=
                        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
                    recognizer.SetInputToDefaultAudioDevice();
                    recognizer.RecognizeAsync(RecognizeMode.Multiple);

                    // Keep the console application alive while recognition runs.
                    while (true)
                    {
                        Console.ReadLine();
                    }
                }
            }

            static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                Console.WriteLine("Recognized text: " + e.Result.Text);
            }
        }
    }

I have also tried defining the words that it can recognize. That way it is never wrong, but the drawback is that it only recognizes the words I define.

    Choices colors = new Choices();
    colors.Add(new string[] { "azul", "rojo", "verde" });

    // Create a GrammarBuilder object and append the Choices object.
    GrammarBuilder gb = new GrammarBuilder();
    // The grammar's culture must match the recognizer's culture (es-ES here).
    gb.Culture = recognizer.RecognizerInfo.Culture;
    gb.Append(colors);

    // Create the Grammar instance from the GrammarBuilder.
    Grammar g = new Grammar(gb);

    // Load the grammar into the speech recognition engine.
    recognizer.LoadGrammar(g);

Is there a way to recognize arbitrary words accurately without being limited to a predefined list?
Sorry if this doesn't qualify as an answer but it's too long to be a comment.
I dealt with the same problem for a long time, and this is what I found:
There is nothing you can do programmatically to improve the accuracy of the speech recognition engine. The options are normally limited to:
- 1 Train it to better recognize your voice.
- 2 Use a microphone that reduces the amount of noise it picks up, so the engine gets a cleaner signal.
In any case, in my experience the accuracy never gets above about 80%, and the same goes for other speech recognition systems such as Sphinx. They work very well for a limited list of commands, but not for transcribing an entire speech.
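This does not make the engine itself more accurate, but since fixed command lists work well and free dictation does not, one possible compromise is to load both grammars and only accept dictation results above a confidence threshold. The following is a minimal sketch under assumptions: the grammar names `commands` and `dictation`, the helper `ShouldAccept`, and the `0.6` threshold are all hypothetical, and the threshold would need tuning against your own recordings because the confidence values are only meaningful relative to each other.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class MixedGrammarSketch
{
    // Hypothetical cut-off for free dictation; not a recommended value.
    const float MinDictationConfidence = 0.6f;

    // Results from the fixed command list are always accepted;
    // dictation results are accepted only at or above the threshold.
    public static bool ShouldAccept(string grammarName, float confidence)
    {
        return grammarName == "commands" || confidence >= MinDictationConfidence;
    }

    static void Main()
    {
        var culture = new CultureInfo("es-ES");
        using (var recognizer = new SpeechRecognitionEngine(culture))
        {
            // Fixed vocabulary, built with the same culture as the engine.
            var gb = new GrammarBuilder(new Choices("azul", "rojo", "verde"));
            gb.Culture = culture;
            recognizer.LoadGrammar(new Grammar(gb) { Name = "commands" });

            // Free dictation is loaded alongside the fixed vocabulary.
            recognizer.LoadGrammar(new DictationGrammar { Name = "dictation" });

            recognizer.SpeechRecognized += (sender, e) =>
            {
                string source = e.Result.Grammar.Name;
                if (ShouldAccept(source, e.Result.Confidence))
                    Console.WriteLine("[" + source + "] " + e.Result.Text);
                else
                    Console.WriteLine("(discarded low-confidence result)");
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine(); // Press Enter to stop.
        }
    }
}
```

Discarding uncertain dictation trades coverage for fewer wrong transcriptions, so it only helps if missing a phrase is cheaper than getting it wrong.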
An alternative is to use an online speech recognition service that relies on neural networks. As a user I can say that they are much more accurate, but I have never actually developed an application with them:
- https://azure.microsoft.com/en-us/services/cognitive-services/speech/
- https://cloud.google.com/speech-to-text/
The downside is clear: they require internet access, and both the Google and Microsoft services are paid.
There is also the Mozilla DeepSpeech project, which uses neural networks and does not need an internet connection.