Amazon AI system cuts Alexa's speech recognition errors by 15%

A few months ago, Amazon detailed some of the issues that kept its virtual assistant Alexa from responding accurately when the assistant was awakened on some TV models, in internet advertising, or on the radio.

The main problem is how Amazon's voice assistant can effectively filter out background noise from the environment so that it responds to users more accurately. Recently, in a blog post and an accompanying research paper titled End-to-End Anchored Speech Recognition, Amazon engineers presented a new noise isolation technique based on artificial intelligence that can cut Alexa's speech recognition errors by 15%. More details on how the system works are expected to be presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), to be held in Brighton later this year.

'We are always trying to improve Alexa's performance by teaching the virtual assistant to ignore speech that is not intended for it, in other words, to pick out the command from the many noises in its surroundings. To do so, we assume that the speaker activates an Alexa-enabled device by saying a specific phrase that awakens the assistant (the wake word), usually "Alexa", and this is the key phrase that the assistant must isolate and identify amid the sound from the outside environment. Basically, our technique quickly captures sounds that could be the wake word (usually based on intonation or phoneme similarity) and compares them with the standard wake word to identify the utterance accurately. The utterance that best matches the standard wake word is then treated as a command, while other speech is treated as background noise,' explained Xin Fan, leader of the team in charge of Alexa AI.

Instead of training a separate AI system to distinguish background noise from the wake word, Xin Fan and his colleagues merged their wake-word mechanism with a standard AI-based speech recognition model. The scientists tested two variants of a sequence-to-sequence encoder-decoder architecture, that is, an architecture that processes an input sequence (snapshots of the audio signal, each a thousandth of a second long) to produce a corresponding output sequence (a phonetic rendering of the sound). As with most conventional encoder-decoder techniques, the encoder summarizes the input data as a fixed-length vector (a series of numbers) and the decoder converts it into the output data. Meanwhile, a special attention mechanism, trained to detect the characteristics of the wake word within the speech coming from the surrounding environment, instructs the decoder to pay more attention to those characteristics in the vector.
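The attention idea described above can be pictured with a small sketch. This is an illustration only, not Amazon's implementation: the function names, the two-dimensional toy vectors, the use of an averaged "anchor" vector for the wake word, and dot-product scoring are all assumptions made for the example.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def anchor_embedding(wake_word_frames):
    """Average the wake-word frames into one anchor vector (an assumed choice)."""
    count = len(wake_word_frames)
    dim = len(wake_word_frames[0])
    return [sum(frame[i] for frame in wake_word_frames) / count for i in range(dim)]

def anchored_attention(frames, anchor):
    """Weight each frame by its similarity to the anchor and build a context vector."""
    weights = softmax([dot(frame, anchor) for frame in frames])
    dim = len(anchor)
    context = [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]
    return weights, context

# Made-up "encoder outputs": two wake-word frames, then three frames of later audio.
wake_word_frames = [[1.0, 0.1], [0.9, 0.0]]
speech_frames = [
    [1.0, 0.0],  # sounds like the wake-word speaker
    [0.0, 1.0],  # background speech
    [0.8, 0.2],  # mostly like the wake-word speaker
]

anchor = anchor_embedding(wake_word_frames)
weights, context = anchored_attention(speech_frames, anchor)
print(weights)  # the background frame receives the smallest weight
```

In this toy setup, frames that resemble the wake word contribute more to the context vector that a decoder would read, which is the sense in which the decoder "pays more attention" to them.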

In one experiment, the researchers trained one of their AI models to focus more explicitly on the wake word, first by adding a component that directly compares the acoustics of the wake word with those of subsequent speech, and then by using the result as input to another trained component that masks parts of the encoder's vector. Interestingly, this model reduced the error rate by 13%, compared with 15% for the variant without the masking component.
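The two-step variant described above, a comparison with the wake word followed by a component that suppresses parts of the encoder's output, can be pictured as a soft gate. The sketch below is a toy under stated assumptions: cosine similarity as the comparison, a sigmoid gate applied to whole frames, and the hypothetical `scale` and `threshold` parameters standing in for weights that a real system would learn.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def mask_frames(frames, anchor, scale=10.0, threshold=0.5):
    """Scale each frame by a gate that is near 1 when it resembles the anchor.

    `scale` and `threshold` are hypothetical stand-ins for learned parameters.
    """
    masked = []
    for frame in frames:
        gate = sigmoid(scale * (cosine(frame, anchor) - threshold))
        masked.append([gate * value for value in frame])
    return masked

# Made-up vectors: an anchor for the wake word and two frames of later audio.
anchor = [0.95, 0.05]
frames = [
    [1.0, 0.0],  # resembles the wake word, so it is kept almost unchanged
    [0.0, 1.0],  # does not resemble it, so it is pushed toward zero
]

masked = mask_frames(frames, anchor)
print(masked)
```

A hard cut-off would be simpler, but a soft gate like this one keeps the operation differentiable, which is what allows such a component to be trained together with the rest of a network.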
