Microsoft AI generates realistic speech from only 200 training samples

Modern text-to-speech algorithms possess incredible abilities. The clearest evidence is SpecAugment and Translatotron, two systems recently introduced by Google. Translatotron in particular can translate a person's speech directly into another language while preserving the tone and intonation of the original sentence. But there is always room for something more remarkable.

Artificial intelligence researchers at Microsoft recently presented a paper titled 'Almost Unsupervised Text to Speech and Automatic Speech Recognition', which details an AI system built on unsupervised learning, a branch of machine learning in which the model learns from data that has not been labeled, classified, or categorized. The system drew attention for achieving 99.84% word intelligibility accuracy in text-to-speech and a phoneme error rate of 11.7% in automatic speech recognition. More impressively, the model needs only 200 audio clips and their corresponding transcriptions as paired training data.

The key to this AI model is the Transformer, a type of neural architecture developed by a team of scientists at Google Brain, Google's AI research division, and introduced in a 2017 paper. Like all deep neural networks, Transformers contain neurons (mathematical functions loosely modeled on biological neurons) arranged in interconnected layers that transmit signals from the input data and gradually adjust the synaptic strength, or weight, of each connection. That is how the model extracts features and learns to make predictions. What sets Transformers apart is that every output element is connected to every input element, and the weightings between them are calculated dynamically.
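
The idea that every output is connected to every input, with weights computed on the fly, can be illustrated with a small sketch of scaled dot-product attention, the standard mechanism in the Transformer architecture. This is not Microsoft's code; the function names and the toy input values are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One score per input position: every output sees every input.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)  # recomputed for each input sequence
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input (made-up numbers): three positions, two features each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
```

Each row of `weights` holds one weight per input position, and those weights depend on the input itself rather than being fixed after training, which is the sense in which they are calculated dynamically.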

Building on this, the Microsoft researchers incorporated a Transformer component into their system design so that it could take either speech or text as input or output. They used the publicly available LJSpeech dataset, which contains 13,100 English audio clips and their corresponding transcripts, as the source of training data. The team then randomly selected 200 of the 13,100 clips to create the training set, and also used a denoising auto-encoder component to reconstruct corrupted speech and text.
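
The data setup described above can be sketched roughly as follows. This is a minimal illustration and not the researchers' pipeline: the function names, the fixed seed, the `<mask>` symbol, and the masking probability are all hypothetical, and treating the remaining clips as unpaired data is an assumption. Random masking is only one possible way to corrupt input for a denoising component.

```python
import random

TOTAL_CLIPS = 13_100   # size of LJSpeech as reported in the article
PAIRED_CLIPS = 200     # clips kept together with their transcripts

def split_paired_unpaired(total, paired, seed=0):
    """Randomly pick `paired` clip indices; treat the rest as unpaired."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    paired_ids = set(rng.sample(range(total), paired))
    unpaired_ids = [i for i in range(total) if i not in paired_ids]
    return sorted(paired_ids), unpaired_ids

def corrupt(tokens, mask_prob=0.3, mask="<mask>", seed=0):
    """Randomly replace tokens with a mask symbol (hypothetical scheme)."""
    rng = random.Random(seed)
    return [mask if rng.random() < mask_prob else t for t in tokens]

paired_ids, unpaired_ids = split_paired_unpaired(TOTAL_CLIPS, PAIRED_CLIPS)

# A denoising component would be trained to map `noisy` back to `clean`.
clean = "the quick brown fox jumps over the lazy dog".split()
noisy = corrupt(clean)
```

The point of the corruption step is that the model only ever needs the clean sequence itself as a training target, so no paired transcript or recording is required for that part of the training.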

The results are not bad at all. Across the test excerpts, the researchers found that the system outperformed the baseline algorithms used in the experiments, and some of the generated samples sound very close to human speech.

Going forward, the researchers aim to push the limits of unsupervised learning by relying purely on unpaired speech and text data, with the help of existing pre-training methods. 'In this work, we proposed an almost unsupervised method for text to speech and automatic speech recognition, which uses only a few paired speech and text samples and a small amount of extra unpaired data. As demonstrated in the experiments, our designed components are necessary for developing the ability to convert between speech and text with few paired data,' the team said.

Microsoft will present the details of this project at the International Conference on Machine Learning, which takes place in Long Beach, California, from 10 to 15 June, and the research team intends to release the code as open source in the coming weeks.

Update 26 May 2019