Microsoft AI generates realistic speech from only 200 training samples
Modern text-to-speech algorithms are remarkably capable. The clearest evidence is two recent Google projects, SpecAugment and Translatotron. Translatotron in particular can translate a person's speech directly into another language while preserving the tone and intonation of the original sentence. Even so, there is always room for more remarkable results.
Researchers at Microsoft recently presented a paper titled 'Almost Unsupervised Text to Speech and Automatic Speech Recognition', which describes an AI system built largely on unsupervised learning, a branch of machine learning in which a model learns from data that has not been labeled, classified, or categorized. The system drew attention for its results: a word-level intelligibility rate of 99.84% for text-to-speech and a phoneme error rate of 11.7% for automatic speech recognition. More impressively, the model needs only 200 audio clips and their corresponding transcriptions as paired training data.
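Speech recognition quality is often reported as a phoneme error rate (PER): the edit distance between the reference and predicted phoneme sequences, divided by the reference length. The sketch below shows that calculation, on the assumption that the 11.7% figure is a PER of this kind. The function names and the sample phoneme sequences are illustrative and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the length of the reference sequence."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution and one deletion over four phonemes.
reference = ["HH", "AH", "L", "OW"]
predicted = ["HH", "EH", "L"]
per = phoneme_error_rate(reference, predicted)
```

A lower value is better, and a perfect transcription gives a PER of 0.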
The key to this model is the Transformer, a neural architecture developed by researchers at Google Brain, Google's AI research division, and introduced in a 2017 paper. Like other deep neural networks, a Transformer consists of neurons (mathematical functions loosely modeled on biological neurons) arranged in interconnected layers that pass signals from the input data forward and gradually adjust the strength, or weight, of each connection. This is how the model extracts features and learns to make predictions. What sets the Transformer apart is that every output element is connected to every input element, and the weights between them are computed dynamically.
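The idea that every output is connected to every input through dynamically computed weights can be illustrated with scaled dot-product attention, the core operation of the Transformer. This is a minimal pure-Python sketch, not Microsoft's implementation. The function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the output vectors and, for each query, its weights over all inputs.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # One score per input position: similarity between the query and each key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of every value vector.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: one query attending over three input positions.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = attention([[1.0, 0.0]], keys, values)
```

The weights depend on the inputs themselves rather than being fixed parameters, which is what makes the connections flexible.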
Building on this, the Microsoft researchers incorporated Transformer components into their system so that it can take either speech or text as input and produce either as output. For training data they chose the publicly available LJSpeech dataset, which contains 13,100 English audio clips with corresponding transcripts. The team randomly selected 200 of the 13,100 clips to form the paired training set, and they used a denoising auto-encoder component to reconstruct corrupted speech and text.
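The data setup described above can be sketched as follows: draw 200 clips at random to keep as paired examples, treat the rest as unpaired, and corrupt a sequence so that a denoising auto-encoder has something to reconstruct. This is a sketch under stated assumptions. The integer clip IDs, the masking scheme, the `mask_prob` value, and all function names are hypothetical and are not taken from the paper.

```python
import random

TOTAL_CLIPS = 13_100   # size of LJSpeech, as stated in the article
PAIRED_COUNT = 200     # number of clips kept with their transcripts


def split_paired_unpaired(clip_ids, paired_count=PAIRED_COUNT, seed=0):
    """Randomly choose paired clips; the remainder is used as unpaired data."""
    rng = random.Random(seed)
    paired = set(rng.sample(clip_ids, paired_count))
    unpaired = [c for c in clip_ids if c not in paired]
    return sorted(paired), unpaired


def corrupt(tokens, mask_prob=0.3, mask_token="<mask>", seed=0):
    """Replace random tokens with a mask (assumed corruption scheme).

    A denoising auto-encoder would be trained to recover `tokens` from this.
    """
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


clip_ids = list(range(TOTAL_CLIPS))  # stand-in IDs, not real LJSpeech file names
paired, unpaired = split_paired_unpaired(clip_ids)

sentence = "the quick brown fox jumps over the lazy dog".split()
noisy = corrupt(sentence)
```

Fixing the seed keeps the split reproducible, which matters when only 200 paired clips are in play.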
The results are quite good. Evaluating the individual clips, the researchers found that the system outperformed the baseline algorithms used in their experiments, and some of the generated samples sound close to human speech.
Looking ahead, the researchers aim to push the limits of unsupervised learning by relying purely on unpaired speech and text data, with the help of other pre-training methods. 'In this work, we proposed an almost unsupervised method for text-to-speech and automatic speech recognition, which uses only a small amount of paired speech and text data together with additional unpaired data. As our experiments demonstrate, the components we designed are necessary for developing the ability to convert between speech and text with only a few paired samples,' the team said.
Microsoft will present the details of this project at the International Conference on Machine Learning, which takes place in Long Beach, California, from June 10 to 15, and the research team intends to release the code as open source in the coming weeks.