Meta unveils Voicebox, an AI model for audio processing for creators
After countless rumors, Meta has officially announced Voicebox, its latest generative AI model.
Voicebox is designed to help content creators efficiently handle speech-generation tasks such as audio editing, sampling, and stylization. Through in-context learning, it can also handle tasks it was not specifically trained on.
Meta confidently asserts that this new AI model will 'benefit many people around the world', not just in the field of content creation. For example, it could help visually impaired people hear text messages read aloud, and let people speak foreign languages in their own voice.
Voicebox is also touted as being able to both create high-quality audio clips and edit pre-recorded ones, removing unwanted interruptions such as car horns while preserving the content and style of the original audio. The model is multilingual and can generate speech in six languages. Future uses Meta has planned for the model include providing natural-sounding voices for virtual assistants and for game characters in the metaverse.
Meta also compared Voicebox with other speech-generation AI models currently available, namely key competitors such as VALL-E and YourTTS. According to Meta, Voicebox outperforms them on both word error rate and style similarity.
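Word error rate is a standard way to score how intelligible generated speech is: the audio is transcribed, and the transcript is compared with the intended text. The sketch below shows the usual definition (word-level edit distance divided by the number of reference words). It is only an illustration of the metric; the function name and the lowercase, whitespace-based tokenization are assumptions, not details of Meta's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not Meta's evaluation code.
    Tokenization is a simple lowercase whitespace split, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word ("the") is missing from a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value is better, and the value can exceed 1.0 when the transcript contains many extra words.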
Voicebox is built on Flow Matching, Meta's latest advance in non-autoregressive generative models, which can learn the highly non-deterministic mapping between text and speech. This allows Voicebox to learn from varied speech data without that data having to be carefully labeled, giving it access to more diverse training data at a larger scale. To date, Voicebox has been trained on more than 50,000 hours of recorded speech and transcripts from audiobooks in English, French, Spanish, German, Polish and Portuguese.
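For readers curious about what a flow matching objective looks like, here is a minimal sketch of the general idea, not Meta's implementation (the Voicebox code has not been released). It assumes the simplest straight-line path between a noise sample and a data sample, and trains a model to predict the constant velocity along that path. All names (`flow_matching_pair`, `flow_matching_loss`, `predict_velocity`) are hypothetical, and plain Python lists stand in for audio features.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Return the point at time t on the straight line from x0 to x1,
    together with the velocity of that line (x1 - x0).

    Assumption: a linear interpolation path; real systems may use other paths.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between a model's predicted velocity and the
    straight-line target, for one data sample x1.

    predict_velocity is any callable (x_t, t) -> list of floats; it stands in
    for a neural network and is purely hypothetical here.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                         # time drawn uniformly from [0, 1)
    x_t, target = flow_matching_pair(x0, x1, t)
    predicted = predict_velocity(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    # A trivial "model" that always predicts zero velocity.
    zero_model = lambda x_t, t: [0.0] * len(x_t)
    print(flow_matching_loss(zero_model, [0.3, -1.2, 0.8], rng))
```

Because the model predicts the whole signal's direction of change at once rather than one token after another, generation is non-autoregressive, which is the property the article highlights.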
While this technology could usher in a new era of AI in audio processing, Meta acknowledges that it also carries the potential for abuse and unintended harm. The research paper Meta shared about Voicebox details how the company built a highly effective classifier that can distinguish between authentic voices and voices generated by Voicebox.
For the time being, Meta will not make Voicebox available to the public or release its source code.