Meta unveils Voicebox, an AI model for audio processing for creators

After countless rumors, Meta has officially announced its latest generative AI model, Voicebox. The model is designed to help content creators handle speech generation tasks such as audio editing, sampling, and stylization efficiently, including tasks it was not specifically trained for, which it picks up through in-context learning.

Meta asserts that the new AI model will 'benefit many people around the world', not just in the field of content creation. For example, it could help visually impaired people hear text messages read aloud, and allow people to speak foreign languages in their own voice.

Voicebox is also touted as being able to both create high-quality audio clips and edit pre-recorded ones, removing unwanted interruptions such as car horns while preserving the content and style of the audio. The model is multilingual and can generate speech in six languages. Meta's planned future uses for the model include providing natural-sounding voices for virtual assistants and for game characters in the metaverse.

Meta also compared Voicebox with other speech AI models currently available, namely key competitors such as Vall-E and YourTTS. According to Meta, its model outperforms both on word error rate and style similarity.
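
For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn a transcript of the generated speech into the reference text, divided by the number of words in the reference. The article does not say how Meta computed its figures, so the following is only a minimal sketch of the standard definition. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenization here is lowercase + whitespace split,
    which is an assumption, not Meta's evaluation setup.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one dropped word against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better: 0.0 means the transcript matches the reference word for word.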

Voicebox is built on Flow Matching, Meta's latest non-autoregressive generative modeling method, which can learn the highly non-deterministic mapping between text and speech. This lets Voicebox learn from varied speech data without that data having to be carefully labeled, giving it access to more diverse training data at a larger scale. To date, Voicebox has been trained on more than 50,000 hours of recorded speech and transcripts from audiobooks in English, French, Spanish, German, Polish, and Portuguese.
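
The article does not describe the training objective, so the following is only a toy sketch of the general flow matching idea, not Voicebox's implementation. It assumes the straight-line path from random noise to a data sample that is commonly used in the flow matching literature. The names `training_pair`, `flow_matching_loss`, and `SIGMA_MIN`, and the use of plain Python lists in place of audio features, are hypothetical choices made for illustration.

```python
import random

SIGMA_MIN = 1e-5  # assumed small constant; not taken from the article


def training_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point at time t on a straight path from noise x0 to data x1,
    plus the constant velocity along that path (the regression target)."""
    x_t = [(1.0 - (1.0 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, target


def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between a model's predicted velocity and the target
    for one data sample x1, using freshly sampled noise and time."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                        # time in [0, 1)
    x_t, target = training_pair(x0, x1, t)
    predicted = predict_velocity(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in "model" that always predicts zero velocity.
    zero_model = lambda x_t, t: [0.0] * len(x_t)
    print(flow_matching_loss(zero_model, [0.5, -1.0, 2.0], rng))
```

In a real system the stand-in model would be a neural network conditioned on text and surrounding audio, and the loss would be averaged over many samples. Because the whole output is produced by following the learned velocity field rather than predicting one token after another, this style of model is non-autoregressive.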

While this technology could usher in a new era of AI in audio processing, Meta acknowledges that it also carries the potential for abuse and unintended harm. The research paper Meta shared about Voicebox includes details on how the company built a highly effective classifier that can distinguish between authentic speech and speech generated by Voicebox.

Meta will not make the Voicebox model available to the public, nor will it release the source code, at least for the time being.

Isabella Humphrey

Update 24 July 2023
