Nvidia is again accused of scraping Netflix and YouTube videos for AI training data

The accusation comes from a 404 Media report that is drawing widespread attention.

404 Media based its report on leaked internal Nvidia material, including Slack conversations, emails and internal documents. In a May email, Ming-Yu Liu, vice president of research at Nvidia and a lead on the Cosmos project, wrote that the company was collecting the equivalent of a human lifetime of visual experience in training data every day. The Cosmos project aims to build a large foundation model for Nvidia, comparable to Google's Gemini 1.5, OpenAI's GPT-4 or Meta's Llama 3.1.

Former Nvidia employees, speaking anonymously to 404 Media, said they were asked to scrape video content from Netflix, YouTube and other major online sources and turn it into training data for the company's various AI products.

To do this, the Cosmos project reportedly used an open-source video downloader together with virtual machines that rotated IP addresses, allowing it to get around YouTube's blocking efforts. According to the leaked information, project managers discussed running up to 30 virtual machines on Amazon Web Services to download the equivalent of about 80 years of video, spread across countless individual clips, every day. When employees questioned the legality of the Cosmos project, company leadership assured them that Nvidia had received permission from its partners to use the content.


For its part, Nvidia maintains that it did nothing wrong. "We respect the rights of all content creators and believe that our models and research efforts fully comply with the letter and spirit of copyright law," an Nvidia spokesperson told 404 Media in an emailed statement. "Copyright law protects particular expressions but not facts, ideas, data or information. Anyone has the right to freely learn facts, ideas, data or information from another source and use them to create their own expressions. Fair use also protects the ability to use a work for transformative purposes, such as training an AI model."

This is not the first time Nvidia (and, for that matter, many other companies in the AI field) has taken a "take first, ask later" approach to gathering AI training data. In July, another report named Nvidia among the companies accused of using copyrighted videos as AI training data without permission.

At CES 2024, Nvidia drew criticism for giving vague answers about how it trained its new generative AI engine for games. In response, the company insisted that its tools are "commercially safe." But what is the truth? Only time will tell.