Apple, Nvidia, and other large companies caught up in AI training controversy over YouTube subtitle data
Several major tech companies, including Apple, Nvidia, Salesforce and Anthropic, are embroiled in a new controversy over AI training data. According to a report published by Proof News, the dataset these companies used to train their in-house AI models includes subtitles from YouTube videos.
The dataset, titled "YouTube Subtitles", was created by EleutherAI and published in 2020. It contains subtitles from 173,536 YouTube videos taken from more than 48,000 different channels.
The problem is that the dataset appears to violate YouTube's terms and conditions, which prohibit accessing videos by "automated means." According to Proof News, YouTube Subtitles is a 5.7 GB training dataset (489 million words) and includes subtitles from more than 12,000 videos that have since been removed from the platform. Notably, the dataset contains subtitles from videos by many well-known YouTube creators with large subscriber counts:
Proof News found material from popular YouTube creators, including MrBeast (289 million subscribers, 2 videos), Marques Brownlee (19 million subscribers, 7 videos), Jacksepticeye (nearly 31 million subscribers, 377 videos) and PewDiePie (111 million subscribers, 337 videos). Some of the material used to train AI contains inappropriate content, and even conspiracy theories.
In fact, the "YouTube Subtitles" dataset is part of a larger collection called "The Pile", which includes several other training datasets. Most of the Pile's datasets are open to anyone with enough storage space and computing power to access them.
The companies named did not respond to press requests for comment on the findings and on the allegations about their use of the training data. Proof News searched through online posts and white papers to find evidence and determine whose creative material was used to train which specific AI models. However, it is hardly possible to compile a complete list of companies using this dataset, since AI companies do not typically disclose the data they use to train their models.
Marques Brownlee, one of the creators whose content was used without permission, said he paid to use the captioning feature on YouTube. Using this type of data without permission or payment would therefore be a "blatant violation."
Note that Apple and the other tech companies did not download the subtitles themselves; rather, they trained their AI models on a dataset that contained them. Even so, the episode is an example of the unintended consequences of AI. Some creators say they are uneasy about the possibility that AI could be used to mimic their content in the future.