
Snapchat Sued for Scraping 70M YouTube Video Clips to Train Commercial AI

Exclusive Investigation

Snap Stole 70 Million Video Clips to Build Its AI Empire

The Non-Financial Ledger: What Was Actually Taken

Think about what it takes to make a YouTube video. You come up with an idea. You write it, or at least outline it in your head. You film it, sometimes for hours, to get a few minutes of usable footage. You edit. You re-edit. You add music, titles, graphics. You publish it. You respond to comments. You build an audience over months and years, one subscriber at a time, surviving algorithm changes and demonetization waves and the constant, grinding pressure to produce more, faster, better.

That is what a content creator’s library represents. It is not a file. It is a career. It is the accumulated evidence of thousands of hours of human labor, of decisions made under financial pressure, of creative risk taken in public with no safety net.

Ted Entertainment, Inc., the company behind the h3h3 Productions and H3 Podcast Highlights channels, has produced over 5,800 original videos and accumulated more than 4 billion views. Four billion. That number represents an almost incomprehensible amount of human attention directed at content that Ethan and Hila Klein built from the ground up. Their company did not produce those videos for Snap’s AI team. They produced them for their audience.

Matt Fisher, who runs @Mr.ShortGame, spent years building one of the most trusted instructional golf channels on the platform. Instructional content is particularly labor-intensive: you have to be accurate, you have to be engaging, and you have to maintain credibility with an audience that will notice and call out mistakes. That trust is his business. It is not raw material for a tech company’s training corpus.

Golfholics built a channel around a passion for the game, invested real money in production, and cultivated a community of over 130,000 subscribers. These are not vanity metrics. These are people who came back, video after video, because they trusted the creator. That trust was built by the creator. Snap contributed nothing to it and then helped itself to the product.

What makes this particular theft uniquely brutal is the permanence. The complaint makes this explicit: once AI ingests content, that content is stored in the model’s neural network and is not capable of deletion or retraction. There is no DMCA takedown notice that fixes this. There is no settlement that can extract Ethan Klein’s face, voice, and editorial judgment from inside Snap’s model weights. The damage is structural and irreversible. Every future video Snap’s AI generates may carry the residue of work that was never offered to Snap, never licensed, and never compensated.

The creators chose YouTube specifically because YouTube promised protection. YouTube’s anti-circumvention tools and Terms of Service were, according to the complaint, a driving factor behind their decision to upload there. They made a calculated bet that the platform would hold the line. Snap’s operation proves that for the largest, best-resourced players in tech, the line is optional.

Timeline: How Snap Built Its AI Dataset on Stolen Foundations

  • 2021: Microsoft Research Asia publishes HD-VILA-100M, 100M clips drawn from 3.09M YouTube videos. License: non-commercial academic use only.
  • Undisclosed date (roughly two to three years later): Snap obtains HD-VILA-100M from GitHub and begins scraping YouTube at scale using yt-dlp and virtual machines with rotating IP addresses.
  • February 29, 2024 (months later): Snap publishes the “Panda-70M” research paper: 3.8M YouTube videos, 70.7M clips, AI-generated captions. Scraped without creator consent.
  • January 23, 2026 (about 23 months later): Federal class action filed. Case No. 2:26-cv-00754, C.D. California.

Legal Receipts: What The Complaint Actually Says

These are direct quotes from the filed complaint. No paraphrase. No spin. Just what is in the court record.

“Rather than negotiate for lawful licenses, Defendant broke through YouTube’s access protections to obtain the massive dataset necessary to fuel Defendant’s generative AI efforts and, by extension, Defendant’s success in the field of AI text-to-video and image-to-video models.”

Complaint, Para. 51 — Filed 01/23/2026, Case 2:26-cv-00754
  • This establishes that licensing was a known option Snap chose to bypass. The complaint asserts that legitimate licensing channels existed and that Snap broke through YouTube’s protections instead of negotiating for them.
  • The phrase “by extension, Defendant’s success” directly ties the scraping operation to Snap’s competitive positioning in the AI market, undermining any future “pure research” defense.

“Upon information and belief, Defendant used tools and processes such as the open-source YouTube video downloader ‘yt-dlp’ combined with virtual machines that refresh IP addresses to access audiovisual content from YouTube’s platform. Such tools and processes are necessary for Defendant to avoid being blocked by YouTube.”

Complaint, Para. 77 — Filed 01/23/2026, Case 2:26-cv-00754
  • According to the complaint, the rotating IP address technique is not a neutral technical choice: it exists specifically to defeat YouTube’s detection systems, which the plaintiffs argue constitutes intentional circumvention under the DMCA’s anti-circumvention provision.
  • The complaint’s phrasing “necessary for Defendant to avoid being blocked” alleges that Snap knew YouTube would stop the operation if it could detect it, and that Snap deliberately engineered around that detection.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

Complaint, Para. 86 — Quoting YouTube CEO Neal Mohan
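  • YouTube’s own CEO has publicly called downloading videos or transcripts from the platform a “clear violation” of its Terms of Service, which is the kind of conduct Snap is accused of. This is not a creator’s interpretation or a plaintiff’s attorney framing; it is the platform operator’s direct characterization of the conduct.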
  • The phrase “rules of the road” signals that this is established, known policy. Snap cannot claim ignorance of a standard the platform’s CEO felt compelled to address publicly.

“Defendant obtained datasets from a variety of sources, including academic repositories, research compilations, and other large-scale video collections created by universities, corporations, and independent researchers. These datasets were treated by Defendant as raw material for commercial generative AI training purposes, even when the datasets were expressly licensed for academic or non-commercial use and prohibited commercial exploitation, redistribution, or any use that would involve downloading the underlying copyrighted works.”

Complaint, Para. 54 — Filed 01/23/2026, Case 2:26-cv-00754
  • This paragraph shows the scope of the alleged misconduct extends beyond HD-VILA-100M. Snap is accused of systematically treating “academic use only” datasets as free commercial raw material across multiple sources.
  • The explicit reference to licenses that “prohibited commercial exploitation” means any good-faith defense is severely weakened. Snap allegedly read the license, understood the restriction, and proceeded anyway.

“Once AI ingests content, that content is stored in its neural network, and not capable of deletion or retraction. Defendant’s actions constitute abuse and exploitation of content creators’ work for Defendant’s profit.”

Complaint, Para. 9 — Filed 01/23/2026, Case 2:26-cv-00754
  • This is the complaint’s most damning practical claim: the harm is permanent and irreversible. No injunction can undo training that has already occurred on already-ingested data.
  • This argument also strengthens the case for maximum statutory damages, because the creators cannot be made whole in any conventional sense. The only leverage left is financial punishment severe enough to deter future conduct by Snap and the entire industry.

“Plaintiffs and the Class Members will never be able to claw back the intellectual property unlawfully copied and used by Defendant to train its generative AI.”

Relationship Map: How the Scraping Pipeline Connected Snap to Creators’ Work

  • Content creators: h3h3, Mr.ShortGame, Golfholics, and thousands more upload their videos to YouTube and authorize streaming.
  • YouTube: technological protection measures (TPMs) and the Terms of Service enforce streaming-only access.
  • HD-VILA-100M: an academic-only index of 3.09M YouTube video pointers, which Snap acquires from GitHub.
  • Snap scraping operation: yt-dlp plus virtual machines with rotating IP addresses circumvent YouTube’s TPMs. No consent obtained. Millions of individual DMCA violations alleged.
  • Panda-70M: Snap’s own dataset, released in 2024. 3.8M videos / 70.7M clips with AI-generated captions. No creator consent.
  • Snap commercial AI products: the dataset trains the AI model behind Imagine Lens, Easy Lens, and Spectacles AR, including text-to-video and image-to-video generation. Commercially sold. Creators paid: $0.

Societal Impact Mapping: Who Gets Hurt and How

Public Health of the Creator Economy

The creator economy is one of the few remaining routes to independent income for people who lack access to traditional employment pathways. Snap’s alleged conduct attacks the structural foundations that make that economy function.

  • YouTube’s Technological Protection Measures and Terms of Service are not just platform rules; they are the social contract that makes independent content creation economically viable. Creators upload to YouTube partly because they trust those protections will hold. If those protections can be bypassed by any sufficiently resourced company, the entire premise of that contract collapses.
  • The complaint states that the class consists of thousands of YouTube creators. That means potentially thousands of independent workers had their labor converted into corporate AI training material with no compensation, no notice, and no recourse for the underlying model contamination.
  • The instructional and educational value embedded in channels like @Mr.ShortGame is, on the complaint’s theory, now inside Snap’s model. That model could produce golf instruction content that competes directly with the human expert whose work trained it, without paying that expert a licensing fee or even acknowledging the source.
  • The complaint notes that most YouTube videos are not registered with the U.S. Copyright Office. This is normal and legal; registration is not required for copyright protection to exist. It does matter in court, though: a U.S. creator generally must register a work before suing for infringement, and statutory damages depend on timely registration. That makes unregistered works harder and more expensive for individual creators to enforce, and Snap’s operation, as the plaintiffs describe it, benefited from exactly that gap. A class action is the only mechanism that makes fighting back economically rational for most creators.
  • The precedent this sets, if unchallenged, is that any AI company with sufficient infrastructure can harvest any public-facing platform’s content by simply building tools sophisticated enough to evade detection. The harm is industry-wide and systemic, extending far beyond Snap or YouTube.

Economic Inequality

The wealth transfer implicit in Snap’s alleged operation is stark: a multi-billion-dollar corporation extracted the labor of thousands of independent workers at zero cost to build products it sells commercially.

  • Snap Inc. is a publicly traded Delaware corporation headquartered in Santa Monica with a valuation in the billions. The class members it allegedly harvested include independent creators and small corporate entities like Ted Entertainment, Inc. and Golfholics, Inc., operating with production budgets that are incomparably smaller. The power asymmetry could not be more extreme.
  • The complaint asserts that Snap’s “financial and technological success would not have been possible without the video content created by Plaintiffs and Class Members.” This is the plaintiffs’ core theory of value, not an admission by Snap: the creators built real value that Snap captured without payment.
  • Snap’s commercial AI products, including Snapchat’s Imagine Lens and the Easy Lens tool, are now features that attract and retain paying users on Snap’s platform. Every dollar those features generate is downstream of the training data. The creators who provided that data see none of it.
  • The complaint notes Snap intends to commercialize its AI capabilities through Spectacles, wearable AR glasses targeting the consumer market in 2026. This is a hardware product. If that product ships and succeeds, the creators whose videos trained its underlying model will have contributed to a physical retail product they will never be compensated for.
  • The class action structure itself reflects economic inequality. The complaint states that individual creators would find the “cost of litigating their individual claims prohibitively high.” The only reason this fight is possible is that attorneys are willing to take it on a class basis. Without that mechanism, Snap’s bet would have been close to risk-free.
  • Research datasets like HD-VILA-100M, built by Microsoft Research Asia, along with others created by universities and independent researchers, were released under non-commercial terms to keep them in the research commons. By using these datasets commercially, Snap allegedly captured that research value without contributing anything back to the commons.

Compliance vs. Reality: What Was Required vs. What Snap Allegedly Did

  • Step 1, identify the data needed. Required by law and contract: review dataset license terms before any use, and confirm that an academic-only license prohibits commercial use. What Snap allegedly did: obtained HD-VILA-100M from GitHub and proceeded despite its “non-commercial academic use only” license.
  • Step 2, obtain creator consent. Required: negotiate licenses with rights holders or YouTube, and receive written authorization before any download. What Snap allegedly did: skipped this step entirely. No consent sought from Plaintiffs or any class member, no contact with YouTube, no license negotiation.
  • Step 3, use permitted access channels. Required: access content only through YouTube’s licensed API, streaming only, with no raw file access. What Snap allegedly did: actively circumvented these controls, deploying yt-dlp to bypass streaming controls and rotating IP addresses to evade YouTube’s blocking systems.
  • Step 4, license commercial use separately. Required: obtain an explicit license for AI training for a commercial product, and pay creators per the agreement. What Snap allegedly did: skipped this step. No commercial license obtained for AI training, $0 paid to any creator, and the content used to build commercial products.
  • Outcome. Required path: creators are compensated and IP is protected; rights holders retain control over how their work is used, and AI companies build products on legitimate foundations. Alleged reality: creators get nothing and Snap profits; the stolen data is permanently embedded in the AI model, cannot be extracted, and cannot be undone. Class action filed.

The Cost of a Life: What the Numbers Actually Mean

Scale of Alleged Scraping: Plaintiff Channels in Snap’s AI Datasets (number of each channel’s videos in each dataset)

  • h3h3 Productions: 146 in HD-VILA-100M, 155 in Panda-70M
  • H3 Podcast Highlights: 285 in HD-VILA-100M, 283 in Panda-70M
  • MrShortGame (@Mr.ShortGame): 2 in HD-VILA-100M, 8 in Panda-70M
  • Golfholics Inc.: 62 in HD-VILA-100M, 62 in Panda-70M

What Now? How to Fight Back

The class action is active in federal court. Every YouTube content creator in the United States whose videos appear in the HD-VILA-100M or Panda-70M datasets is a potential class member, and the complaint is explicit that those datasets contain a complete map of video URLs and identifiers that can match each video to its creator.

Named Leadership and Defendant

  • Snap Inc., defendant, 3000 31st Street, Santa Monica, CA 90405. A Delaware corporation publicly traded on the NYSE. Its executives, officers, and directors are excluded from the class.
  • The company behind Snapchat, Snap Spectacles, the Imagine Lens, and the Easy Lens prompt-to-image feature. The complaint ties Snap’s AI products, both launched and planned, to the training pipeline at issue in this case.

Watchlist: Regulatory and Legal Bodies

  • U.S. Copyright Office: The Office administers the triennial rulemaking on exemptions to the DMCA anti-circumvention provisions at the center of this case (17 U.S.C. § 1201), and its ongoing work on AI and training data is directly relevant.
  • Federal Trade Commission (FTC): The FTC has authority over deceptive practices and unfair methods of competition. Using datasets licensed only for academic research to build commercial products without disclosure raises consumer protection and competition questions.
  • U.S. Department of Justice (DOJ): The DOJ’s Antitrust Division has been examining Big Tech market conduct. Systematic circumvention of platform protections to acquire training data at no cost could constitute unfair competitive advantage.
  • U.S. District Court for the Central District of California: Case No. 2:26-cv-00754 is active. Court filings are public record, and the docket can be monitored through PACER at pacer.gov.
  • Congress (Senate and House Judiciary Committees): Both committees have ongoing AI oversight hearings, and creator economy organizations are testifying. Contacting your representatives about AI training data legislation is directly relevant to this case’s outcome.

Calls to Action: What You Can Do

  • If you are a YouTube creator whose videos may appear in the HD-VILA-100M or Panda-70M datasets, contact plaintiffs’ counsel at ELLZEY KHERKHER SANFORD MONTGOMERY LLP (Houston, TX) or HEAH BAR-NISSIM LLP (Los Angeles, CA). The complaint establishes that class membership is ascertainable through the dataset’s own video URL indexes.
  • Check whether your videos are in the datasets. The paper introducing Panda-70M is public on arxiv.org (arxiv.org/pdf/2402.19479v1), and the dataset’s video index was released alongside it; the HD-VILA-100M index was published on GitHub. Researchers and journalists have begun building lookup tools, and creator communities on platforms like Reddit (r/NewTubers, r/youtube) have been discussing access.
  • Support creator-led unions and organizations that are lobbying for mandatory licensing requirements on AI training data. The Creator Guild Alliance and similar organizations are pushing for legislation that would require tech companies to compensate rights holders before training on their work. Your membership, dues, and public support matter.
  • Push for strong AI training data legislation at the state and federal level. Contact your U.S. Senators and House representative, for example through the Electronic Frontier Foundation’s (EFF) Action Center, and demand co-sponsorship of any bill requiring opt-in consent for AI training on copyrighted creative work.
  • Mutual aid for affected creators: If you are a creator who suspects your content was used without consent and you cannot afford legal fees, creator solidarity networks like the Creator Rights Alliance and legal aid clinics at universities with intellectual property law programs may be able to provide guidance or pro bono support.
  • Amplify this case. The mainstream tech press has consistently framed AI training data scraping as a legal gray area. It is not gray when a company knowingly violates a platform’s Terms of Service, uses tools specifically built to evade detection, and does so while knowing the dataset license is non-commercial. Share this story with every creator you know.
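For creators comfortable running a short script, the dataset check described in the list above can be sketched in Python. This is a minimal, hypothetical helper, not an official or court-endorsed tool, and it rests on stated assumptions: that you have downloaded a dataset’s video index as a CSV file with a header row, that one column (assumed here to be named `url`) holds a YouTube URL or 11-character video ID per row, and that you have a list of your own video URLs. Check the real file’s header and pass the correct column name. The function names `extract_video_id` and `find_matches` are illustrative only.

```python
import csv
import re
from typing import Iterable, Optional, Set
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(text: str) -> Optional[str]:
    """Return the video ID from a YouTube URL or a bare ID, or None if none is found."""
    text = text.strip()
    if VIDEO_ID_RE.fullmatch(text):
        return text
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host in ("youtu.be", "www.youtu.be"):
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # Shorts and embed links: /shorts/<id> or /embed/<id>
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def find_matches(my_urls: Iterable[str], index_csv_path: str, column: str = "url") -> Set[str]:
    """Return the IDs of your videos that also appear in a dataset index CSV.

    ASSUMPTION: the index is a CSV with a header row, and `column` holds a
    YouTube URL or video ID in each row. Adjust `column` to the real header.
    """
    # Normalize your own links to bare IDs, skipping anything unrecognized.
    mine = {vid for vid in (extract_video_id(u) for u in my_urls) if vid}
    found: Set[str] = set()
    with open(index_csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            vid = extract_video_id(row.get(column) or "")
            if vid in mine:
                found.add(vid)
    return found
```

For example, `with open("my_urls.txt") as f: hits = find_matches(f, "index.csv")` returns the set of your video IDs that appear in the index. Matching on the 11-character video ID rather than the full URL avoids missing videos that are listed as youtu.be links, Shorts links, or URLs with extra parameters.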

The source document for this investigation is attached below.

Aleeia

I'm Aleeia, the creator of this website.

I have 6+ years of experience as an independent researcher covering corporate misconduct, sourced from legal documents, regulatory filings, and professional legal databases.

My background includes a Supply Chain Management degree from Michigan State University's Eli Broad College of Business, and years working inside the industries I now cover.

Every post on this site was either written or personally reviewed and edited by me before publication.
