The Consent Crisis: How AI Systems Turned Us All Into Unpaid, Unwitting Data Sources

You have probably never emailed an AI company to say: “Yes, please, feel free to ingest my photos, my Reddit comments, my blog posts, and my kids’ pictures to train your models.”

Yet, there is a real chance that at least some of that data has already been swept into the enormous datasets powering systems like ChatGPT, Claude, and Gemini.

Over the last few years, generative AI has gone from curiosity to everyday utility. You use chatbots to summarize documents, AI art tools to mock up designs, and video generators to storyboard ideas. But in the background, a different story has been unfolding: publishers, artists, platforms, and regulators are accusing AI companies of building multi‑billion‑dollar systems on data whose creators never meaningfully consented to its use.

This is the consent crisis in AI: a gap between what is legally allowed (or still legally unclear) and what feels ethically acceptable when it comes to your data, your work, and even your identity.

How We Got Here: “Public” Data Meets Industrial-Scale Scraping

Modern AI models eat data for breakfast. Large language models and image generators are trained on billions of examples: books, news articles, code, social media posts, and images scraped from across the web.

Investigations and lawsuits have documented that:

- Systems like ChatGPT, DALL·E, Google’s Bard (now Gemini), and Stable Diffusion were trained on huge troves of books, articles, and images scraped from the internet, much of which is copyrighted and was never individually licensed or consented to. The Washington Post
- Image datasets such as LAION-5B, used to train models like Stable Diffusion, have included photos from children’s entire childhoods sourced from parenting blogs and social media, without the children (or often the parents) having any idea their images were being repurposed for AI training. Ars Technica

The industry default has basically been: if it is visible on the public web, it is fair game for machine learning. That might sound similar to how search engines index the web, but there are key differences:

- Search engines link back to the original sources.
- Generative AI models internalize patterns and can produce outputs that substitute for the original work.
- People largely never expected that posting publicly meant “being blended into a commercial AI model forever.”

Legally, companies often argue this is covered by doctrines like “fair use” (in the US) or similar concepts elsewhere. Ethically, many creators and users feel like something is badly off.

Because there is almost no explicit consent mechanism for training data today, the main way creators are pushing back is through the courts.

A few high-profile examples:

- In January 2023, three artists – Sarah Andersen, Kelly McKernan, and Karla Ortiz – filed a class-action lawsuit against Stability AI, Midjourney, and DeviantArt, alleging that these companies trained AI tools on about five billion images scraped from the web “without the consent of the original artists.” Wikipedia: AI art
- Getty Images sued Stability AI in the UK, claiming the company copied some 12 million photos with captions and metadata from Getty’s sites to train Stable Diffusion without permission or payment. Associated Press
- Encyclopaedia Britannica and Merriam‑Webster recently sued OpenAI, alleging their copyrighted reference content was used to train models like ChatGPT without authorization or licensing, with Britannica alone claiming nearly 100,000 of its articles were ingested. Tom’s Guide

These cases are less about a single image or article and more about the principle: do AI companies have to ask before using someone’s work at scale? Or can they treat the web as one giant free buffet?

Courts are still split. In late 2025, for example, Stability AI largely won a major UK case brought by Getty. The court rejected Getty’s remaining copyright claim against the trained model, and Getty had already dropped its claim over the training itself during the trial. The decision prompted renewed calls for stronger protections for creators. Associated Press

In the meantime, more than 90 lawsuits by authors, artists, musicians, and news outlets have been filed against AI companies, according to a 2026 analysis in The Atlantic, many of them centered on training data used without consent. The Atlantic

Platforms vs. Model Makers: When “Public” Has Terms Attached

It is not just artists and publishers. Platforms that host user content are also pushing back on unconsented scraping.

Reddit, for instance, has sued Anthropic (maker of Claude) for allegedly scraping Reddit content to train AI despite being told not to, and for “intentionally” training on the personal data of Reddit users without requesting their consent. CBS News

Google, which both builds AI models and runs the world’s dominant search engine, has also started suing companies that scrape its search results for AI training, signaling that even tech giants do not want their data harvested without agreements. Computerworld

This dynamic matters for you because:

- Your posts, comments, and photos often live on platforms like Reddit, Instagram, or blogs.
- AI companies may scrape that content.
- Platforms may claim that violates their terms, but that does not automatically mean your own expectations of consent are reflected or enforced.

You might technically “agree” to platform terms of service, but those documents are rarely explicit that your life’s output is fodder for any AI system that can crawl it.

Everyday Users: You Are the Product and the Training Data

Consent issues do not stop at web scraping. Many AI products also learn directly from your interactions.

OpenAI explains in its own documentation that conversations in ChatGPT may be used to improve its models unless you explicitly opt out in settings or use enterprise products with different data handling. OpenAI Help Center

Other tools, from productivity assistants to AI writing shortcuts, often use similar “improve our models” language:

- You paste drafts of sensitive emails.
- You brainstorm business ideas.
- You upload documents for summarization.

Unless you carefully read privacy policies and toggle settings, those inputs might be retained and analyzed for training or evaluation. That is not necessarily nefarious – learning from usage can fix bugs and reduce bias – but again, meaningful consent is the missing piece. Most people assume “I’m using a tool,” not “I’m helping train a product used by millions.”

Some vendors are responding with stricter guarantees. Enterprise and business tiers of systems like ChatGPT, Claude, and Gemini emphasize that customer data will not be used to train general models by default. But for everyday users on consumer plans, the default is often still: opt out if you notice the setting.

Regulators Start to Ask: Did Anyone Agree to This?

Privacy and data protection regulators, especially in Europe, are starting to call this out directly.

In March 2023, Italy’s data protection authority temporarily banned ChatGPT from processing the personal data of people in Italy, citing concerns about unlawful data collection and lack of age checks. The regulator argued that data was being gathered and processed without a proper legal basis under the EU’s GDPR and that users were not adequately informed. Data Protection Report

The broader debate in Europe now includes:

- Whether scraping publicly available data (like photos, posts, or forum comments) can be justified under “consent” or “legitimate interest.”
- How AI-specific rules, such as the EU AI Act, should require transparency about which training data was used, especially for copyrighted works and images.
- Whether people should have a “right to be excluded” from training data, similar to a right to be forgotten.

Even where regulators act, though, the technical reality is sobering: once a model has been trained on data, you cannot easily “un‑train” a specific image or sentence. Deleting a source file does not magically remove its traces from all the AI systems that absorbed it.

You might wonder: “Does any of this really hurt me, personally?” It can, in several ways.

Economic harm
If you are a writer, artist, coder, musician, or journalist, AI systems trained on your work can now compete with you. Media organizations and creators worry about losing traffic and income when AI answers replace visits to the original sources.
Privacy and safety risks
Photos of you or your children, or posts under pseudonyms, may end up influencing models that others query in unpredictable ways. Research has shown that models can sometimes regurgitate training data fragments, raising concerns about private or sensitive details leaking.
Loss of control over identity and style
Tools can mimic an artistic or writing style with just a few prompts. Many creators see their style as an extension of themselves – and they were never asked whether they were okay with it being copied.
Norm-setting for the future
The way we resolve the current consent crisis will set expectations for other emerging tech: biometric systems, synthetic media, personalized ads, and more.

This is not just a legal or technical debate. It is about what kind of relationship you want with the systems that increasingly mediate your information, creativity, and social life.

So What Can You Actually Do?

You cannot single‑handedly rewrite copyright law or force every AI company to change its practices, but you are not powerless. There are concrete steps you can take now.

Use privacy controls and opt-outs
- Check settings in tools like ChatGPT, Claude, and Gemini to see whether your conversations are used for training and switch that off if you are uncomfortable. For example, OpenAI lets you disable training on your ChatGPT history in its settings.
- If you are a creator using specific platforms (portfolio sites, art communities, code hosting), look for AI opt-out options: “noAI” metadata, special tags, or site‑level controls.
Be intentional about what you publish where
- Treat fully public posts (especially on your own domain with no login wall) as highly likely to be scraped.
- Consider private or limited-audience spaces for sensitive content, kids’ photos, or works you are not comfortable seeing in training sets.
Support organizations and policies pushing for better consent
- Follow and, if you like, support creator lawsuits or advocacy groups arguing for opt-in or at least robust opt-out mechanisms for AI training.
- Pay attention to national and regional AI regulations, and speak up in consultations or through professional associations if you work in a field affected by AI.
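
If you publish on your own domain, one site-level control you can set yourself is a robots.txt file that asks AI training crawlers to stay away. The sketch below is only an illustration under stated assumptions: the crawler tokens in `AI_CRAWLERS` are ones vendors have published, but the list is neither complete nor guaranteed to be current, and the helper names and the example.com URL are made up for this post. It uses Python's standard `urllib.robotparser` to confirm the file says what you intended.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI-training crawler tokens. Vendors add and rename these,
# so check each company's current documentation before relying on it.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the given agents site-wide."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Everyone else (for example, ordinary search crawlers) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def blocked_crawlers(robots_txt, url="https://example.com/blog/post"):
    """Return the AI crawler tokens that this robots.txt asks to stay out."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)                    # the text to save as /robots.txt
print(blocked_crawlers(robots_txt))  # the tokens the file actually blocks
```

Keep in mind that robots.txt is a request, not an enforcement mechanism. It only helps with crawlers that choose to honor it, and it does nothing about copies of your pages that were collected before you added it.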

The consent crisis in AI will not be solved overnight. But the more you understand how your data is used – and the more you insist on clear, meaningful choices – the harder it becomes for AI systems to treat you as just another invisible input in someone else’s model.
