AI Trained on YOUR Posts – Fair or FOUL?

Reddit files lawsuit against AI company Anthropic for allegedly stealing user comments to train its chatbot Claude, raising new questions about data ownership in the age of artificial intelligence.

At a Glance 

  • Reddit has sued Anthropic for allegedly using automated bots to scrape user comments without permission
  • The social media platform already has licensing agreements with Google and OpenAI that include user protections
  • Anthropic, founded by former OpenAI executives, maintains its training methods are lawful
  • The case was filed in California and focuses on breach of terms of use rather than copyright infringement
  • The lawsuit highlights growing tensions over AI companies’ use of internet content

Reddit Takes Legal Action Against Unauthorized Data Collection

Reddit has filed a lawsuit against artificial intelligence company Anthropic, alleging the AI firm used automated bots to “scrape” user comments from its platform to train the Claude chatbot. The lawsuit, filed in California Superior Court in San Francisco, claims Anthropic accessed Reddit content despite explicit requests not to do so. The case is another significant clash between content platforms and AI developers over who controls the data that powers increasingly sophisticated AI systems.

Reddit’s lawsuit specifically focuses on breach of terms of use and unfair competition rather than copyright infringement. The platform argues that Anthropic’s actions undermine its ability to maintain control over how user-generated content is utilized. As AI systems become more prevalent, the question of data ownership has moved from theoretical discussions to courtroom battles, with Reddit now taking a firm stance against unauthorized collection of its users’ content. 

Reddit’s Existing AI Partnerships Versus Anthropic’s Approach

The social media platform has already established licensing agreements with other major AI developers, including Google and OpenAI, allowing them to train their systems on Reddit content. These partnerships include specific protections for users, such as the right to delete content, privacy safeguards, and measures to prevent spam. The agreements have also proven financially beneficial for Reddit, particularly ahead of its public trading debut. OpenAI CEO Sam Altman is a significant Reddit shareholder.

“AI companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data,” said Ben Lee, Reddit’s chief legal officer. 

Anthropic, founded by former OpenAI executives, has emerged as a competitor to OpenAI and its ChatGPT chatbot, and has partnered with Amazon on its Alexa assistant. In a 2021 paper, the company identified the subreddits it considered high-quality sources of AI training data, suggesting it had been analyzing Reddit content for some time. The company has used websites like Wikipedia and Reddit as sources for AI training data, but now faces a legal challenge over its collection methods.

Debate Over Lawful Use of Internet Data

Anthropic maintains that its training methods are lawful, citing a letter it sent to the U.S. Copyright Office. This position highlights the ongoing legal uncertainty surrounding AI training practices. While many AI companies argue that scraping publicly available data falls under fair use, content platforms and creators increasingly challenge this interpretation, seeing it as exploitation of their work without compensation or consent. 

This lawsuit is part of a growing trend of content platforms asserting control over their data in the AI era. Major news organizations and content creators have also established licensing agreements with AI companies, including The Associated Press, which has an agreement with OpenAI for access to its text archives. As this case progresses, it may establish important precedents for how AI companies can legally access and use internet content for training their systems.