Bad Vibes: Access to AI Training Data Sparks Legal Questions

Published on June 26, 2025

As “vibe coding” (relying on AI copilots to write code from natural-language prompts) gains popularity, tech companies are racing to build expansive and reliable knowledge bases to train these AI tools. Central to this effort are platforms like Reddit and Stack Overflow, which host vast amounts of user-generated technical content. However, disputes have emerged over data access: Reddit is suing Anthropic for allegedly scraping its site without permission, while Stack Overflow is pursuing formal partnerships, such as a licensed data-sharing deal with Snowflake. These contrasting approaches highlight a broader shift. As AI firms seek high-quality data, web communities are asserting greater control over their content and signaling that monetized, licensed access and respect for user privacy are now prerequisites in the AI era.
