AI Positive

Author: Yellowgold Studios
Published: Wed 24 Jan 2024
Episode Link: https://aiinside.show/episode/ai-positive

On the premiere episode of the AI Inside podcast, hosts Jeff Jarvis and Jason Howell talk with Rich Skrenta of the Common Crawl Foundation about AI and copyright. News outlets are limiting access to content they publish publicly, which affects the integrity of Common Crawl's internet archive. In recent years, the archive has been used as training data for LLMs, so restricting access has a dramatic effect on the quality of the data that remains.

INTERVIEW

Introduction and background on AI Inside podcast

Discussion of the recent Senate hearing on AI oversight at which Jeff testified

Introduction of guest Rich Skrenta from Common Crawl Foundation

Overview of Common Crawl and its goals to archive the open web

Discussion of how Common Crawl data is used to train AI models

News publishers wanting content removed from Common Crawl

Debate around copyright, fair use, and AI’s “right to read”

Mechanics of how Common Crawl works and what it archives

Concerns about restricting AI access to data for training

Risk of regulatory capture and only big companies being able to use AI

Discussion of recent court ruling related to web scraping

Hopes for Common Crawl's growth and evolution

NEWS BITES

Interesting device announcement from CES - Rabbit R1 with Perplexity AI integration

Study on actual risk of AI automating jobs away in the near future

