
The Co-opting of Safety

Author
Jacob Haimes and Igor Krawczuk
Published
Thu 21 Aug 2025
Episode Link
https://kairos.fm/muckraikers/e016

We dig into how tech companies have co-opted and weaponized the concept of AI "safety." Starting with examples like Grok's "MechaHitler" episode, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.

  • (00:00) - Intro

  • (00:21) - Mecha-Hitler Grok

  • (10:07) - "Safety"

  • (19:40) - Under-specification

  • (53:56) - This time isn't different

  • (01:01:46) - Alignment Tax myth

  • (01:17:37) - Actually making AI safer


Links
  • JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
  • Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
  • SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

Additional Referenced Papers

  • NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
  • ICML paper - AI Control: Improving Safety Despite Intentional Subversion
  • ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
  • OSF preprint - Current Real-World Use of Large Language Models for Mental Health
  • Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Inciting Examples

  • Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
  • The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
  • BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

Other Sources

  • London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
  • Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
  • LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
  • EA Forum blogpost - An Overview of the AI Safety Funding Situation
  • Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
  • Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
  • Pleias website
  • Wikipedia page on Jaywalking
