Audio narrations of LessWrong posts.
METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have befor…
Dominic Cummings and Jennifer Pahlka are both unhappy about the civil service. However, they have different understandings of what the problem is and how it should be solved.
Dominic is a politician…
By Amir Zur (Stanford), Alex Loftus (Northeastern), Hadas Orgad (Technion), Zhuofan Josh Ying (Columbia, CBAI), Kerem Sahin (Northeastern), and David Bau (Northeastern)
Links: Interactive Demo | Co…
(I realize I'm preaching to the choir by posting this here. But I figure it's good to post it regardless.)
Introduction
Recently, Scott Alexander gave a list of tight-knit communities with strong va…
On 17 July 2025, I sat down with Kelsey Piper to chat about politics and social epistemology. You can listen to the audio file, or read the transcript below, which has been edited for clarity.
Post…
This work was done while I was at METR.
Introduction
GDM recently released a paper (Emmons et al.) showing that, contrary to previous results, the chain-of-thought (CoT) of language models can be more fai…
Claude Opus 4 has been updated to Claude Opus 4.1.
This is a correctly named incremental update, with the bigger news being ‘we plan to release substantially larger improvements to our models in th…
A reporter asked me for my off-the-record take on recent safety research from Anthropic. After I drafted an off-the-record reply, I realized that I was actually fine with it being on the record, so:
…
Here's a 2022 Eliezer Yudkowsky tweet:
In context, “secure” means “secure against jailbreaks”. Source. H/t Cole Wyeth here. I find this confusing.
Here's a question: are object-level facts about the…
I am currently a MATS 8.0 scholar studying mechanistic interpretability with Neel Nanda. I’m also a postdoc in psychology/neuroscience. Perhaps my most notable paper analyzed the last 20 years of ps…
Introduction
We’re releasing gpt-oss-120b and gpt-oss-20b—two state-of-the-art open-weight language models that deliver strong real-world performance at low cost. Available under…
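Since these are open-weight releases, they can be run locally. Below is a minimal sketch of loading one of them via Hugging Face transformers, assuming the checkpoints are published under the repo id "openai/gpt-oss-20b"; the prompt text is a placeholder, and you should check the actual model card for the exact id and hardware requirements.

```python
# Minimal sketch: load an open-weight gpt-oss checkpoint with transformers.
# Assumes the repo id "openai/gpt-oss-20b"; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available devices
)

# Hypothetical prompt, formatted with the model's own chat template.
messages = [{"role": "user", "content": "Summarize chain-of-thought monitoring."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```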
There's a time and a place for everything. It used to be called college.
Table of Contents
In the context of “brain-like AGI”, a yet-to-be-invented variation on actor-critic model-based reinforcement learning (RL), there's a ground-truth reward f…
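For readers unfamiliar with the term, here is a minimal textbook actor-critic loop in Python: a generic illustration of how a learned critic and a ground-truth reward interact, not the post's brain-like-AGI variant. The toy environment and all hyperparameters are hypothetical.

```python
import numpy as np

# Textbook one-step actor-critic on a toy tabular MDP (illustrative only).
n_states, n_actions = 10, 2
V = np.zeros(n_states)                    # critic: state-value estimates
logits = np.zeros((n_states, n_actions))  # actor: action preferences
alpha_v, alpha_pi, gamma = 0.1, 0.1, 0.99

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    """Hypothetical toy environment standing in for a real MDP."""
    s_next = (s + (1 if a == 1 else -1)) % n_states
    reward = 1.0 if s_next == 0 else 0.0  # ground-truth reward function
    return s_next, reward

s = 5
for _ in range(10_000):
    probs = softmax(logits[s])
    a = np.random.choice(n_actions, p=probs)
    s_next, r = step(s, a)

    # TD error: how much better or worse things went than the critic predicted.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_v * td_error

    # Actor update: reinforce the taken action in proportion to the TD error
    # (gradient of log-softmax policy w.r.t. this state's logits).
    grad = -probs
    grad[a] += 1.0
    logits[s] += alpha_pi * td_error * grad

    s = s_next
```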
This is a new introduction to AI as an extinction threat, previously posted to the MIRI website in February alongside a summary. It was written independently of Eliezer and Nate's forthcoming book, …
This post describes concept poisoning, a novel LLM evaluation technique we’ve been researching for the past couple months. We’ve decided to move to other things. Here we describe the idea, some of o…
Epistemic status: an informal note.
It is common to use finetuning on a narrow data distribution, or narrow finetuning (NFT), to study AI safety. In these experiments, a model is trained on a very s…
Sam Altman talked recently to Theo Von.
Theo is genuinely engaging and curious throughout. This made me want to consider listening to his podcast more. I’d…
Dr. Steven Byrnes is one of the few people who both understands why alignment is hard and is taking a serious technical shot at solving it. He's the author of these recently popular posts:
Thanks to Rowan Wang and Buck Shlegeris for feedback on a draft.
What is the job of an alignment auditing researcher? In this post, I propose the following answer: to build tools which increase audi…
This is a cross-post written by Andy Masley, not me. I found it really interesting and wanted to see what EAs/rationalists thought of his arguments.
This post was inspired by similar posts by Tyler…