LessWrong (30+ Karma)

Audio narrations of LessWrong posts.

Philosophy Society & Culture Technology

Update frequency: every day
Average duration: 18 minutes
Episodes: 583
Years Active: 2025

“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp

I wrote this page for Wikipedia about the Sydney Bing incident. Since I have limited control over what happens to it in the long term and it's entirely authored by myself I release the final version…

00:13:32 | Mon 28 Jul 2025

“Maya’s Escape” by Bridgett Kay

Maya did not believe she lived in a simulation. She knew that her continued hope that she could escape from the nonexistent simulation was based on motivated reasoning. She said this to herself in t…

00:20:25 | Sun 27 Jul 2025

[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis

This is a link post.

It's become fashionable recently to say that the purpose of a system is what it does - the true purpose of an institution is often different from what it publicly claims, and is …

00:03:07 | Sun 27 Jul 2025

“my experience on glp-1s as a thin person” by AnnaJo

Warning: This is an experiment log, I’m not advising you to start taking Retatrutide. I wish that there were more logs about people's experiences on peptides, so here's mine in case others find it h…

00:17:27 | Sat 26 Jul 2025

“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison

A class action over pirated books exposes the 'responsible' AI company to penalties that could bankrupt it — and reshape the entire industry

This is the full text of a post first published on Obsole…

00:14:06 | Sat 26 Jul 2025

“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky

Eliezer and I love to talk about writing. We talk about our own current writing projects, how we’d improve the books we’re reading, and what we want to write next. Sometimes along the way I learn so…

01:07:33 | Fri 25 Jul 2025

“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar

Summary: We introduce a command-line tool for hardening datasets against less sophisticated scrapers.

Author: Alex Turner. Contributors: Dipika Khullar, Ed Turner, and Roy Rinberg.

Dataset contamina…

00:06:24 | Fri 25 Jul 2025

[Linkpost] “Reasoning-Finetuning Repurposes Latent Representations in Base Models” by Jake Ward, lccqqqqq, Neel Nanda

This is a link post.

Authors: Jake Ward*, Chuqiao Lin*, Constantin Venhoff, Neel Nanda (*Equal contribution). This work was completed during Neel Nanda's MATS 8.0 Training Phase.

TL;DR

We computed a…

00:05:50 | Fri 25 Jul 2025

“Building and evaluating alignment auditing agents” by Sam Marks, Sam Bowman, Euan Ong, Johannes Treutlein, evhub

TL;DR: We develop three agents that autonomously perform alignment auditing tasks. When tested against models with intentionally-inserted alignment issues, our agents successfully uncover an LLM's h…

00:11:21 | Thu 24 Jul 2025

“The Whole Check” by JustisMills

This is a cross-post from my blog; historically, I've cross-posted about a square rooth of my posts here. First two sections are likely to be familiar concepts to LessWrong readers, though I don't t…

00:06:36 | Thu 24 Jul 2025

“‘Behaviorist’ RL reward functions lead to scheming” by Steven Byrnes

1. Introduction & tl;dr

This post is basically a 5x-shorter version of Self-dialogue: Do behaviorist rewards make scheming AGIs? (Feb 2025).[1]

1.1 tl;dr

I will argue that a large class of reward funct…

00:21:29 | Thu 24 Jul 2025

[Linkpost] “A brief perspective from an IMO coordinator” by DirectedEvolution

This is a link post.

I would be somewhat skeptical about any claims suggesting that results have been verified in some form by coordinators. At the closing party, AI company representatives were, dis…

00:01:32 | Wed 23 Jul 2025

“Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning” by kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan, Neel Nanda

Summary

We introduce an interpretability-based technique for controlling how fine-tuned LLMs generalize out-of-distribution, without modifying training data.
We show it can mitigate emergent misali…

00:12:18 | Wed 23 Jul 2025

“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp

As a person who frequently posts about large language model psychology I get an elevated rate of cranks and schizophrenics in my inbox. Often these are well meaning people who have been spooked by t…

00:30:06 | Wed 23 Jul 2025

“Google and OpenAI Get 2025 IMO Gold” by Zvi

Congratulations, as always, to everyone who got to participate in the 2025 International Mathematical Olympiad, and especially to the gold and other medalists. Gautham Kamath highlights 11th grader …

00:58:13 | Wed 23 Jul 2025

“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda

This piece is based on work conducted during MATS 8.0 and is part of a broader aim of interpreting chain-of-thought in reasoning models.

tl;dr

Research on chain-of-thought (CoT) unfaithfulness shows…

00:19:53 | Wed 23 Jul 2025

“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans

Authors: Alex Cloud*, Minh Le*, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans (*Equal contribution, randomly ordered)

tl;dr. We study subliminal learning, a su…

00:10:01 | Tue 22 Jul 2025

“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar

The Moonshot Alignment Program is a 5-week research sprint from August 2nd to September 6th, focused on the hard part of alignment: finding methods to get an AI to do what we want and not what don't…

00:13:35 | Tue 22 Jul 2025

[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch

This is a link post.

I've written up a post offering my take on the "unreasonable effectiveness of mathematics." My core argument is that we can potentially resolve Wigner's puzzle by applying an ant…

00:03:08 | Tue 22 Jul 2025

“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz

(This is a comment that has been turned into a post.)

I have seen much talk on Less Wrong of “development stages” and “Kegan” and so forth. Naturally I am skeptical; so I do endorse any attempt to …

00:06:25 | Tue 22 Jul 2025

Disclaimer: The podcast and artwork embedded on this page are the property of LessWrong ([email protected]). This content is not affiliated with or endorsed by eachpod.com.

EachPod

EachPod

LessWrong (30+ Karma)

“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp

“Maya’s Escape” by Bridgett Kay

[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis

“my experience on glp-1s as a thin person” by AnnaJo

“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison

“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky

“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar

[Linkpost] “Reasoning-Finetuning Repurposes Latent Representations in Base Models” by Jake Ward, lccqqqqq, Neel Nanda

“Building and evaluating alignment auditing agents” by Sam Marks, Sam Bowman, Euan Ong, Johannes Treutlein, evhub

“The Whole Check” by JustisMills

“‘Behaviorist’ RL reward functions lead to scheming” by Steven Byrnes

1. Introduction & tl;dr

1.1 tl;dr

[Linkpost] “A brief perspective from an IMO coordinator” by DirectedEvolution

“Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning” by kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan, Neel Nanda

“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp

“Google and OpenAI Get 2025 IMO Gold” by Zvi

“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda

tl;dr

“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans

“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar

[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch

“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz