LessWrong (30+ Karma)

Audio narrations of LessWrong posts.

Philosophy, Society & Culture, Technology

Update frequency: every day
Average duration: 18 minutes
Episodes: 583
Years Active: 2025

“Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models” by James Chua, Owain_Evans

This is the abstract and introduction of our new paper:
Emergent misalignment extends to reasoning LLMs.
Reasoning models resist being shut down and plot deception against users in their chain-of-t…

00:18:58 | Tue 17 Jun 2025

“Why we’re still doing normal school” by juliawise

Cross-posted from Otherwise.
Caveats: My oldest child is 11, and I don’t have parenting experience beyond elementary school. We’re lucky that our local public school is a good fit for our kids, and …

00:05:05 | Tue 17 Jun 2025

[Linkpost] “the void” by nostalgebraist

This is a link post.

A very long essay about LLMs, the nature and history of the HHH assistant persona, and the implications for alignment.

Multiple people have asked me whether I could post thi…

00:01:15 | Wed 11 Jun 2025

“Expectation = intention = setpoint” by jimmy

When I was first learning about hypnosis, one of the things that was very confusing to me is how "expectations" relate to "intent". Some hypnotists would say "All suggestion is about expectation; if…

00:21:53 | Wed 11 Jun 2025

“Give Me a Reason(ing Model)” by Zvi

Are we doing this again? It looks like we are doing this again. This time it involves giving LLMs several ‘new’ tasks including effectively a Tower of Hanoi problem, asking them to specify the answe…

00:11:48 | Tue 10 Jun 2025

“Mech interp is not pre-paradigmatic” by Lee Sharkey

This is a blogpost version of a talk I gave earlier this year at GDM.

Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be r…

00:29:34 | Tue 10 Jun 2025

“The True Goal Fallacy” by adamShimi

As I ease out into a short sabbatical, I find myself turning back to dig the seeds of my repeated cycle of exhaustion and burnout in the last few years.

Many factors were at play, some more personal…

00:13:20 | Tue 10 Jun 2025

“Ghiblification for Privacy” by jefftk

I often want to include an image in my posts to give a sense of a situation. A photo communicates the most, but sometimes that's too much: some participants would rather remain anonymous. A friend…

00:02:07 | Tue 10 Jun 2025

Error rendering URL

---

Source:
https://www.lesswrong.com/posts/HKCKinBgsKKvjQyWK/read-the-pricing-first

---

Narrated by TYPE III AUDIO.

00:00:16 | Tue 10 Jun 2025

“When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.” by ryan_greenblatt

Recently, Anthropic released Opus 4 and said they couldn't rule out the model triggering ASL-3 safeguards due to the model's CBRN capabilities. That is, they say they couldn't rule out that this mod…

00:17:12 | Mon 09 Jun 2025

“Administering immunotherapy in the morning seems to really, really matter. Why?” by Abhishaike Mahajan

Edit on 08/06/2024: At least one person has pointed out that, at one point, giving antihypertensives at night was also thought to matter, a now-disproven idea. Someone also mentioned how many times the…

00:22:43 | Mon 09 Jun 2025

[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo

This is a link post.

METR just made a lovely post detailing many examples they've found of reward hacks by frontier models. Unlike the reward hacks of yesteryear, these models are smart enough to kno…

00:00:44 | Mon 09 Jun 2025

“Levels of Doom: Eutopia, Disempowerment, Extinction” by Vladimir_Nesov

Disempowerment is on the fence: it gets interpreted as either implying human extinction or being a good place. "Doom" tends to be ambiguous between disempowerment and extinction, as well as about when …

00:04:09 | Mon 09 Jun 2025

“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman

AI companies claim that their models are safe on the basis of dangerous capability evaluations. OpenAI, Google DeepMind, and Anthropic publish reports intended to show their eval results and explain …

00:08:08 | Mon 09 Jun 2025

“Busking with Kids” by jefftk

Our older two, ages 11 and 9, have been learning fiddle, and are getting pretty good at it. When the weather's nice we'll occasionally go play somewhere public for tips ("busking"). It's better th…

00:02:45 | Mon 09 Jun 2025

“Emergent Misalignment on a Budget” by Valerio Pepe

TL;DR We reproduce emergent misalignment (Betley et al. 2025) in Qwen2.5-Coder-32B-Instruct using single-layer LoRA finetuning, showing that tweaking even one layer can lead to toxic or insecure out…

00:17:19 | Mon 09 Jun 2025

“Letting Kids Be Outside” by jefftk

When our kids were 7 and 5 they started walking home from school alone. We wrote explaining they were ready and giving permission, the school had a few reasonable questions, and that was it. Just …

00:08:18 | Sun 08 Jun 2025

“On working 80%” by adrische

A year ago, I decided to reduce my employment level from 100% to 80% and to take Fridays off.

My main motivation was to have some time for myself: Relax, reduce my stress level from work, have more …

00:05:24 | Sun 08 Jun 2025

“Solo Park Play at Three” by jefftk

Our three year old is about to turn four, and is bursting with a desire for independence. She's becoming more capable in all sorts of ways, and wants me to back off and let her do things. Today …

00:02:22 | Sat 07 Jun 2025

“The Mirror Trap” by Cameron Berg

A quick post on a probably-real inadequate equilibrium mostly inspired by trying to think through what happened to Chance the Rapper.

Potentially ironic artifact if it accrues karma.

1. The sculpto…

00:09:06 | Sat 07 Jun 2025

Disclaimer: The podcast and artwork embedded on this page are the property of LessWrong. This content is not affiliated with or endorsed by eachpod.com.
