LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

Philosophy, Society & Culture, Technology

Update frequency: every day
Average duration: 18 minutes
Episodes: 583
Years Active: 2025


[Linkpost] “Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals” by Marius Hobbhahn

This is a link post.

Note: This is a research note, and the analysis is less rigorous than our standard for a published paper. We’re sharing these findings because we think they might be valuable for…

00:03:28 | Thu 03 Jul 2025

“Call for suggestions - AI safety course” by boazbarak

In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to be roughly similar to my "foundations of deep learning" course.

I am still not sure of the content…

00:02:32 | Thu 03 Jul 2025

[Linkpost] “IABIED: Advertisement design competition” by yams

This is a link post.

We’re currently in the process of locking in advertisements for the September launch of If Anyone Builds It, Everyone Dies, and we’re interested in your ideas! If you have graphi…

00:02:20 | Thu 03 Jul 2025

“Congress Asks Better Questions” by Zvi

Back in May I did a dramatization of a key and highly painful Senate hearing. Now, we are back for a House committee meeting. It was entitled ‘Authoritarians and Algorithms: Why U.S. AI Must Lead’ a…

00:30:28 | Thu 03 Jul 2025

“Curing PMS with Hair Loss Pills” by David Lorell

Over the last two years or so, my girlfriend identified her cycle as having an unusually strong and very predictable effect on her mood/affect. We tried a bunch of interventions (food, sleep, sociali…

00:16:08 | Wed 02 Jul 2025

“AI Task Length Horizons in Offensive Cybersecurity” by Sean Peters

This is a rough research note where the primary objective was my own learning. I am sharing it because I’d love feedback and I thought the results were interesting.

Introduction

A recent METR paper …

00:23:34 | Wed 02 Jul 2025

“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks

Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios, but their chain-of-thought reasoning shows zero evidence of this bias. This serves as a nice examp…

00:07:57 | Wed 02 Jul 2025

“There are two fundamentally different constraints on schemers” by Buck

People (including me) often say that scheming models “have to act as if they were aligned”. This is an alright summary; it's accurate enough to use when talking to a lay audience. But if you want …

00:07:29 | Wed 02 Jul 2025

“‘What’s my goal?’” by Raemon

The first in a series of bite-sized rationality prompts[1].

This is my most common opening-move for Instrumental Rationality. There are many, many other pieces of instrumental rationality. But ask…

00:04:27 | Wed 02 Jul 2025

“A Simple Explanation of AGI Risk” by TurnTrout

Notes from a talk originally given at my alma mater

I went to Grinnell College for my undergraduate degree. For the 2025 reunion event, I agreed to speak on a panel about AI. I like the talk I gave …

00:09:55 | Wed 02 Jul 2025

“AI Moratorium Stripped From BBB” by Zvi

The insane attempted AI moratorium has been stripped from the BBB. That doesn’t mean they won’t try again, but we are good for now. We should use this victory as an opportunity to learn. Here's what…

00:10:11 | Tue 01 Jul 2025

“Authors Have a Responsibility to Communicate Clearly” by TurnTrout

When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or hea…

00:34:13 | Tue 01 Jul 2025

“Scientific Discovery in the Age of Artificial Intelligence” by Jessica Rumbelow

For many people, including me, the real promise of AI is massively accelerated scientific discovery. Chatbots, vibe coding, video generation: these things are magical, but what I really want is supe…

00:20:21 | Tue 01 Jul 2025

“SLT for AI Safety” by Jesse Hoogland

This sequence draws from a position paper co-written with Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, Stan van Wingerden, George Wang, …

00:07:31 | Tue 01 Jul 2025

“The best simple argument for Pausing AI?” by Gary Marcus

Not saying we should pause AI, but consider the following argument:

Alignment without the capacity to follow rules is hopeless. You can’t possibly follow laws like Asimov's Laws (or better alternat…

00:02:01 | Tue 01 Jul 2025

“SAE on activation differences” by Santiago Aranguri, jacob_drori, Neel Nanda

TLDR: we find that SAEs trained on the difference in activations between a base model and its instruct finetune are a valuable tool for understanding what changed during finetuning.

This work is the…

00:10:54 | Tue 01 Jul 2025

“What We Learned Trying to Diff Base and Chat Models (And Why It Matters)” by Clément Dumas, Julian Minder, Neel Nanda

This post presents some motivation on why we work on model diffing, some of our first results using sparse dictionary methods and our next steps. This work was done as part of the MATS 7 extension. …

00:19:37 | Mon 30 Jun 2025

“If you want to be vegan but you worry about health effects of no meat, consider being vegan except for mussels/oysters” by KatWoods

1) They're unlikely to be sentient (few neurons, immobile)

2) If they are sentient, the farming practices look likely to be pretty humane

3) They're extremely nutritionally dense

Buying canned smoke…

00:01:06 | Mon 30 Jun 2025

[Linkpost] “Project Vend: Can Claude run a small shop?” by Gunnar_Zarncke

This is a link post.

Anthropic (post June 27th):

We let Claude [Sonnet 3.7] manage an automated store in our office as a small business for about a month. We learned a lot from how close it was to su…

00:01:04 | Mon 30 Jun 2025

“Paradigms for computation” by Cole Wyeth

Epistemic status: Though I can't find it now, I remember reading a LessWrong post asking "what is your totalizing worldview?" I think this post gets at my answer; in fact, I initially intended to ti…

00:20:28 | Mon 30 Jun 2025
