Audio narrations of LessWrong posts.
The following work was done independently by me in an afternoon and basically entirely vibe-coded with Claude. Code and instructions to reproduce can be found here.
Emergent Misalignment was discove…
Here's a relatively important question regarding transparency requirements for AI companies: At which points in time should AI companies be required to disclose information? (While I focus on transp…
Anthropic post title: Detecting and countering misuse of AI: August 2025
Read the full report here. The lines below are from the Anthropic post and have not been edited. Accompanying images are availab…
There are two ways to show that an AI system is safe: show that it doesn't have dangerous capabilities, or show that it's safe even if it has dangerous capabilities. Until three months ago, AI compan…
In the wake of the confusions around GPT-5, this week had yet another round of claims that AI wasn’t progressing, or AI isn’t or won’t create much value, and so on. There were reports that one study…
“This is a Copernican-level shift in perspective for the field of AI safety.” - Gemini 2.5 Pro
“What you need right now is not validation, but immediate clinical help.” - Kimi K2
Two Minute Summary
Let's start with the classic Maxwell's Demon setup.
We have a container of gas, i.e. a bunch of molecules bouncing around. Down the middle of the container is a wall with a tiny door in it, which c…
This post shows the abstract, introduction, and main figures from our new paper "School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs".
TL;DR: We train LLMs on d…
This is a research note presenting a portion of the research Anders Cairns Woodruff completed in the Center on Long-Term Risk's Summer Research Fellowship under the mentorship of Mia Taylor.
The dat…
Summary
When discussing the possibility that LLMs will cease to reason in transparent natural language with other AI safety researchers, we have sometimes noticed that we talk past each other: e.g.,…
Just because you can run, it doesn't mean that you know how to do it properly.
This systematic review showed that:
50% of runners experience an injury each year that prevents them from running for a…
Socioeconomic status, parental education, and parental intelligence have strong effects on child IQ and are themselves correlated with breastfeeding practices. When studies ignor…
This is a linkpost for https://www.arxiv.org/pdf/2508.16245
With Marcus Hutter, Jan Leike (@janleike), and Jessica Taylor (@jessicata), I have revisited Leike et al.'s paper "A Formal Solution to t…
There's a stereotype that male sexual attraction is triggered mainly by appearance, and female sexual attraction is triggered mainly by status.
…Yes I know, this stereotype is grossly oversimpli…
A studio executive has no beliefs
That's the way of a studio system
We've bowed to every rear of all the studio chiefs
And you can bet your ass we've kissed 'em
Even the birds in the Hollywood hil…
I happily admit I am deeply confused about consciousness.
I don’t feel confident I understand what it is, what causes it, which entities have it, what future entities might have it, to what extent …
Many textbooks, tutorials or ... tapes leave out the ways people actually think about a subject, and leave you to fumble your way to your own picture. They don't even attempt to help you build intui…
Before having kids I thought teaching them to clean up would be similar to the rest of parenting: once they're physically able to do it you start practicing with them, and after a while they're …
…or the “it doesn’t make a difference anyway” fallacy.
Improving Productivity is Futile
I once had a coaching call on some generic productivity topic along the lines of “I’m not getting done as m…
These are some research notes on whether we could reduce AI takeover risk by cooperating with unaligned AIs. I think the best and most readable public writing on this topic is “Making deals with ear…