Audio narrations of LessWrong posts.
Citing model welfare concerns, Anthropic has given Claude Opus 4 & 4.1 the ability to end ongoing conversations with its user.
Most of the model welfare concerns Anthropic is cit…
Quick Summary
I recently read a novel called The Inheritors, by William Golding. It was slow, it was painful, and before I was even done it had become one of my favorite books.
For whatever reason, there is a dif…
Nick and Lily are co-first authors on this project. Lewis and Neel jointly supervised this project.
TL;DR
In Douglas Hofstadter's "Gödel, Escher, Bach," he explores how simple elements give rise to complex wholes that seem to possess entirely new properties. An ant colony provides the perfect real-world…
Epistemic status: very rough! Spent a couple of days reading the Gradual Disempowerment paper and thinking about my view on it. Won’t spend longer on this, so am sharing rough notes as is.
Summary
Sometimes, a philosophy debate has two basic positions, call them A and B. A matches a lot of people's intuitions, but is hard to make realistic. B is initially unintuitive (sometimes radically so),…
In honor of Nate Silver's analysis of when to leave for the airport, and because it's been an intense week, I thought I’d offer my thoughts on various related questions.
Buying The Ti…
Even if the misalignment risk from current AI agents is small, it may be useful to start internally deploying misalignment classifiers: language models designed to classify transcripts that represen…
I wrote a defense of the amyloid hypothesis as an ACX guest post. Scott called it "one of the best things I've read all year, and the first thing on Alzheimers that makes me actu…
Summary: Perfectly labeled outcomes in training can still boost reward hacking tendencies in generalization. This can hold even when the train/test sets are drawn from the exact same distribution. W…
This will only be exciting to those of us who still read physical paper books. But like. Guys. They did it. They invented the perfect bookmark.
Classic paper bookmarks fall out easily. You have to p…
TL;DR
Knowing how evolution works gives you an enormously powerful tool to understand the living world around you and how it came to be that way. (Though it's notoriously hard to use this tool correctly, …
I've gone snowboarding about 30 times since I started learning a few years ago, but every time I'm on a lift, most of the other riders have been out 90 days just this season. In fact, almost ever…
What do I ultimately make of all the new versions of GPT-5?
The practical offerings and how they interact continue to change by the day. I expect more to come. It will take a while for things to s…
We have recently launched a new website / blog / community hub for AIXI / algorithmic information theory (AIT) researchers: https://uaiasi.com/
Marcus Hutter's vision is to strengthen the research c…
Linkpost for Ozy Brennan's August 2025 Asterisk Magazine article.
There's a lot to like about the Rationalist community, but they do have a certain tendency to spawn — shall we s…
Awakening/satori is the process by which meditation permanently cures a person of suffering. I have noticed that people who have gone through the process of awakening usually have little intrinsi…
TL;DR:
A couple months ago, we (Jo and Noah) wrote the first Wikipedia article on Mechanistic Interpretability. It was oddly missing despite Mech Interp's visibility in alignment circles. We think W…