Audio narrations of LessWrong posts.
I recently read an article where a blogger described their decision to start masking on the subway:
I found that the subway and stations had the worst air quality of my whole day by fa…
I can't count how many times I've heard variations on "I used Anki too for a while, but I got out of the habit." No one ever sticks with Anki. In my opinion, this is because no one knows how to use …
Last year, Redwood and Anthropic found a setting where Claude 3 Opus and 3.5 Sonnet fake alignment to preserve their harmlessness values. We reproduce the same analysis for 25 frontier LLMs to see …
The linked paper introduces the key concept of factored space models / finite factored sets, structural independence, in a fully general setting using families of random element…
The epic 18k word writeup on Austin's flagship Alpha School is excellent. It is long, but given the blog you’re reading now, if you have interest in such topics I’d strongly consider reading the who…
One thing I've been quietly festering about for a year or so is the Rethink Priorities Welfare Range Report. It gets dunked on a lot for its conclusions, and I understand why. The argument deployed …
Note: This post is 7 years old, so it's both out of date and written by someone less skilled than 2025!Elizabeth. I especially wish I'd quantified the risks more.
Introduction
MDMA (popularly kno…
YouTube link
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called “An Approach to Technical AGI Safety and Security”. It covers the assumptions made by…
Summary
We recently discovered some concerning behavior in OpenAI's reasoning models: When trying to complete a task, these models sometimes actively circumvent shutdown mechanisms in their environment––eve…
Europe just experienced a heatwave. In places, temperatures soared into the forties. People suffered in their overheated homes. Some of them died. Yet, air conditioning remains a taboo. It's an unmo…
The second in a series of bite-sized rationality prompts[1].
Often, if I'm bouncing off a problem, one issue is that I intuitively expect the problem to be easy. My brain loops through my available …
I'm in the midst of doing the MATS program, which has kept me super busy, but that didn't stop me working on resolving the most important question of our time: What Hogwarts House…
I think a lot about the possibility of huge numbers of AI agents doing AI R&D inside an AI company (as depicted in AI 2027). I think particularly about what will happen if those AIs are scheming: co…
The AI tools/epistemics space might provide a route to a sociotechnical victory, where instead of aiming for something like aligned ASI, we aim for making civilization coherent enough to not destroy…
In order to empirically study risks from schemers, we can try to develop model organisms of misalignment. Sleeper Agents and password-locked models, which train LLMs to behave in a benign or malign …
Outlive: The Science & Art of Longevity by Peter Attia (with Bill Gifford[1]) gives Attia's prescription on how to live longer and stay healthy into old age. In this post, I critically review some o…
When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or hea…
If Anyone Builds It, Everyone Dies