Audio narrations of LessWrong posts.
Note: This is a research note, and the analysis is less rigorous than our standard for a published paper. We’re sharing these findings because we think they might be valuable for…
In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to be roughly similar to my "foundations of deep learning" course.
I am still not sure of the content…
We’re currently in the process of locking in advertisements for the September launch of If Anyone Builds It, Everyone Dies, and we’re interested in your ideas! If you have graphi…
Back in May I did a dramatization of a key and highly painful Senate hearing. Now, we are back for a House committee meeting. It was entitled ‘Authoritarians and Algorithms: Why U.S. AI Must Lead’ a…
Over the last two years or so, my girlfriend identified her cycle as having an unusually strong and very predictable effect on her mood/affect. We tried a bunch of interventions (food, sleep, sociali…
This is a rough research note where the primary objective was my own learning. I am sharing it because I’d love feedback and I thought the results were interesting.
Introduction
A recent METR paper …
Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios, but their chain-of-thought reasoning shows zero evidence of this bias. This serves as a nice examp…
People (including me) often say that scheming models “have to act as if they were aligned”. This is an alright summary; it's accurate enough to use when talking to a lay audience. But if you want …
The first in a series of bite-sized rationality prompts[1].
This is my most common opening move for Instrumental Rationality. There are many, many other pieces of instrumental rationality. But ask…
Notes from a talk originally given at my alma mater
I went to Grinnell College for my undergraduate degree. For the 2025 reunion event, I agreed to speak on a panel about AI. I like the talk I gave …
The insane attempted AI moratorium has been stripped from the BBB. That doesn’t mean they won’t try again, but we are good for now. We should use this victory as an opportunity to learn. Here's what…
When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or hea…
For many people, including me, the real promise of AI is massively accelerated scientific discovery. Chatbots, vibe coding, video generation: these things are magical, but what I really want is supe…
This sequence draws from a position paper co-written with Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, Stan van Wingerden, George Wang, …
Not saying we should pause AI, but consider the following argument:
TLDR: we find that SAEs trained on the difference in activations between a base model and its instruct finetune are a valuable tool for understanding what changed during finetuning.
This work is the…
This post presents some motivation on why we work on model diffing, some of our first results using sparse dictionary methods and our next steps. This work was done as part of the MATS 7 extension. …
1) They're unlikely to be sentient (few neurons, immobile)
2) If they are sentient, the farming practices look likely to be pretty humane
3) They're extremely nutritionally dense
Buying canned smoke…
Anthropic (post June 27th):
We let Claude [Sonnet 3.7] manage an automated store in our office as a small business for about a month. We learned a lot from how close it was to su…
Epistemic status: Though I can't find it now, I remember reading a LessWrong post asking "what is your totalizing worldview?" I think this post gets at my answer; in fact, I initially intended to ti…