Audio narrations of LessWrong posts.
As I think about "what to do about AI x-risk?", here are some principles that seem useful to me:
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. I think I could have written a better version of this post with more time. However, my main …
A common claim is that concern about [X] ‘distracts’ from concern about [Y]. This is often used as an attack to cause people to discard [X] concerns, on pain of being enemies of [Y] concerns, as att…
Enjoy it while it lasts. The Claude 4 era, or the o4 era, or both, are coming soon. Also, welcome to 2025: we measure eras in weeks or at most months. For now, the central thing going on continues …
What in retrospect seem like serious moral crimes were often widely accepted while they were happening. This means that moral progress can require intellectual progress.[1] Intellectual progress oft…
Something's changed about reward hacking in recent systems. In the past, reward hacks were usually accidents, found by non-general, RL-trained systems. Models would randomly explore different behavio…
We've published an essay series on what we call the intelligence curse. Most content is brand new, and all previous writing has been heavily reworked.
Visit intelligence-curse.ai for the full series…
In this post, we study whether we can modify an LLM's beliefs and investigate whether doing so could decrease risk from advanced AI systems.
We describe a pipeline for modifying …
Every now and then, some AI luminaries …
I’ve read at least a few hundred blog posts, maybe upwards of a thousand. Agreeing with Gavin Leech, I believe I’ve gained from essays more than any other medium. I’m an intellec…
Converting to a for-profit model would undermine the company's founding mission to ensure AGI "benefits all of humanity," argues new letter
This is the full text of a post from Obsolete, a Substack …