Audio narrations of LessWrong posts.
I recently read a blog post that concluded with:
When I'm on my deathbed, I won't look back at my life and wish I had worked harder. I'll look back and wish I spent more time with the p…
Misaligned AIs might engage in research sabotage: making safety research go poorly by doing things like withholding their best ideas or putting subtle bugs in experiments. To mitigate this risk with…
Whoops. Sorry everyone. Rolling back to a previous version. Here's where we are at this point, now that GPT-4o is no longer an absurd sycophant. For now.
Table of Contents
As an employee of the European AI Office, it's important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of th…
(This is the fifth essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction …
In this blog post, we analyse how the recent AI 2027 forecast by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean has been discussed across Chinese language platforms. W…
[This has been lightly edited from the original post, eliminating some introductory material that LW readers won't need. Thanks to Stefan Schubert for suggesting I repost here. TL;DR for readers alr…
In this episode of our podcast, Timothy Telleen-Lawton and I talk to Oliver Habryka of Lightcone Infrastructure about his thoughts on the Open Philanthropy Project, which he believes has become stif…
John: So there's this thing about interp, where most of it seems to not be handling one of the standard fundamental difficulties of representation, and we want to articulate that in a way which will…
Introduction
Focusmate changed my life. I started using it in mid-2023 and have been a power user ever since. Here are the high-level stats:
GPT-4o tells you what it thinks you want to hear.
The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences.
(Also some other interesting choices, like …
tl;dr
This post is an update on the Proceedings of ILIAD, a conference journal for AI alignment research intended to bridge the gap between the Alignment Forum and academia. Following our successful…
In this post, we list 7 of our favorite easy-to-start directions in AI control. (Really, projects that are at least adjacent to AI control; we include directions which aren’t as centrally AI control…
This is post 2 of a sequence on my framework for doing and thinking about research. Start here.
Before I get into what exactly to do at each stage of the research process, it's worth reflecting on t…
Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum of rival resource‑grabbers it may encounter while expanding through the cosmos. The purpose of thi…
This is a link post.
I've gotten a lot of value out of the details of how other people use LLMs, so I'm delighted that Gavin Leech created a collection of exactly such posts (link should go to the ri…
This is a link post.
So this post is an argument that multi-decade timelines are reasonable, and the key cruxes that Ege Erdil has with most AI safety people who believe in short timelines are due to…
This is a link post.
Dario Amodei posted a new essay titled "The Urgency of Interpretability" a couple days ago.
Some excerpts I think are worth highlighting:
The nature of AI training makes it possi…
For a lay audience, but I've seen a surprising number of knowledgeable people fretting over depressed-seeming comics from current systems. Either they're missing something or I am.
Perhaps you’ve s…
This is the first post in a sequence about how I think about and break down my research process. Post 2 is coming soon.
Thanks to Oli Clive-Griffin, Paul Bogdan, Shivam Raval and especially to Jemim…