Audio narrations of LessWrong posts.
Audio note: this article contains 53 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Ariana Azarbal*, Matthew A…
I keep meaning to write up a more substantive follow-up to the Hire (or Become) a Thinking Assistant post. But I think this is still basically the biggest productivity effect size I know of, and more p…
While most people focused on Grok, there was another model release that got uniformly high praise: Kimi K2 from Moonshot.ai.
It's definitely a good model, sir, especially for a cheap-to-run open mo…
Yesterday I covered a few rather important Grok incidents.
Today is all about Grok 4's capabilities and features. Is it a good model, sir?
It's not a great model. It's not the smartest or best mod…
Twitter | Paper PDF
Seven years ago, OpenAI Five had just been released, and many people in the AI safety community expected AIs to be opaque RL agents. Luckily, we ended up with reasoning models th…
Tsvi's context
Some context:
My personal context is that I care about decreasing existential risk, and I think that the broad distribution of efforts put forward by X-deriskers fairly strong…
One billion people use chatbots on a weekly basis. That's 1 in every 8 people on Earth.
How many people have mental health issues that cause them to develop religious delusions o…
Most of the interview is about technological unemployment and AI-driven inequality, but here is the part where Sanders talks about loss of control risk (which Gizmodo put as titl…
Previously, we've shared a few higher-effort project proposals relating to AI control in particular. In this post, we'll share a whole host of less polished project proposals. All of these projects …
Anna and Ed are co-first authors for this work. We’re presenting these results as a research update for a continuing body of work, which we hope will be interesting and useful for others working on …
Audio note: this article contains 270 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
This post is a companion …
This is a write-up of a brief investigation into shutdown resistance undertaken by the Google DeepMind interpretability team.
TL;DR
Why do models sometimes resist shutdown? Are they ignoring instruc…
Grok 4, which has excellent benchmarks and which xAI claims is ‘the world's smartest artificial intelligence,’ is the big news.
If you set aside the constant need to say ‘No, Grok, No,’ is it a goo…
Summary
In the paper Measuring AI Ability to Complete Long Software Tasks (Kwa & West et al. 2025), METR defined an AI model's 50% time horizon as the length of tasks (measured by how long they take…
This article includes descriptions of content that some users may find distressing.
Testing was conducted on July 10 and 11; safety measures may have changed since then.
I’m a longtime lurker who …
This post is a response to John Wentworth's recent post on Generalized Hangriness, specifically as it applies to outrage, an emotion that is especially likely to make false claims. I expect that some r…
LLMs can be deeply confusing. Thanks to a commission, today we go back to basics.
How did we get such a wide array of confusingly named and labeled models and modes in ChatGPT? What are they, and w…
I assume you are familiar with the METR paper: https://arxiv.org/abs/2503.14499
In case you aren't: the authors measured how long it takes a human to complete some task, then let LLMs do those tasks…
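For readers who want a concrete picture of the "50% time horizon" metric from that paper, here is a minimal illustrative sketch (not from the post, with made-up numbers): fit success probability against log human-completion-time and solve for the task length at which predicted success crosses 50%.

```python
# Hedged sketch, not the authors' code: estimating a 50% time horizon in the
# spirit of the METR paper, assuming you already have per-task human completion
# times and whether the model succeeded on each task.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative made-up data: (human_minutes, model_succeeded)
human_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Logistic fit of success against log task length
X = np.log(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

# P(success) = 0.5 where the logit is zero: intercept + coef * log(t) = 0
t50 = np.exp(-clf.intercept_[0] / clf.coef_[0][0])
print(f"Estimated 50% time horizon: ~{t50:.0f} human-minutes")
```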
Zack Davis emailed me[1] asking me to weigh in on the moderators' request for comment on their proposal to ban Said_Achmiz. I've had a conflict with Said in the past in this thread and apparently th…
Yes, Prime Minister has a sketch demonstrating how asking a series of leading questions can get a person to a particular conclusion:
“Are you worried about the number of young people without jobs?”
…