Audio narrations of LessWrong posts.
Audio note: this article contains 61 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
A lot of our work involves…
For people who care about falsifiable stakes rather than vibes
TL;DR
All timeline arguments ultimately turn on five quantitative pivots. Pick optimistic answers to three of them and your median fore…
Last week I covered that GPT-4o was briefly an (even more than usually) absurd sycophant, and how OpenAI responded to that.
Their explanation at that time was paper thin. It didn’t tell us much tha…
arXiv | project page | Authors: Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang
This paper from Tsinghua finds that RL on verifiable rewar…
A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute. Let me list a few examples of research directions that aim at this kind of solution:
Reproducing a result from recent work, we study a Gemma 3 12B instance trained to take risky or safe options; the model can then report its own risk tolerance. We find that:
We’ve been looking for joinable endeavors in AI safety outreach over the past weeks and would like to share our findings with you. Let us know if we missed any and we’ll add them to the list.
For co…
I contributed one (1) task to HCAST, which was used in METR's Long Tasks paper. This gave me some thoughts I feel moved to share.
Regarding Baselines and Estimates
METR's tasks have two sources for …
Utilitarianism implies that if we build an AI that successfully maximizes utility/value, we should be ok with it replacing us. Sensible people add caveats related to how hard it’ll be to determine t…
AI 2027 is a Bet Against Amdahl's Law was my attempt to summarize and analyze "the key load-bearing arguments AI 2027 presents for short timelines". There were a lot of great comments – every time I…
Strength
In 1997, with Deep Blue's defeat of Kasparov, computers surpassed human beings at chess. Other games have fallen in more recent years: Go, Starcraft, and League of Legends among them. AI is…
(Disclaimer: Post written in a personal capacity. These are personal hot takes and do not in any way represent my employer's views.)
TL;DR: I do not think we will produce high reliability methods t…
Cryonics Institute and Suspended Animation now have an arrangement where Suspended Animation will conduct a field cryopreservation before shipping the body to Cryonics Institute, thus decreasing tis…
Politico writes:
The [Ukrainian] program […] rewards soldiers with points if they upload videos proving their drones have hit Russian targets. It will soon be integrated with a new online marketplac…
Burnout. Burn out? Whatever, it sucks.
Burnout is a pretty confusing thing, made harder by the fact that our naive reactions tend to be things like “just try harder” or “grit your teeth and push through”, which usuall…
As an employee of the European AI Office, it's important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of th…
Gemini 2.5 Pro is sitting in the corner, sulking. It's not a liar, a sycophant or a cheater. It does excellent deep research reports. So why does it have so few friends? The answer, of course, is par…
AI progress is driven by improved algorithms and additional compute for training runs. Understanding what is going on with these trends and how they are currently driving progress is helpful for und…
Right before releasing o3, OpenAI updated its Preparedness Framework to 2.0. I previously wrote an analysis of the Preparedness Framework 1.0. I still stand by essentially everything I wrote in that…