EachPod

“The Sweet Lesson: AI Safety Should Scale With Compute” by Jesse Hoogland

Author
LessWrong ([email protected])
Published
Mon 05 May 2025
Episode Link
https://www.lesswrong.com/posts/6hy7tsB2pkpRHqazG/the-sweet-lesson-ai-safety-should-scale-with-compute

A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute. Let me list a few examples of research directions that aim at this kind of solution:

  • Deliberative Alignment: Combine chain-of-thought with Constitutional AI, so that safety improves with inference-time compute (see Guan et al. 2025, Figure 13).
  • AI Control: Design control protocols that pit a red team against a blue team so that running the game for longer results in more reliable estimates of the probability of successful scheming during deployment (e.g., weight exfiltration).
  • Debate: Design a debate protocol so that running a longer, deeper debate between AI assistants makes us more confident that we're encouraging truthfulness or other desirable qualities (see Irving et al. 2018, Table 1).
  • Bengio's Scientist AI: Develop safety guardrails that obtain more reliable estimates of the probability of catastrophic risk with increasing inference time:[1]

[I]n the short [...]

The original text contained 2 footnotes which were omitted from this narration.
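The AI Control bullet claims that running the red-team/blue-team game for longer yields more reliable estimates of the probability of successful scheming. A minimal way to see why is a Monte Carlo sketch (hypothetical numbers, not from the post): if each game round is an independent trial with some true success probability, the standard error of the estimate shrinks like 1/sqrt(n), so more compute buys a tighter estimate.

```python
import math
import random

def estimate_scheming_probability(n_rounds, true_p=0.02, seed=0):
    """Monte Carlo estimate of the red team's win rate over n_rounds
    simulated control-game rounds. true_p is a made-up stand-in for the
    unknown real probability; the point is only that the standard error
    scales as 1/sqrt(n_rounds), so longer games -> tighter estimates."""
    rng = random.Random(seed)
    wins = sum(rng.random() < true_p for _ in range(n_rounds))
    p_hat = wins / n_rounds
    std_err = math.sqrt(p_hat * (1 - p_hat) / n_rounds)
    return p_hat, std_err

if __name__ == "__main__":
    # Increasing compute (more rounds) by 100x cuts the error bar ~10x.
    for n in (100, 10_000, 1_000_000):
        p_hat, se = estimate_scheming_probability(n)
        print(f"n={n:>9}: p_hat={p_hat:.4f} +/- {se:.4f}")
```

This is the "safety scales with compute" property in its simplest statistical form; the actual control protocols referenced in the post are of course far richer than a Bernoulli trial.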

---


Narrated by TYPE III AUDIO.
