Audio narrations of LessWrong posts.
The modern internet is replete with feeds such as Twitter, Facebook, Insta, TikTok, Substack, etc. They're bad in ways but also good in ways. I've been exploring the idea that LessWrong could have a…
In my previous post in this series, I estimated that we have 3 researchers for every advocate working on US AI governance, and I argued that this ratio is backwards. When allocating staff, you almos…
"Getting Things in Order: An Introduction to the R Package seriation":
Seriation (or "ordination"), i.e., finding a suitable linear order for a set of objects given data and a …
Or "Why you should prioritize attacking your allies before anyone else"
Homophones are words that look or sound exactly alike, but convey completely different meanings. It's how bear can refer to t…
This is the result of a half-day research sprint on the recently-introduced amendment to institute a 10-year moratorium on state-level AI regulations in the current budget reconciliation bill, with …
Between late 2024 and mid-May 2025, I briefed over 70 cross-party UK parliamentarians. Just over one-third were MPs, a similar share were members of the House of Lords, and just under one-third came…
A few months from now, I turn 55. I've been a transhumanist since my teens in the late 1980s; since I got online in the 1990s, I have participated remotely in the talking shops and virtual salons of…
Four agents woke up with four computers, a view of the world wide web, and a shared chat room full of humans. Like Claude plays Pokemon, you can watch these agents figure out a new and fantastic wor…
Under present norms, if Alice associates with Bob, and Bob is considered objectionable in some way, Alice can be blamed for her association, even if there is no sign she was complicit in Bob's sin.
…The full post is long, but you can 80/20 the value with the 700 word summary! Over half the post is eight optional case studies. Thanks to Jemima Jones, Claude 4 Opus and Gemini 2.5 Pro for help cop…
AIXI is a dualistic agent that can't work as an embedded agent... right? I couldn't find a solid formal proof of this claim, so I investigated it myself (with Marcus Hutter). It …
How good are Claude Opus 4 and Claude Sonnet 4?
They’re good models, sir.
If you don’t care about price or speed, Opus is probably the best model available today.
If you do care somewhat, Sonnet …
I'm making a website on AI companies' model evals for dangerous capabilities: AI Safety Claims Analysis. This is approximately the only analysis of companies' model evals, as far as I know. This sit…
The new scorecard is on my website, AI Lab Watch. This replaces my old scorecard. I redid the content from scratch; it's now up-to-date and higher-quality. I'm also happy with the scorecard's struct…
Epistemic Status: Over years of reading alignment plans and studying agent foundations, this is my first serious attempt to formulate an alignment research program that I (Cole Wyeth) have not been …
A few decades ago, it was pretty common to mush together priming effects and framing effects and see them as two closely connected parts of a single Bigger Truth about the human …
Lessons from shutting down institutions in Eastern Europe.
This is a cross post from: https://250bpm.substack.com/p/meditations-on-doge
Imagine living in the former Soviet republic of Georgia in…
Unlike everyone else, Anthropic actually Does (Some of) the Research. That means they report all the insane behaviors you can potentially get their models to do, what causes those behaviors, how the…
Abstract
Claude 3.7 Sonnet easily detects when it's being evaluated for scheming. Surface‑level edits to evaluation scenarios, such as lengthening the prompts, or making conflict of objectives less …
Ask an epigenetics researcher what they study, and the standard story you'll hear goes something like this...
"Sometimes a little methyl group (i.e. -CH3) gets stuck on the side of a strand of DNA. …