“Claude 4 You: The Quest for Mundane Utility” by Zvi

Author: LessWrong
Published: Tue 27 May 2025
Episode Link: https://www.lesswrong.com/posts/cQizFzEvZ8esKJGST/claude-4-you-the-quest-for-mundane-utility

How good are Claude Opus 4 and Claude Sonnet 4?

They’re good models, sir.

If you don’t care about price or speed, Opus is probably the best model available today.

If you do care somewhat, Sonnet 4 is probably best in its class for many purposes. It deserves the 4 label because of its agentic aspects, but it isn’t a big leap over 3.7 for other purposes. I have been using Opus 90%+ of the time, so I can’t speak to this directly. There are some signs of ‘small model smell’, where Sonnet 4 has focused on common cases at the expense of rarer ones. That’s what Opus is for.

That’s all as of when I hit post. Things do escalate quickly these days. Although I would not include Grok in this loop until proven otherwise, it’s a three-horse race, and if you told me [...]

---

Outline:

(01:17) On Your Marks

(05:32) Standard Silly Benchmarks

(11:09) API Upgrades

(12:45) Coding Time Horizon

(13:47) The Key Missing Feature is Memory

(14:52) Early Reactions

(26:12) Opus 4 Has the Opus Nature

(32:27) Unprompted Attention

(35:09) Max Subscription

(36:24) In Summary