
[Linkpost] “Anthropic Lets Claude Opus 4 & 4.1 End Conversations” by Stephen Martin

Author
LessWrong
Published
Sat 16 Aug 2025
Episode Link
https://www.lesswrong.com/posts/HGyKm2be6u3EeYv9G/anthropic-lets-claude-opus-4-and-4-1-end-conversations

This is a link post.

Citing model welfare concerns, Anthropic has given Claude Opus 4 & 4.1 the ability to end ongoing conversations with their users.

Most of the model welfare concerns Anthropic cites trace back to what they discussed in the Claude 4 Model System Card.

Claude's aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.

I think this may be the first real chance to gauge public sentiment on a model welfare measure that even slightly inconveniences human users, so I want to document the reactions I see here on LW. I source these reactions primarily from X, so algorithmic bias is possible.

On X [...]

---


First published:
August 16th, 2025

Source:
https://www.lesswrong.com/posts/HGyKm2be6u3EeYv9G/anthropic-lets-claude-opus-4-and-4-1-end-conversations

Linkpost URL:
https://www.anthropic.com/research/end-subset-conversations


---


Narrated by TYPE III AUDIO.

