“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda

Author: LessWrong
Published: Wed 23 Jul 2025
Episode Link: https://www.lesswrong.com/posts/vPAFPpRDEg3vjhNFi/unfaithful-chain-of-thought-as-nudged-reasoning

This piece is based on work conducted during MATS 8.0 and is part of a broader aim of interpreting chain-of-thought in reasoning models.

tl;dr

  • Research on chain-of-thought (CoT) unfaithfulness shows how models’ CoTs may omit information that is relevant to their final decision.
  • Here, we sketch hypotheses for why key information may be omitted from CoTs:
    • Training regimes could teach LLMs to omit information from their reasoning.
    • But perhaps more importantly, statements within a CoT generally have functional purposes, and faithfully mentioning some information may carry no benefits, so models don’t do it.
  • We make further claims about what's going on in faithfulness experiments and how hidden information impacts a CoT:
    • Unfaithful CoTs are often not purely post hoc rationalizations and instead can be understood as biased reasoning.
    • Models continually make choices during CoT, repeatedly drawing new propositions supporting or discouraging different answers.
    • Hidden information can nudge [...]
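
As a rough illustration of the last two bullets, here is a toy simulation. It is not taken from the original post: the function names, the per-step `nudge` parameter, and the random-walk framing are all assumptions made for this sketch. It treats a CoT as a sequence of steps, each adding a proposition that favors answer A or answer B. A small hidden bias is hard to see in any single step, yet it can noticeably shift how often the chain ends on A.

```python
import random


def run_cot(n_steps, nudge, rng):
    """Toy CoT: each step adds a proposition favoring A (+1) or B (-1).

    `nudge` is a hypothetical per-step bias standing in for hidden
    information (for example, a hint the model never mentions).
    """
    score = 0
    for _ in range(n_steps):
        p_support_a = 0.5 + nudge  # assumed: the hint slightly tilts every step
        score += 1 if rng.random() < p_support_a else -1
    # With an odd n_steps the score is never zero, so there are no ties.
    return "A" if score > 0 else "B"


def fraction_a(n_steps, nudge, n_trials=2000, seed=0):
    """Fraction of simulated CoTs that end on answer A."""
    rng = random.Random(seed)
    wins = sum(run_cot(n_steps, nudge, rng) == "A" for _ in range(n_trials))
    return wins / n_trials


if __name__ == "__main__":
    print("no nudge:   ", fraction_a(51, 0.0))
    print("small nudge:", fraction_a(51, 0.05))
```

In this sketch no single step reveals the bias, since each step still looks like an ordinary proposition for A or B, but the nudged runs should land on A clearly more often than the un-nudged ones. Real models are of course far more complex than a biased random walk.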

---

Outline:

(00:21) tl;dr

(01:54) Unfaithfulness

(04:16) CoT is functional, and faithfulness lacks benefits

(09:29) Hidden information can nudge CoTs

(09:48) Silent, soft, and steady spectres

(13:48) Nudges are plausible

(14:42) Nudged CoTs are hard to spot

(15:49) Safety and CoT monitoring

(18:11) Final summary

The original text contained 5 footnotes which were omitted from this narration.

---


First published: July 22nd, 2025

Source: https://www.lesswrong.com/posts/vPAFPpRDEg3vjhNFi/unfaithful-chain-of-thought-as-nudged-reasoning


---


Narrated by TYPE III AUDIO.

