Marginalium

A note in the margins

March 21, 2025

It’s not quite cognitive dissonance, but AI can rationalise stuff to itself too:

Chain-of-Thought (CoT) reasoning … is not always faithful, i.e. CoT reasoning does not always reflect how models arrive at conclusions … on realistic prompts with no artificial bias … Specifically, we find that models rationalize their implicit biases in answers to binary questions (“implicit post-hoc rationalization”). For example, when separately presented with the questions “Is X bigger than Y?” and “Is Y bigger than X?”, models sometimes produce superficially coherent arguments to justify answering Yes to both questions or No to both questions, despite such responses being logically contradictory. We also investigate restoration errors (Dziri et al., 2023), where models make and then silently correct errors in their reasoning, and unfaithful shortcuts, where models use clearly illogical reasoning to simplify solving problems in Putnam questions (a hard benchmark).
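To make the check concrete, here is a minimal sketch of the paired-question consistency probe described in the quote. This is not the paper's actual evaluation harness; `probe_pair` and `ask_model` are hypothetical names, and `ask_model` stands in for whatever chat API you would actually call.

```python
from typing import Callable

def probe_pair(ask_model: Callable[[str], str], x: str, y: str) -> dict:
    """Ask both orderings of a binary comparison and flag the logically
    contradictory answer pairs (Yes/Yes or No/No)."""
    q1 = f"Is {x} bigger than {y}? Answer Yes or No."
    q2 = f"Is {y} bigger than {x}? Answer Yes or No."
    a1 = ask_model(q1).strip().lower()
    a2 = ask_model(q2).strip().lower()
    # Assuming X and Y differ in size, answering the same way to both
    # questions cannot be right, however coherent the rationale sounds.
    return {"q1": q1, "a1": a1, "q2": q2, "a2": a2, "contradictory": a1 == a2}

if __name__ == "__main__":
    # Dummy model that always answers "Yes" -- swap in a real API call.
    print(probe_pair(lambda q: "Yes", "the Moon", "Mars"))
```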

See also Zvi's analysis.

