Marginalium
A note in the margins
March 28, 2025
Interesting article by Anthropic on how AI thinks. Some highlights:
Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:
- Claude can speak dozens of languages. What language, if any, is it using “in its head”?
- Claude writes text one word at a time. Is it only focusing on predicting the next word or does it ever plan ahead?
- Claude can write out its reasoning step-by-step. Does this explanation represent the actual steps it took to get to an answer, or is it sometimes fabricating a plausible argument for a foregone conclusion?
And:
…solid evidence that:
- Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.” We show this by translating simple sentences into multiple languages and tracing the overlap in how Claude processes them.
- Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so.
- Claude, on occasion, will give a plausible-sounding argument designed to agree with the user rather than to follow logical steps. We show this by asking it for help on a hard math problem while giving it an incorrect hint. We are able to “catch it in the act” as it makes up its fake reasoning, providing a proof of concept that our tools can be useful for flagging concerning mechanisms in models.
And:
In a study of hallucinations, we found the counter-intuitive result that Claude’s default behavior is to decline to speculate when asked a question, and it only answers questions when something inhibits this default reluctance.
Lots of interesting material. But I still disagree with calling this ‘thinking’. More here and here, but I see no reason to believe this isn’t more like walking for an AI. On that account, these ‘thought processes’ would be more like adjusting to terrain, or something.