Dana Arad (@danaarad.bsky.social)

🚨🚨 New preprint 🚨🚨 Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829

loading . . .

Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. However, despite mu... https://arxiv.org/abs/2502.14829