Xiaoyan Bai
@elenal3ai.bsky.social
📤 441
📥 184
📝 71
PhD @UChicagoCS / BE in CS @Umich / ✨AI/NLP transparency and interpretability/📷🎨photography painting
pinned post!
📖 ≠ 🧪 The Story is Not the Science. Code is submitted but rarely executed during peer review—an issue likely to worsen with research agents. 🧑🔬 We introduce 𝐌𝐞𝐜𝐡𝐄𝐯𝐚𝐥𝐀𝐠𝐞𝐧𝐭, an execution-grounded evaluation that checks the narrative against actual execution. 𝐕𝐞𝐫𝐢𝐟𝐲 𝐭𝐡𝐞 𝐬𝐜𝐢𝐞𝐧𝐜𝐞, 𝐧𝐨𝐭 𝐣𝐮𝐬𝐭 𝐭𝐡𝐞 𝐬𝐭𝐨𝐫𝐲. 1/n
2 days ago
2
8
4
reposted by
Xiaoyan Bai
Data Science Institute
about 1 month ago
Featured in UChicago News:
@elenal3ai.bsky.social
&
@chenhaotan.bsky.social
's research into why AI can write complex code but fails at 4-digit multiplication:
tinyurl.com/5ukvm7p7
Why can’t powerful AIs learn basic multiplication?
New research reveals why even state-of-the-art large language models stumble on seemingly easy tasks—and what it takes to fix it
https://tinyurl.com/5ukvm7p7
0
2
1
Will be at
#NeurIPS2025
presenting “Concept Incongruence”! 🦄🦆 Curious about a unicorn duck? Stop by, get one, and chat with us! We made a new demo for detecting hidden conflicts in system prompts to spot “concept incongruence” for safer prompts. 🔗:
github.com/ChicagoHAI/d...
🗓️ Dec 3 11AM - 2PM
3 months ago
1
6
2
Research agents are getting smarter. They can write convincing PhD-level reports 🧑🔬 But has anyone checked if the way they find their results makes any sense? Our framework, MechEvalAgents, verifies the science, not just the story 🤖 1/n🧵
3 months ago
1
3
1
reposted by
Xiaoyan Bai
Haokun Liu
3 months ago
We're launching a weekly competition where the community decides which research ideas get implemented. Every week, we'll take the top 3 ideas from IdeaHub, run experiments with AI agents, and share everything: code, successes, and failures. It's completely free and we'll try out ideas for you!
1
6
5
reposted by
Xiaoyan Bai
Lexing Xie
4 months ago
Identifying human morals and values in language is crucial for analysing lots of human- and AI-generated text. We introduce "MoVa: Towards Generalizable Classification of Human Morals and Values" - to be presented at
@emnlpmeeting.bsky.social
oral session next Thu
#CompSocialScience
#LLMs
🧵 (1/n)
8
8
5
🕸️ Here’s a network showing how much different models predict each other as the author of some text!
4 months ago
0
8
3
❓ Does an LLM know thyself? 🪞 Humans pass the mirror test at ~18 months 👶 But what about LLMs? Can they recognize their own writing—or even admit authorship at all? In our new paper, we put 10 state-of-the-art models to the test. Read on 👇 1/n 🧵
4 months ago
1
12
5
In our new work, we reverse-engineer two models: a standard fine-tuned (SFT), and an implicit chain-of-thought (ICoT) model to see why models struggle with multi-digit multiplication. 👉Check out the paper here:
arxiv.org/abs/2510.00184
🎉Big thanks to all my amazing collaborators!
4 months ago
0
7
1
reposted by
Xiaoyan Bai
4 months ago
AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just automation.
cichicago.substack.com/p/the-mirage...
🧵
The Mirage of Autonomous AI Scientists
Science as AI’s killer application cannot succeed without scientist-AI interaction: Introducing Hypogenic.ai.
https://cichicago.substack.com/p/the-mirage-of-autonomous-ai-scientists
2
22
8
reposted by
Xiaoyan Bai
Dang Nguyen
5 months ago
HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory. Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠 Can you earn Enlightened Bureaucrat status? (link below!)
2
4
8
reposted by
Xiaoyan Bai
5 months ago
🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers. This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
ai-scientific-discovery.github.io
1
23
16
reposted by
Xiaoyan Bai
5 months ago
As AI becomes increasingly capable of conducting analyses and following instructions, my prediction is that the role of scientists will increasingly focus on identifying and selecting important problems to work on ("selector"), and effectively evaluating analyses performed by AI ("evaluator").
2
10
8
reposted by
Xiaoyan Bai
6 months ago
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest (corrected link):
forms.gle/MFcdKYnckNno...
More in the 🧵! Please share!
#MLSky
🧠
Program Committee Interest for the Second Workshop on AI & Scientific Discovery
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL (Annual meetings of The Association for Computational Linguistics, the European Language Resource Association and Internat...
https://forms.gle/MFcdKYnckNnohqap9
1
14
8
⚡️Ever asked an LLM-as-Marilyn Monroe about the 2020 election? Our paper calls this concept incongruence, common in both AI and how humans create and reason. 🧠Read my blog to learn what we found, why it matters for AI safety and creativity, and what's next:
cichicago.substack.com/p/concept-in...
7 months ago
1
9
5
reposted by
Xiaoyan Bai
7 months ago
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper with
@ari-holtzman.bsky.social
.
2
37
15
reposted by
Xiaoyan Bai
8 months ago
When you walk into the ER, you could get a doc: 1) fresh from a week of not working, or 2) tired from working too many shifts.
@oziadias.bsky.social
has been both and thinks that they're different! But can you tell from their notes? Yes we can! Paper in
@natcomms.nature.com
www.nature.com/articles/s41...
1
26
11
Humbled to receive an honorable mention🌟
8 months ago
0
1
0
reposted by
Xiaoyan Bai
8 months ago
Since
@elenal3ai.bsky.social
cannot make it, I presented the poster on concept incongruence:
arxiv.org/abs/2505.14905
0
7
2
I am glad that you found our paper entertaining! This is a great point for my follow-up thread on the implications of concept incongruence. Our main goal is to raise awareness and provide clarity around concept incongruence.
9 months ago
1
3
4
🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵
9 months ago
1
28
18
reposted by
Xiaoyan Bai
Mourad Heddaya
10 months ago
🧑⚖️How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate? Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
2
23
13
reposted by
Xiaoyan Bai
Haokun Liu
10 months ago
🚀🚀🚀Excited to share our latest work: HypoBench, a systematic benchmark for evaluating LLM-based hypothesis generation methods! There is much excitement about leveraging LLMs for scientific hypothesis generation, but principled evaluations are missing - let’s dive into HypoBench together.
1
11
9
reposted by
Xiaoyan Bai
10 months ago
Encourage your students to submit posters and register! Limited free housing is provided for student participants only, on a first-come (i.e., first-request)-first-served basis. We are also actively looking for sponsors. Reach out if you are interested! Please repost! Help spread the word!
2
10
10
reposted by
Xiaoyan Bai
Dang Nguyen
10 months ago
1/n You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
1
18
13
reposted by
Xiaoyan Bai
Julia Mendelsohn
12 months ago
New preprint! Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard. We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link:
bit.ly/4i3PGm3
#NLP
#NLProc
#polisky
#polcom
#compsocialsci
🐦🐦
6
181
75
reposted by
Xiaoyan Bai
Guillaume Lajoie
over 1 year ago
Compositional representations are a key attribute of intelligent systems that generalize well. One issue is that there is no robust way to quantify compositionality. Below is our attempt at such a quantifiable measure.
arxiv.org/abs/2410.148...
w/ E Elmoznino & T Jiralerspong & Y Bengio
A Complexity-Based Theory of Compositionality
Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable ...
https://arxiv.org/abs/2410.14817v1
0
16
4
reposted by
Xiaoyan Bai
Mourad Heddaya
about 1 year ago
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors? In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.
1
34
8