David Lindner (@davidlindner.bsky.social)

By default, LLM agents with long action sequences use early steps to undermine your evaluation of later steps, a big alignment risk. Our new paper mitigates this, keeps the ability for long-term planning, and doesn't assume you can detect the undermining strategy. 👇
