Do we really need pixel generation to model motion?
We show how directly representing motion in a compact space enables efficient, scalable planning.
10,000× faster than video models, enabling planning and reasoning in open-world and robotics settings.
Check it out ⬇️
9 days ago