Pingchuan Ma (@pima-hyphen.bsky.social)

🤔 What happens when you poke a scene — and your model has to predict how the world moves in response? We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions. It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇