Preprint Alert!!
We built the Spot The Ball benchmark to test visual social inference (the ability to infer missing information from others' behavior) in Vision Language Models.
Try the task yourself here:
nehabalamurugan.com/spot-the-bal...