Visual-spatial intelligence: we rely on it to perceive, interact with, and navigate our everyday spaces. To what extent do MLLMs possess it? Do they mirror how humans think and reason about space?
Presenting “Thinking in Space: How Multimodal Models See, Remember, and Recall Spaces”! [1/n]