Julien Gaubil 13 days ago
DUSt3R et al. are impressive, but how do they actually work? We investigate this in our project 𝘜𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥𝘪𝘯𝘨 𝘔𝘶𝘭𝘵𝘪-𝘝𝘪𝘦𝘸 𝘛𝘳𝘢𝘯𝘴𝘧𝘰𝘳𝘮𝘦𝘳𝘴!
We share findings on the iterative nature of reconstruction, the roles of cross- and self-attention, and the emergence of correspondences across the network [1/8] ⬇️