Gabriele Goletto
@gabrigole.bsky.social
Research Scientist @ Microsoft. 👨‍💻
https://gabrielegoletto.github.io
reposted by
Gabriele Goletto
Dima Damen
4 months ago
Preprint now on ArXiv 📢
The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
Input: Single-person egocentric video 👤
Out: imagine how these tasks can be performed faster by N > 1 people, correctly, e.g. N=2 👥
arxiv.org/abs/2512.11393
zhifanzhu.github.io/ego-nbody/
1/4
reposted by
Gabriele Goletto
Dima Damen
about 1 year ago
Now on ArXiv: our @cvprconference.bsky.social #CVPR2025 paper, Learning from Streaming Video with Orthogonal Gradients.
Instead of shuffling clips, can we learn from videos fed sequentially, where you see a clip once, in order? How to deal with the correlation of gradients over training? 1/3
reposted by
Gabriele Goletto
about 1 year ago
Image segmentation doesn’t have to be rocket science. Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job? 💡 That’s what we did for segmentation. Meet the Encoder-only Mask Transformer (EoMT):
tue-mps.github.io/eomt
(CVPR 2025) (1/6)
reposted by
Gabriele Goletto
Gabriele Berton
about 1 year ago
Excited to release the first worldwide aerial image localization method (and demo!) Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint! Links to paper, demo and full-length (5 min) video ⬇️
reposted by
Gabriele Goletto
Dima Damen
about 1 year ago
📢 HD-EPIC: A Highly-Detailed Egocentric Video Dataset
hd-epic.github.io
arxiv.org/abs/2502.04144
Newly collected videos. 263 annotations/min: recipe, nutrition, actions, sounds, 3D object movement & fixture associations, masks. 26K VQA benchmark to challenge current VLMs. 1/N
reposted by
Gabriele Goletto
Dima Damen
over 1 year ago
Now on ArXiv: ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & variable sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image* 🧵