Mehdi S. M. Sajjadi
@msajjadi.com
Research Scientist, Tech Lead & Manager at Google DeepMind · msajjadi.com
D4RT: Teaching AI to see the world in four dimensions
deepmind.google/blog/d4rt-te...
We just released a Google DeepMind blog post on our latest work. Please check it out! The project website & tech report can be found at d4rt-paper.github.io
D4RT: Unified, Fast 4D Scene Reconstruction & Tracking
Meet D4RT, a unified AI model for 4D scene reconstruction and tracking.
https://deepmind.google/blog/d4rt-teaching-ai-to-see-the-world-in-four-dimensions/
about 2 months ago
🔥 Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time
d4rt-paper.github.io
Building on the SRT architecture (srt-paper.github.io), D4RT unlocks a flexible interface for Dynamic 4D Reconstruction and Tracking. It's truly been a privilege to work with this incredibly talented team.
3 months ago
Looking forward to it!
5 months ago
Scaling 4D Representations
Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B parameters, boosting performance on pose estimation, tracking & more.
Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
8 months ago
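Conceptually, masked auto-encoding on video tokenizes frames into patches, hides a large fraction of the tokens, and trains the model to reconstruct the hidden ones. A minimal NumPy sketch of the tokenize-and-mask step (the function name, patch size, and 75% mask ratio are illustrative assumptions, not taken from the paper's released code):

```python
import numpy as np

def mask_video_patches(video, patch=4, mask_ratio=0.75, seed=0):
    """Split a (T, H, W) video into patch tokens and mask a random subset.

    Returns the visible tokens (the encoder input), the masked tokens
    (the reconstruction targets), and the boolean mask over all tokens.
    """
    T, H, W = video.shape
    # Tokenize: non-overlapping patch x patch squares within each frame.
    tokens = video.reshape(T, H // patch, patch, W // patch, patch)
    tokens = tokens.transpose(0, 1, 3, 2, 4)
    tokens = tokens.reshape(T * (H // patch) * (W // patch), patch * patch)

    # Randomly mask a fixed fraction of the tokens.
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    masked_idx = rng.choice(n, size=int(mask_ratio * n), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[masked_idx] = True

    visible = tokens[~mask]  # what the encoder sees
    targets = tokens[mask]   # what the decoder must reconstruct
    return visible, targets, mask

# Toy 8-frame, 16x16 "video": 128 tokens of dimension 16 in total.
video = np.arange(8 * 16 * 16, dtype=np.float32).reshape(8, 16, 16)
visible, targets, mask = mask_video_patches(video)
```

With these shapes, the encoder only ever processes the 32 visible tokens out of 128, which is what makes the high mask ratio such an effective lever for scaling.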
Reposted by Mehdi S. M. Sajjadi · 11 months ago
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see tap-next.github.io
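The next-token framing can be made concrete with a toy example: quantize each point's image coordinates into a discrete location vocabulary, so a trajectory becomes a token sequence that a model can predict autoregressively, like words in a language model. This is only a sketch of the framing; the grid size, helper names, and coordinates below are made up and are not TAPNext's actual tokenizer:

```python
# Quantize image coords into a GRID x GRID vocabulary of location tokens.
GRID = 32

def coord_to_token(x, y, size=256):
    """Map continuous (x, y) in a size x size image to a discrete token id."""
    gx = min(int(x * GRID / size), GRID - 1)
    gy = min(int(y * GRID / size), GRID - 1)
    return gy * GRID + gx

def token_to_coord(tok, size=256):
    """Map a token id back to the center of its grid cell."""
    gy, gx = divmod(tok, GRID)
    cell = size / GRID
    return (gx + 0.5) * cell, (gy + 0.5) * cell

# A point trajectory becomes a plain token sequence, so tracking reduces
# to next-token prediction over this location vocabulary.
traj = [(10.0, 20.0), (14.0, 22.0), (18.0, 25.0)]
tokens = [coord_to_token(x, y) for (x, y) in traj]
decoded = [token_to_coord(t) for t in tokens]
```

Decoding a token recovers the point up to the grid cell's resolution, which is the usual accuracy/vocabulary-size trade-off in discretized prediction heads.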
Generative Video Diffusion: does a model trained with this objective learn better features than one trained for image generation? We investigated this question and more in our latest work; please check it out! *From Image to Video: An Empirical Study of Diffusion Representations* arxiv.org/abs/2502.07001
about 1 year ago
Check out @tkipf.bsky.social's post on MooG, the latest in our line of research on self-supervised neural scene representations learned from raw pixels:
SRT: srt-paper.github.io
OSRT: osrt-paper.github.io
RUST: rust-paper.github.io
DyST: dyst-paper.github.io
MooG: moog-paper.github.io
about 1 year ago
TRecViT: A Recurrent Video Transformer
arxiv.org/abs/2412.14294
Causal, with 3× fewer parameters, 12× less memory, and 5× lower FLOPs than the (non-causal) ViViT, while matching or outperforming it on Kinetics & SSv2 action recognition. Code and checkpoints out soon.
about 1 year ago