Oncel Tuzel
@onceltuzel.bsky.social
AI researcher at Apple
Check out the code, models, and demo iOS/macOS app using MLX for our fast vision-language models, FastVLM:
github.com/apple/ml-fas..
Paper: "FastVLM: Efficient Vision Encoding for Vision Language Models", Anasosalu et al., CVPR 2025
arxiv.org/abs/2412.13303
#CVPR2025
#Apple
#research
9 months ago
reposted by
Oncel Tuzel
Marco Cuturi
about 1 year ago
Today is a great day for optimal transport! Lots of gratitude for all folks who contributed to
ott-jax.readthedocs.io
and pushed for the MOSCOT (now @ nature!) paper, from visionaries
@dominik1klein.bsky.social
, G. Palla, Z. Piran to the magician, Michal Klein! ❤️
www.nature.com/articles/s41...
reposted by
Oncel Tuzel
Cem Koç
about 1 year ago
For more, check out our paper on arxiv:
arxiv.org/abs/2412.13303
With the amazing people:
@pavankumarvasu.bsky.social
, Fartash Faghri, Chun-Liang Li, Hadi Pouransari, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, and
@onceltuzel.bsky.social
FastVLM: Efficient Vision Encoding for Vision Language Models
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders su...
https://arxiv.org/abs/2412.13303
reposted by
Oncel Tuzel
Jiatao Gu
about 1 year ago
Image-to-3D, monocular depth estimation, camera pose estimation, …, can we achieve all of this with just ONE model easily? Our answer is Yes -- Excited to introduce our latest work: World-consistent Video Diffusion (WVD) with Explicit 3D Modeling!
arxiv.org/abs/2412.01821