Noah Snavely
@snavely.bsky.social
1887
259
163
3D vision fanatic
http://snavely.io
I really like the metric depth prediction problem because it involves fundamental computer vision questions, like How Big is a Thing? and How Far is a Thing?
4 days ago
1
15
3
It's a real honor to receive the Thomas Huang Prize, which memorializes a figure in computer vision who had tremendous impact through his research and mentorship.
4 days ago
1
16
0
reposted by
Noah Snavely
Aditya Chetan
11 days ago
Humans can watch tasks like cooking or assembly and reason about what happened, when, and between which parts. Can LVLMs do the same? We built Flat-Pack Bench to test this, and found there is still a long way to go. Accepted at
#CVPR2026
! (1/n)
1
3
2
I love this sign! I'm guessing it was designed in MacWrite in 1993?
about 2 months ago
0
2
0
Or, one could stop to consider that this is a really bad idea?
2 months ago
0
0
0
If you are applying for NSF CAREER, and don't mind watching an ancient 360p video -- I was invited to give a talk on my experience writing one at a CAREER workshop in 2012 and it made its way to YouTube. Someone just reminded me this existed, so I thought I'd share!
www.youtube.com/watch?v=ezKW...
NSF CISE Career Workshop - Noah Snavely (May 18, 2012)
YouTube video by Connection One
https://www.youtube.com/watch?v=ezKWhI_46Yg
3 months ago
0
6
1
reposted by
Noah Snavely
Zhenjun Zhao
3 months ago
VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM Anh Thuan Tran, Jana Kosecka tl;dr: law of total variance+alpha compositing->per-splat appearance variance->differentiable per-pixel uncertainty map
arxiv.org/abs/2603.09673
0
2
1
reposted by
Noah Snavely
Yoav Artzi
3 months ago
Very excited about
@nthngdy.bsky.social
's new work! It really gets to the bottom (or top, depends where the head in LMs is) and fundamentals of contemporary LLMs. A real treat of a paper: solid theory, and very cool experiments.
1
14
2
Does anyone have any tips for making OpenReview less agonizingly slow?
3 months ago
2
3
0
The time I Tweeted (in jest) that Bluesky is the new Usenet somehow got me clowned on by the ClassicUsenet subreddit.
www.reddit.com/r/ClassicUse...
3 months ago
1
5
1
At kids' parties a lot of dads go in for a "clap shake" type greeting. Was I supposed to learn how to do this at some point?
3 months ago
0
4
0
An en dash seems to be a more reliable AI tell than an em dash.
3 months ago
2
2
0
reposted by
Noah Snavely
Mor Naaman
3 months ago
This might be the biggest job in (open) science. Cornell is creating a new non-profit organization to house
@arxiv.bsky.social
-- and hiring a new person to lead it.
jobs.chronicle.com/job/37961678...
Chief Executive Officer - New York City, New York (US) job with arXiv | 37961678
arXiv seeks its first CEO to champion open, free scientific discovery and guide the platform's next chapter as an independent nonprofit.
https://jobs.chronicle.com/job/37961678/chief-executive-officer/
1
20
14
Life: A User's Manual (Perec 1978) prefigured the "That's the cup of a carpenter" scene in Indiana Jones and the Last Crusade.
3 months ago
1
3
0
reposted by
Noah Snavely
Zhenjun Zhao
3 months ago
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
@haian-jin.bsky.social
, Rundi Wu, Tianyuan Zhang, Ruiqi Gao,
@jonbarron.bsky.social
,
@snavely.bsky.social
,
@holynski.bsky.social
tl;dr: another(?) TTT+VGGT
arxiv.org/abs/2603.04385
1
4
1
Attention-grabbing idea for an academic paper: Somehow work your personal phone number into the title, like those old LifeLock ads featuring the CEO's social security number. I bet this kind of stunt would garner attention, but I haven't figured out how to work it naturally into a CVPR paper.
3 months ago
0
1
0
I thought Nirvanna the Band the Show the Movie was a fantastic comedy film! I went in knowing nothing about the premise and had a great time. I have no idea how they filmed parts of it, especially on a budget of just $2M. A nice time in the theater.
3 months ago
0
8
0
If you are attending
@wacvconference.bsky.social
in Tucson next month and stopping in Phoenix, I highly recommend checking out Organ Stop Pizza -- the most fun restaurant I've ever been to. Home of the "world's largest Wurlitzer theater organ"
en.wikipedia.org/wiki/Organ_S...
4 months ago
2
13
1
I found this old bassoon sheet music in storage, and tried to have Gemini scan it to PDF. I think it is getting better with music? But still goes off the rails. (ChatGPT does even worse.)
6 months ago
1
4
0
The CVPR process seems driven by the goal of extracting 3 reviews for each paper, a goal that seems to lead to a lot of angst. Is there a reason why 3 is a magic number? Why not two, or even one (assuming the quality is high)?
6 months ago
2
2
0
If you are looking for a new podcast to get into, I recommend Topics with Michael Ian Black and Michael Showalter. Even though they are comedians, they aren't afraid to dive into some weighty topics!
www.earwolf.com/show_archive...
6 months ago
0
3
0
What are some examples of computer vision papers that have attractive system diagrams?
7 months ago
3
6
0
reposted by
Noah Snavely
Phillip Isola
8 months ago
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic: Eliciting higher alignment:
arxiv.org/abs/2510.02425
Unpaired learning of unified reps:
arxiv.org/abs/2510.08492
1/9
1
133
39
reposted by
Noah Snavely
Andreas Geiger
8 months ago
#TTT3R
: 3D Reconstruction as Test-Time Training TTT3R offers a simple state update rule to enhance length generalization for
#CUT3R
- No fine-tuning required! Page:
rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13's "22" live at the 2013 Billboard Music Awards - in 3D!
0
38
8
reposted by
Noah Snavely
Nathaniel Burgdorfer
about 1 year ago
We present a new approach to inference-time scene optimization, which we name Radiant Triangle Soup (RTS)
www.arxiv.org/abs/2505.23642
. Also check out really great concurrent work from Held et al.
@janheld.bsky.social
, Triangle Splatting
arxiv.org/abs/2505.19175
0
7
2
reposted by
Noah Snavely
Shiry Ginosar
11 months ago
How "old" is your model? Put it to the test with the KiVA Challenge: a new benchmark for abstract visual reasoning, grounded in real developmental data from children and adults. Prizes: $1K to the top model, $500. Deadline: 10/7/25
kiva-challenge.github.io
@iccv.bsky.social
KiVA Challenge @ ICCV 2025
https://kiva-challenge.github.io/
1
23
12
ChatGPT and Gemini both seem to struggle with sheet music. They both insist that this excerpt is in D major (2 sharps), and resist any attempt to tell them that there are 3 sharps in the key signature. I think this is really cool and interesting!
11 months ago
3
13
1
reposted by
Noah Snavely
Shiry Ginosar
about 1 year ago
Think LMMs can reason like a 3-year-old? Think again! Our Kid-Inspired Visual Analogies benchmark reveals where young children still win:
ey242.github.io/kiva.github....
Catch our
#ICLR2025
poster today to see where models still fall short! Thurs. April 24 3-5:30 pm Halls 3 + 2B #312
2
25
7
reposted by
Noah Snavely
Zhenjun Zhao
about 1 year ago
Dynamic Camera Poses and Where to Find Them Chris Rockwell,
@jtung.bsky.social
, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin tl;dr: a large-scale dataset of dynamic Internet videos annotated with camera poses
arxiv.org/abs/2504.17788
1
6
3
reposted by
Noah Snavely
Rundong Luo
about 1 year ago
1/6 How to transform standard videos into immersive 360° panoramas? We've designed a new AI system for video-to-360° panorama generation! Our key insight: large-scale data is crucial for robust panoramic synthesis across diverse scenes.
5
4
1
reposted by
Noah Snavely
about 1 year ago
We have released the Stereo4D dataset! Explore the real-world dynamic 3D tracks:
github.com/Stereo4d/ste...
0
13
3
This is really nice work on visual discovery from
@boyangdeng.bsky.social
!
about 1 year ago
0
8
0
reposted by
Noah Snavely
about 1 year ago
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see:
tap-next.github.io
1
25
9
reposted by
Noah Snavely
Jon Barron
about 1 year ago
A thread of thoughts on radiance fields, from my keynote at 3DV: Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed --- The NeRF was smaller than the images.
2
94
22
reposted by
Noah Snavely
Mor Naaman
about 1 year ago
Fifth Ave jammed
#handsoff
28
4024
555
reposted by
Noah Snavely
Haian Jin
about 1 year ago
We've just released the code and checkpoints for our
#ICLR2025
Oral paper: "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias". Check it out below. Code:
github.com/haian-jin/LVSM
Paper:
arxiv.org/abs/2410.17242
Project Page:
haian-jin.github.io/projects/LVSM/
0
19
2
This is really cool work!
about 1 year ago
1
8
1
reposted by
Noah Snavely
Anand Bhattad
about 1 year ago
[1/10] Is scene understanding solved? Models today can label pixels and detect objects with high accuracy. But does that mean they truly understand scenes? Super excited to share our new paper and a new task in computer vision: Visual Jenga!
arxiv.org/abs/2503.21770
visualjenga.github.io
7
59
15
reposted by
Noah Snavely
Cornell Tech
about 1 year ago
#Backslash
at
#CornellTech
, dedicated to advancing new works of art and technology that escape convention, has announced Mimi Ọnụọha as its first Backslash Fellow:
tech.cornell.edu/news/mimi-on...
"This work feels like a marked evolution for me personally," said Ọnụọha.
@snavely.bsky.social
0
4
1
reposted by
Noah Snavely
#CVPR2026
about 1 year ago
#CVPR2025
offers registration and travel support to students from underrepresented communities. Awards are based on need, contribution, travel distance, identity, and advisor support. Information and form:
forms.gle/uDR2Q74drC4V...
Broadening Participation Scholarship Form
CVPR 2025 Travel and Registration Support Application CVPR'25 is committed to supporting students from communities that do not traditionally attend CVPR through registration and travel support. Allocation is based on a combination of need, contribution to the conference, where you are traveling from, the community(ies) you identify with and advisor support. Travel support will be issued in fixed amounts that will be based on availability of funds and travel distance. If you would like to be considered for this support, please complete the following application. Decisions will be made on a rolling basis. Applications will be accepted until April 19 2025 (anywhere on earth).
https://forms.gle/uDR2Q74drC4Vbn4F8
0
7
4
reposted by
Noah Snavely
Angjoo Kanazawa
over 1 year ago
Exciting news! MegaSAM code is out & the updated Shape of Motion results with MegaSAM are really impressive! A year ago I didn't think we could make any progress on these videos:
shape-of-motion.github.io/results.html
Huge congrats to everyone involved and the community
3
74
17
I think Qianqian et al's work is really cool! The problem of modeling state within a 3D reasoning system is quite interesting. (And I believe it's pronounced "cuter".)
over 1 year ago
0
8
0
reposted by
Noah Snavely
Qianqian Wang
over 1 year ago
Late to post, but excited to introduce CUT3R! An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic scenes. Video or image collections, all in one! Project Page:
cut3r.github.io
Code and Model:
github.com/CUT3R/CUT3R
2
34
7
This is really unimportant, but I keep seeing the word "advancements" in writing where I would have seen the word "advances" before. I'm taking this to mean that LLMs are at play and therefore, they will influence the language such that the two words will eventually come to mean the same thing!
over 1 year ago
3
7
1
This is a really fun podcast episode for people whose favorite part of the Empire Strikes Back film score is when the Millennium Falcon flies out of the giant space slug!
www.settlingthescorepodcast.com/66-the-empir...
@scoresettlers.bsky.social
#66 - The Empire Strikes Back - Settling the Score
https://www.settlingthescorepodcast.com/66-the-empire-strikes-back/
over 1 year ago
0
5
0
My question for the history heads out there: In the 1960s, did they call the 1880s "the Eighties"?
over 1 year ago
1
2
1
reposted by
Noah Snavely
Vincent Sitzmann
over 1 year ago
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website:
boyuan.space/history-guidance
(1/7)
1
35
6
A shoutout to KWAX-FM, public classical radio broadcasting from Eugene, Oregon. I think it is a real gem! They do Listener's Choice Friday, where you can call in and make a request (they've taken my calls even though I'm in NYC).
kwax.uoregon.edu
KWAX | University of Oregon
https://kwax.uoregon.edu/
over 1 year ago
1
4
0
reposted by
Noah Snavely
Aaron Hertzmann
over 1 year ago
New paper about pictures: I identify trends in geometric perspective in my own drawings and photos, and compare them to how the original scenes looked. I discuss what these trends might say about art history and vision science. Published in _Art & Perception_.
#visionscience
psyarxiv.com/pq8nb
OSF
https://psyarxiv.com/pq8nb
3
19
6
Did Italo Calvino discover bag of words and topic models in 1979?
over 1 year ago
5
39
3