Pingchuan Ma
@pima-hyphen.bsky.social
PhD Student at Ommer Lab, Munich (Stable Diffusion). Working on getting my first 3M.
I'm thrilled to share that I'll present two first-authored papers at #ICCV2025 in Honolulu together with @mgui7.bsky.social! (Thread)
13 days ago
reposted by
Pingchuan Ma
Stefan Baumann
16 days ago
What happens when you poke a scene, and your model has to predict how the world moves in response? We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions. It learns to predict the distribution of motion itself. (Thread)
I just wrapped up my bidding process for AAAI 26, which is always an enjoyable experience. This year, I came across submissions titled things like "000ASD", "testy", "123321241", and even cases with identical titles and abstracts but different submission numbers.
3 months ago
Our work received an invited talk at the Imageomics-AAAI-25 workshop of #AAAI25. @vtaohu.bsky.social will be representing us there. Although I won't be there myself, I would still like to share our poster with you :D We also have another oral presentation for DepthFM on March 1, 2:30 pm-3:45 pm.
8 months ago
When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from additional genuine semantics or artificial augmentations of the text for downstream tasks? Interested? Check out our latest work at #AAAI25. Code and paper at: github.com/CompVis/DisCLIP (Thread)
10 months ago
reposted by
Pingchuan Ma
Nick Stracke
11 months ago
Why do we extract diffusion features from noisy images? Isn't that destroying information? Yes, it is - but we found a way to do better. Here's how we unlock better features, no noise, no hassle.
Project Page: compvis.github.io/cleandift
Code: github.com/CompVis/clea...
(Thread)