David Alvarez-Melis
@dmelis.bsky.social
Professoring at Harvard || Researching at MSR || Previously: MIT CSAIL, NYU, IBM Research, ITAM
🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
9 months ago
reposted by David Alvarez-Melis
Fernando Diaz
11 months ago
🎉Microsoft Research New England is hiring a predoctoral research assistant to work with @nancybaym.bsky.social, Tarleton Gillespie, and @marylgray.bsky.social on issues related to the dynamics of technology and society. 🎉
socialmediacollective.org/2025/01/22/s...
Seeking a Sociotechnical Systems Research Assistant (aka “Pre-Doc”)
Apply here: (NOTE: Application Portal opens February 3, 2025) Deadline: March 3, 2025. (Late or incomplete applications will not be considered.) NOTE: Unfortunately, applicants must be eligib…
https://socialmediacollective.org/2025/01/22/seeking-a-sociotechnical-systems-research-assistant-aka-pre-doc/
reposted by David Alvarez-Melis
Naomi Saphra
about 1 year ago
Transformer LMs get pretty far by acting like n-gram models, so why do they learn syntax? A new paper by @sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation.
#mlsky
#nlp
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...
https://arxiv.org/abs/2412.04619
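For readers outside NLP, a minimal sketch of what "acting like n-gram models" means: predicting the next token purely from surface co-occurrence counts, with no notion of sentence structure. The corpus and names below are illustrative only, not from the paper.

```python
from collections import Counter, defaultdict

# Toy bigram model: everything here is illustrative, not from the paper.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count bigram transitions: counts[w1][w2] = how often w2 follows w1.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def predict_next(word):
    """Predict the most frequent continuation seen in training: a pure
    surface-level heuristic with no syntax or hierarchy."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # a frequent follower of 'the' in the corpus
print(predict_next("sat"))  # 'on' -- local patterns carry you a long way

# What a bigram model cannot capture: long-range, hierarchical
# dependencies (e.g. subject-verb agreement across an embedded clause),
# which is where the paper's question about learning syntax begins.
```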