Aditi Mavalankar
@aditimavalankar.bsky.social
👤 1062
👥 124
14
Research Scientist at DeepMind
reposted by
Aditi Mavalankar
Abhinav Moudgil
about 2 months ago
Introducing Celo2: Towards Learned Optimization Free Lunch We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)! 🧵
1
2
1
reposted by
Aditi Mavalankar
Tom Schaul
2 months ago
DeepMind's RL team is hiring a research scientist: if you're passionate about RL, come work with us! And if you know people who might be interested, please share:
job-boards.greenhouse.io/deepmind/job...
Research Scientist, Reinforcement Learning
London, UK
https://job-boards.greenhouse.io/deepmind/jobs/7716037
1
28
14
reposted by
Aditi Mavalankar
Tom Schaul
9 months ago
Where do some of Reinforcement Learning's great thinkers stand today? Find out! Keynotes of the RL Conference are online:
www.youtube.com/playlist?lis...
Wanting vs liking, Agent factories, Theoretical limit of LLMs, Pluralist value, RL teachers, Knowledge flywheels (guess who talked about which!)
1
76
25
On my way to
#ICML2025
to present our algorithm that strongly scales with inference compute, in both performance and sample diversity! Reach out if you'd like to chat more!
10 months ago
0
8
2
reposted by
Aditi Mavalankar
Abhinav Moudgil
11 months ago
New side project! assayer: A simple Python-RQ based tool to automatically monitor and evaluate ML model checkpoints offline during training.
1
4
1
reposted by
Aditi Mavalankar
Tom Schaul
12 months ago
Ever thought of joining DeepMind's RL team? We're recruiting for a research engineering role in London:
job-boards.greenhouse.io/deepmind/job...
Please spread the word!
Research Engineer, Reinforcement Learning
London, UK
https://job-boards.greenhouse.io/deepmind/jobs/6688132
1
28
9
Accepted to
#ICML2025
See you in Vancouver!
about 1 year ago
0
0
0
reposted by
Aditi Mavalankar
Tom Schaul
about 1 year ago
When faced with a challenge (like debugging) it helps to think back to examples of how you've overcome challenges in the past. Same for LLMs! The method we introduce in this paper is efficient because examples are chosen for their complementarity, leading to much steeper inference-time scaling! 🧪
0
19
5
Excited to share our recent work, AuPair, an inference-time technique that builds on the premise of in-context learning to improve LLM coding performance!
arxiv.org/abs/2502.18487
🧵
AuPair: Golden Example Pairs for Code Repair
Scaling up inference-time compute has proven to be a valuable strategy in improving the performance of Large Language Models (LLMs) without fine-tuning. An important task that can benefit from additio...
https://arxiv.org/abs/2502.18487
about 1 year ago
1
12
7
reposted by
Aditi Mavalankar
Tom Schaul
over 1 year ago
Are there limits to what you can learn in a closed system? Do we need human feedback in training? Is scale all we need? Should we play language games? What even is "recursive self-improvement"? Thoughts about this and more here:
arxiv.org/abs/2411.16905
Boundless Socratic Learning with Language Games
An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its covera...
https://arxiv.org/abs/2411.16905
7
110
25