Anastasiia Pedan
@pedanana.bsky.social
📤 87
📥 9
📝 7
my main takeaway from a talk on reward design in rl: ai only beat humans when they were asked not to collaborate 👀👀
9 months ago
Would you be surprised to learn that many empirical implementations of value-aware model learning (VAML) algos, including MuZero, lead to incorrect model & value functions when training stochastic models 🤕? In our new @icmlconf.bsky.social 2025 paper, we show why this happens and how to fix it 🦾!
11 months ago