Willem Röpke
@willemropke.bsky.social
📤 1312 · 📥 414 · 📝 59
PhD student | Interested in all things decision-making and learning
Pinned post
Exciting news! My paper on multi-objective reinforcement learning was accepted at AAMAS 2025! We introduce IPRO (Iterated Pareto Referent Optimisation)—a principled approach to solving multi-objective problems. 🔗 Paper:
arxiv.org/abs/2402.07182
💻 Code:
github.com/wilrop/ipro
10 months ago
2
26
7
I think the Qwen team is missing out on a huge opportunity to basically be the default model in all NeurIPS submissions by not releasing Qwen3
8 months ago
1
1
0
Using LLMs to come up with prompts for LLMs to then ask the LLMs to then train the LLMs to then ....
8 months ago
0
2
0
Manifesting Qwen 3
9 months ago
0
0
0
RIP to my investments from the past few years, it was nice seeing the green while it lasted
9 months ago
0
0
0
The people demand Qwen3!
9 months ago
0
0
0
I've been bashing my head against a wall trying to make TRL and their new vllm-serve work, and holy moly, it's just an infinite pain. Why must I suffer?
9 months ago
0
0
0
Why does reading a book feel so much more satisfying than watching a TV show? Both are ways of consuming content so I don't get the difference
9 months ago
0
0
0
Bought a Cherry Coke by accident today. Horrible things happening everywhere, apparently
9 months ago
0
1
0
This is actually insanely clever; I would've never thought of this. Seems very interesting and important to fix!
[quoted post]
9 months ago
0
0
0
I don't recall seeing a video in the recent past that depressed me as much as what I just watched unfolding in the Oval Office
10 months ago
0
2
0
This is unholy
10 months ago
0
3
0
How can I stop ChatGPT from talking to me with emojis? This is just the worst update I've ever experienced. I've put it in its memory, in my details, and I even repeat it in the chat, but it just keeps replying like 👉🥺👈
10 months ago
0
0
0
Macron is the GOAT. French people don't appreciate true genius
10 months ago
1
1
0
Why did OpenAI update ChatGPT to use emojis in its responses? I hate it, and even when I explicitly say this, it just keeps doing it.
10 months ago
0
0
0
To whoever put my email on some spam list: I fart in your general direction
11 months ago
0
0
0
The fact that in the year 2025 we are still dealing with the stupid "make the paper fit in an arbitrary format for the camera ready submission" minigame is killing me. Either let me group authors or let me put acknowledgements after the main text. This isn't hard.
11 months ago
2
4
0
Does anyone have any good hacks for making the AAMAS template not suck for people with multiple affiliations? I lose a gazillion lines for basically no reason...
11 months ago
1
0
0
I found a very promising open problem in AI: computing a MEDIAN over a list of rows where one of the elements is just an empty array
11 months ago
0
1
0
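For anyone who hasn't hit this one, here is a minimal repro sketch, assuming the rows are numpy arrays (the library and the values are my assumptions for illustration):

```python
import numpy as np

# Three "rows" of values, one of which is unexpectedly empty.
rows = [np.array([1.0, 2.0, 3.0]),
        np.array([4.0, 5.0, 6.0]),
        np.array([])]

# np.median first stacks the input into a single array; the empty
# row makes the shapes ragged, so this raises (a ValueError on
# recent numpy) instead of returning a per-column median.
np.median(rows, axis=0)
```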
I think this is the best paper I’ve ever read:
arxiv.org/abs/2404.03715
A strong emphasis on theoretically principled algorithms for RLHF followed by motivated practical implementations. Well-written and a clear overview of the relevant background and related work. 10/10 no comments
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training L...
https://arxiv.org/abs/2404.03715
11 months ago
0
4
0
DeepSeek making my day just a little better
11 months ago
0
2
0
I realise I'm woefully unqualified on this topic, but can someone please explain why we still don't have personal carrier drones? This seems like an obvious next step in transportation, and given the state of our tech tree it shouldn't be that hard?
11 months ago
1
1
0
I think we should do congestion pricing in a lot more places
11 months ago
0
5
0
Claude just declined my attempt at bribing it to do a better job. Not sure whether to be happy or sad
11 months ago
1
1
0
I learned to stop reading documentation and just ask ChatGPT. So far it seems to work out great
12 months ago
1
0
0
I just cooked a ChatGPT recipe from some leftovers in my fridge, and I gotta say, it was delicious. The future is now
12 months ago
1
1
0
Can someone please convince me that buying a 3D printer while living in a small apartment is a good idea?
12 months ago
4
3
0
I'm having a weird problem with training DQN on MinAtar (specifically the gymnax version). In Space Invaders and Breakout, my eval metrics are extremely unstable while my train metric is very smooth. See an example from Space Invaders below (eval left, train right). Any ideas of what went wrong?
12 months ago
1
0
1
I just learned that this is allowed in Python. Who do I talk to to get this banned?
about 1 year ago
2
0
0
I just spent 1h+ trying to solve an annoying issue, which came down to downgrading numpy+tensorflow+keras. Feels great
about 1 year ago
0
3
0
I just made a commit that fixed a typo with the message "fi typo" 🤦♂️
about 1 year ago
0
2
0
Is there a rule of thumb for RL algorithms that use a replay buffer for determining the size of this buffer relative to the total number of timesteps? For example, if DQN takes 500k steps, the buffer should be of size ... It could also depend on other parameters; I'm just looking for a general rule of thumb.
about 1 year ago
3
1
0
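I'm not aware of an established rule here, but as a hedged sketch of the kind of heuristic one could use (the 10% fraction and 1M cap are assumptions, not a known standard):

```python
def replay_buffer_size(total_steps: int,
                       fraction: float = 0.1,
                       cap: int = 1_000_000) -> int:
    # Size the buffer as a fixed fraction of the total environment
    # steps, capped so memory stays bounded. Numbers are illustrative.
    return min(int(total_steps * fraction), cap)

print(replay_buffer_size(500_000))  # -> 50000 for the 500k-step DQN example
```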
If any of the cool industry labs want to open an RL (or any ML topic, tbh) lab in Brussels in the next year or so, I'd greatly appreciate it! I know someone (me) who wants to continue research but is quite keen on sticking around in Belgium...
about 1 year ago
1
5
0
Back from my vacation! Did I miss any cool papers or other work? Also, Berlin is really amazing!
about 1 year ago
0
2
0
Okay, since a lot of RL people have migrated over here I'm going to do a small experiment! Please drop your favorite RLHF or preference-based RL papers here. I want to speedrun a lit review for my next project!
about 1 year ago
4
20
2
Launching a sweep on wandb and seeing 15 runs 1 minute later is true nightmare fuel. Every project I start is so much fun until it's time to run experiments...
about 1 year ago
2
0
0
Is there a consensus on the best way to use attention layers in RL? In particular, I want to somehow use it as part of my encoder that will later feed into other components (e.g. the policy, critic, whatever)
about 1 year ago
0
0
0
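To make that concrete, here's a minimal sketch of one common pattern, assuming PyTorch: self-attention over a set of entity features, pooled into a single embedding that the policy and critic heads share. All names and sizes below are assumptions for illustration, not an established recipe.

```python
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    # Self-attention over a set of entity/token features, mean-pooled
    # into one state embedding that downstream components can share.
    def __init__(self, feat_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_entities, feat_dim)
        h = self.proj(x)
        attn_out, _ = self.attn(h, h, h)  # self-attention over entities
        h = self.norm(h + attn_out)       # residual connection + layer norm
        return h.mean(dim=1)              # (batch, embed_dim) pooled state

# The pooled embedding feeds separate policy and critic heads.
encoder = AttentionEncoder(feat_dim=8)
policy_head = nn.Linear(64, 4)  # e.g. 4 discrete actions
value_head = nn.Linear(64, 1)

obs = torch.randn(32, 10, 8)    # batch of 32 states, 10 entities each
z = encoder(obs)
logits, value = policy_head(z), value_head(z)
```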
If anyone wants to put me on a starter pack, I'm:
- super funny
- really handsome
- doing a bit of RL on the side
about 1 year ago
1
4
0
My favorite bug is the one you just solved but forgot to pull on the cluster where you are actually running your experiments. So much fun, not at all the worst ever
about 1 year ago
0
1
0
Follow me for amazing content about machine learning and reinforcement learning. (Testing to see if I can get more followers on the new place than on Twitter.)
about 1 year ago
1
6
0