Simon Geisler 7 months ago
Do you think your LLM is robust? ⚠️ With current adversarial attacks it is hard to find out, since they optimize the wrong thing! We fix this with our adaptive, semantic, and distributional objective.
By Günnemann's lab @ TU Munich & Google Research, w/ CAIS support
Here's how we did it. 🧵