Andrew Wang
@andrewwnlp.bsky.social
📤 366
📥 40
📝 4
PhD student @jhuclsp.bsky.social
Tools break in the real world all the time, but not much attention has been given to how well LLMs deal with tool failures. We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.
about 2 months ago