Mostly 📌 this here so I can read it in detail later. BUT my immediate thought is that we usually struggle to agree on what is 100% right when comparing human vs human on MMM tasks. So human vs LLM?
If you get 80% correct, will that mean there is at least one error for every patient who has polypharmacy?
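A rough back-of-envelope sketch of that 80% question, under assumptions that are mine and not from the original post: "80% correct" is read as per-medicine accuracy, errors are treated as independent between medicines, and polypharmacy is taken as 5 medicines (a commonly used threshold). The function names are hypothetical, just for illustration.

```python
# Back-of-envelope only. Assumptions (not from the source post):
#   - accuracy is per medicine, not per patient
#   - errors on different medicines are independent
#   - polypharmacy = 5 medicines (a common threshold; definitions vary)

def expected_errors(n_medicines: int, accuracy: float) -> float:
    """Average number of errors per patient."""
    return n_medicines * (1 - accuracy)


def prob_at_least_one_error(n_medicines: int, accuracy: float) -> float:
    """Chance that a patient has one or more errors."""
    return 1 - accuracy ** n_medicines


if __name__ == "__main__":
    n_medicines = 5   # assumed polypharmacy threshold
    accuracy = 0.80   # the 80% figure from the question
    print(round(expected_errors(n_medicines, accuracy), 2))
    print(round(prob_at_least_one_error(n_medicines, accuracy), 2))
```

On those assumptions the average works out to about one error per patient, but that is not the same as every patient having one: roughly two in three patients would have at least one error, and the rest none. With more medicines per patient, or if errors cluster in the same patients, the picture changes.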
add a skeleton here at some point