How should we evaluate AI systems that recommend medical tests like MRIs or biopsies for better diagnosis?
In our JMLR paper, we propose a framework to answer counterfactual questions like:
“Would we have diagnosed better—or avoided unnecessary tests—if we’d followed the AI?”
👇
www.jmlr.org/papers/v26/2...