Vilém Zouhar 2 months ago
You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!
We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how. 🕵️
(random is still a devilishly good baseline)