Philipp Koehn (@phikoehn.bsky.social)

📊 The preliminary ranking of the WMT 2025 General Machine Translation benchmark is here! But don't draw conclusions just yet: automatic metrics are biased in favor of systems that use techniques like a metric as a reward model or MBR decoding. The official human ranking will be part of the General MT findings at WMT. arxiv.org/abs/2508.14909


Preliminary Ranking of WMT25 General Machine Translation Systems
We present the preliminary ranking of the WMT25 General Machine Translation Shared Task, in which MT systems have been evaluated using automatic metrics. As this ranking is based on automatic evaluati...
https://arxiv.org/abs/2508.14909