Nov 2, 2022 · We introduce a suite of methods and corresponding statistical tests one can use to assess metrics in light of the two goals of dialect robustness and dialect awareness.
In this paper, we introduce a suite of methods to assess whether metrics are dialect robust. These methods show that state-of-the-art metrics are not dialect robust.
TLDR: Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users.
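The paper's actual protocol is more involved, but the core idea lends itself to a short sketch: a dialect-robust metric should not score a meaning-preserving dialect rewrite below a meaning-changing perturbation of the same sentence. Below is a minimal illustration under that assumption; the toy `metric` function and the three hand-made example lists are hypothetical stand-ins, and a paired Wilcoxon signed-rank test stands in for the paper's statistical tests.

```python
# Minimal sketch (not the authors' exact protocol) of a dialect-robustness
# check: a robust metric should score a meaning-preserving dialect rewrite
# at least as high as a meaning-changing perturbation of the same reference.

from scipy.stats import wilcoxon

def metric(reference: str, hypothesis: str) -> float:
    """Hypothetical stand-in for any reference-based metric (BLEU, BLEURT, ...):
    toy Jaccard overlap of lowercased token sets."""
    ref, hyp = set(reference.lower().split()), set(hypothesis.lower().split())
    return len(ref & hyp) / max(len(ref | hyp), 1)

# Hand-made illustrative triples: reference, dialect rewrite (same meaning),
# and perturbation (different meaning). Not from the paper's data.
references = [
    "y'all should come over later",
    "the film was not good at all",
    "she is going to the shop",
]
dialect_rewrites = [
    "you all should come over later",
    "the movie wasn't good at all",
    "she's going to the shops",
]
perturbations = [
    "y'all should never come over",
    "the film was good after all",
    "she is not going to the shop",
]

robust_scores = [metric(r, d) for r, d in zip(references, dialect_rewrites)]
control_scores = [metric(r, p) for r, p in zip(references, perturbations)]

# Paired one-sided test: do dialect rewrites score *lower* than the
# meaning-changing controls? A small p-value here flags the metric as
# not dialect robust (our surface-overlap toy metric fails this badly).
stat, p_value = wilcoxon(robust_scores, control_scores, alternative="less")
print(f"W={stat:.3f}, p={p_value:.3f}")
```

With a real metric one would run this over a full dialect-parallel test set rather than a handful of sentences, so the signed-rank test has enough pairs to be meaningful.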
May 9, 2024 · The focus of this paper is to evaluate dialect robustness by comparing performance on USEng (US English) and IndEng (Indian English). All of the LLMs evaluated perform better on USEng than on IndEng.
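To make a per-dialect comparison like this concrete, here is a minimal sketch, assuming hypothetical paired per-example scores for USEng and IndEng variants of the same test items; a paired bootstrap puts a confidence interval on the gap. The numbers are illustrative, not the paper's results.

```python
# Sketch of a paired USEng-vs-IndEng comparison for one model, with a
# bootstrap confidence interval on the score gap. All values are made up.

import random

random.seed(0)

# Hypothetical per-example scores (e.g., task accuracy or metric values)
# for paired USEng/IndEng variants of the same underlying examples.
useng_scores = [0.90, 0.80, 1.00, 0.70, 0.85, 0.95, 0.60, 0.90]
indeng_scores = [0.70, 0.75, 0.90, 0.50, 0.80, 0.85, 0.55, 0.70]

def mean(xs):
    return sum(xs) / len(xs)

# Paired bootstrap over examples: resample indices, recompute the gap.
n = len(useng_scores)
gaps = []
for _ in range(10_000):
    idx = [random.randrange(n) for _ in range(n)]
    gaps.append(mean([useng_scores[i] for i in idx]) -
                mean([indeng_scores[i] for i in idx]))

gaps.sort()
lo, hi = gaps[int(0.025 * len(gaps))], gaps[int(0.975 * len(gaps))]
print(f"USEng - IndEng gap: {mean(useng_scores) - mean(indeng_scores):.3f} "
      f"(95% CI [{lo:.3f}, {hi:.3f}])")
```

A confidence interval that excludes zero would support the claim that the model is systematically stronger on one dialect, rather than the gap being sampling noise.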
Dialect-robust Evaluation of Generated Text. Jiao Sun | Thibault Sellam | Elizabeth Clark | Tu Vu | Timothy Dozat | Dan Garrette | Aditya Siddhant | Jacob Eisenstein | Sebastian Gehrmann.
For evaluating machine-generated texts, automatic methods hold the promise of avoiding the collection of human judgments, which can be expensive and time-consuming.
Aug 21, 2024 · The paper presents a novel approach to evaluating the dialect robustness of LLMs using a pre-existing dataset of conversational dialogues.