Google Scholar

Multiple-Choice Questions are Efficient and Robust LLM Evaluators

Z Zhang, L Xu, Z Jiang, H Hao, R Wang - arXiv preprint arXiv:2405.11966, 2024 - arxiv.org

We present GSM-MC and MATH-MC, two multiple-choice (MC) datasets constructed by
collecting answers and incorrect predictions on GSM8K and MATH from over 50 open-source
models. Through extensive experiments, we show that LLMs' performance on the MC
versions of these two popular benchmarks is strongly correlated with their performance on
the original versions, and is quite robust to distractor choices and option orders, while the
evaluation time is reduced by a factor of up to 30. Following a similar procedure, we also …

Cited by 1, All 3 versions


[PDF] Multiple-Choice Questions are Efficient and Robust LLM Evaluators

Z Zhang, L Xu, Z Jiang, H Hao, R Wang - researchgate.net

We present GSM-MC, a multiple-choice (MC) dataset constructed by collecting answers and
incorrect predictions on GSM8K from 60 open-source models. Through extensive
experiments, we show that LLMs' performance on the MC version of this popular benchmark
is strongly correlated with their performance on the original version and is quite robust to
distractor choices and option orders, while the evaluation time is reduced by a factor of up to
30. Following similar procedures, we introduce MATH-MC, constructed from MATH, and …
