Training Verifiers to Solve Math Word Problems

Cobbe, Karl; Kosaraju, Vineet; Bavarian, Mohammad; Chen, Mark; Jun, Heewoo; Kaiser, Lukasz; Plappert, Matthias; Tworek, Jerry; Hilton, Jacob; Nakano, Reiichiro; Hesse, Christopher; Schulman, John

arXiv:2110.14168 (cs)

[Submitted on 27 Oct 2021 (v1), last revised 18 Nov 2021 (this version, v2)]

Abstract: State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution. To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier. We demonstrate that verification significantly improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.
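The test-time procedure described in the abstract (sample many candidate solutions, then keep the one the verifier ranks highest) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `select_best`, the `generate` and `score` callables, and the `num_candidates` parameter are all hypothetical stand-ins for a trained generator model and a trained verifier, and the candidate count used in the demo is arbitrary.

```python
import itertools
from typing import Callable, Tuple


def select_best(
    problem: str,
    generate: Callable[[str], str],      # hypothetical: samples one candidate solution
    score: Callable[[str, str], float],  # hypothetical: verifier score, higher is better
    num_candidates: int,
) -> Tuple[str, float]:
    """Sample num_candidates solutions and return the highest-scoring one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [generate(problem) for _ in range(num_candidates)]
    scored = [(score(problem, candidate), candidate) for candidate in candidates]
    # max() keeps the first candidate among ties.
    best_score, best_solution = max(scored, key=lambda pair: pair[0])
    return best_solution, best_score


if __name__ == "__main__":
    # Toy stand-ins (assumptions for illustration, not real models):
    # the "generator" cycles through canned answers and the "verifier"
    # gives full score only to the arithmetically correct one.
    canned = itertools.cycle(["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"])

    def toy_generate(problem: str) -> str:
        return next(canned)

    def toy_score(problem: str, solution: str) -> float:
        return 1.0 if solution.endswith("= 4") else 0.0

    solution, verifier_score = select_best(
        "What is 2 + 2?", toy_generate, toy_score, num_candidates=3
    )
    print(solution, verifier_score)
```

In this sketch the generator and the verifier are kept as separate callables, so either one can be swapped out without touching the selection logic.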

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2110.14168 [cs.LG]
	(or arXiv:2110.14168v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.14168

Submission history

From: Karl Cobbe
[v1] Wed, 27 Oct 2021 04:49:45 UTC (3,262 KB)
[v2] Thu, 18 Nov 2021 00:23:45 UTC (3,262 KB)