Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model

ChaeHun Park, Eugene Jang, Wonsuk Yang, Jong Park


Abstract
Evaluating the quality of responses generated by open-domain conversation systems is a challenging task. This is partly because there can be multiple appropriate responses to a given dialogue history. Reference-based metrics that rely on comparisons to a set of known correct responses often fail to account for this variety, and consequently correlate poorly with human judgment. To address this problem, researchers have investigated the possibility of assessing response quality without using a set of known correct responses. RUBER demonstrated that an automatic response evaluation model could be made using unsupervised learning for the next-utterance prediction (NUP) task. For the unsupervised learning of such a model, we propose a method of manipulating a golden response to create a new negative response that is designed to be inappropriate within the context while maintaining high similarity with the original golden response. We find, from our experiments on English datasets, that using the negative samples generated by our method alongside random negative samples can increase the model’s correlation with human evaluations. The process of generating such negative samples is automated and does not rely on human annotation.
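
As a rough illustration of the training setup the abstract describes, the sketch below builds next-utterance prediction examples that pair each context with its golden response (label 1), a random negative drawn from another dialogue (label 0), and a manipulated negative that stays close to the golden response (label 0). The abstract does not say how the manipulation is performed, so the single-token swap here is only a toy stand-in and not the authors' procedure. All function names, the `replacements` word list, and the sample dialogues are hypothetical.

```python
import random


def random_negative(dialogues, index, rng):
    """Return a response taken from a different dialogue (a random negative)."""
    candidates = [i for i in range(len(dialogues)) if i != index]
    return dialogues[rng.choice(candidates)][1]


def manipulated_negative(golden, replacements, rng, n_swaps=1):
    """Toy stand-in for manipulating a golden response.

    Swaps n_swaps tokens for words from `replacements`, so the result stays
    highly similar to the golden response. This is an assumption made for
    illustration, not the method from the paper.
    """
    tokens = golden.split()
    positions = rng.sample(range(len(tokens)), k=min(n_swaps, len(tokens)))
    for pos in positions:
        # Exclude the original token so the swapped position always changes.
        choices = [w for w in replacements if w != tokens[pos]]
        tokens[pos] = rng.choice(choices)
    return " ".join(tokens)


def build_nup_examples(dialogues, replacements, seed=0):
    """Build (context, response, label) triples for NUP training."""
    rng = random.Random(seed)
    examples = []
    for i, (context, golden) in enumerate(dialogues):
        examples.append((context, golden, 1))
        examples.append((context, random_negative(dialogues, i, rng), 0))
        examples.append(
            (context, manipulated_negative(golden, replacements, rng), 0)
        )
    return examples


if __name__ == "__main__":
    # Hypothetical (context, golden response) pairs.
    sample_dialogues = [
        ("How was your trip?", "It was great , thanks"),
        ("What do you want for lunch?", "Maybe some noodles"),
        ("Do you play any sports?", "I play tennis on weekends"),
    ]
    for example in build_nup_examples(sample_dialogues, ["pizza", "tomorrow", "guitar"]):
        print(example)
```

A real pipeline would replace the toy swap with a manipulation that reliably makes the response inappropriate for its context; a random word swap gives no such guarantee.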
Anthology ID:
2021.naacl-main.120
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1525–1534
URL:
https://aclanthology.org/2021.naacl-main.120
DOI:
10.18653/v1/2021.naacl-main.120
Cite (ACL):
ChaeHun Park, Eugene Jang, Wonsuk Yang, and Jong Park. 2021. Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1525–1534, Online. Association for Computational Linguistics.
Cite (Informal):
Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model (Park et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.120.pdf
Video:
https://aclanthology.org/2021.naacl-main.120.mp4
Code:
nlpcl-lab/dialog-eval-hard-negative
Data:
DailyDialog, DailyDialog++