Calibrating the Confidence of Large Language Models by Eliciting Fidelity

M Zhang, M Huang, R Shi, L Guo, C Peng, P Yan, Y Zhou, X Qiu
arXiv preprint arXiv:2404.02655, 2024 (arxiv.org)
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, after alignment these models often exhibit overconfidence: the confidence they express is not well calibrated with their actual correctness rate. In this paper, we decompose language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by the model. We then propose a plug-and-play method to estimate the confidence of language models. Experiments with six RLHF-LMs on four MCQA datasets show that our method achieves good calibration performance. Moreover, we propose two novel metrics, IPR and CE, for evaluating model calibration, and we provide a detailed discussion of \textit{Truly Well-Calibrated Confidence}. Our method can serve as a strong baseline, and we hope this work offers some insight into model confidence calibration.
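To make the Uncertainty/Fidelity decomposition concrete, the sketch below shows one plausible way such a confidence estimate could be computed from repeated sampling on an MCQA item. This is a minimal illustration, not the authors' actual algorithm: the function name, the entropy-based uncertainty proxy, the agreement-based fidelity proxy, and the combination rule are all assumptions made here for exposition.

```python
import math
from collections import Counter

def confidence_from_samples(sampled_answers, final_answer, num_options):
    """Hypothetical sketch of an Uncertainty/Fidelity-style confidence
    estimate for an MCQA item. Not the paper's exact procedure.

    sampled_answers: option labels from repeated sampling of the model
    final_answer:    the option the model ultimately commits to
    num_options:     number of MCQA options (e.g. 4)
    """
    n = len(sampled_answers)
    counts = Counter(sampled_answers)
    probs = [c / n for c in counts.values()]

    # Uncertainty proxy: normalized Shannon entropy of the empirical
    # answer distribution (0 = fully certain, 1 = maximally uncertain).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    uncertainty = entropy / math.log(num_options)

    # Fidelity proxy: how often the samples agree with the answer the
    # model actually returned.
    fidelity = counts.get(final_answer, 0) / n

    # Assumed combination rule: trust fidelity when the model is certain,
    # and fall back toward a uniform guess as uncertainty grows.
    return (1 - uncertainty) * fidelity + uncertainty * (1 / num_options)

# Example: 10 samples over a 4-option question
samples = ["B", "B", "B", "A", "B", "B", "C", "B", "B", "B"]
print(round(confidence_from_samples(samples, "B", 4), 3))  # ~0.546
```

Under this toy rule, an answer produced consistently on a low-entropy question gets confidence near its agreement rate, while scattered samples pull the score toward chance level (1/num_options), which is the qualitative behavior a decomposition into question uncertainty and answer fidelity is meant to capture.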