Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

Y Li, Y Huang, H Wang, X Zhang, J Zou, L Sun
arXiv preprint arXiv:2406.17675, 2024
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable; such inquiries could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, including psychological dimension identification, assessment dataset curation, and assessment with results validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. This benchmark includes thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs' self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and social sciences.