Google Scholar

Quantifying ai psychology: A psychometrics benchmark for large language models

Y Li, Y Huang, H Wang, X Zhang, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org

Y Li, Y Huang, H Wang, X Zhang, J Zou, L Sun

arXiv preprint arXiv:2406.17675, 2024•arxiv.org

Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable-inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, including psychological dimension identification, assessment dataset curation, and assessment with results validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. This benchmark includes thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs' self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and social sciences.

arxiv.org

Show moreShow less

Save Cite Cited by 4 Related articles All 2 versions View as HTML

Showing the best result for this search. See all results

Cite

Advanced search

Saved to My library

Quantifying ai psychology: A psychometrics benchmark for large language models