Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification

Esra Dönmez, Thang Vu, Agnieszka Falenska


Abstract
Offensive speech is highly prevalent on online platforms. Being trained on online data, Large Language Models (LLMs) display undesirable behaviors, such as generating harmful text or failing to recognize it. Despite these shortcomings, the models are becoming a part of our everyday lives by being used as tools for information search, content creation, writing assistance, and many more. Furthermore, the research explores using LLMs in applications with immense social risk, such as late-life companions and online content moderators. Despite the potential harms from LLMs in such applications, whether LLMs can reliably identify offensive speech and how they behave when they fail are open questions. This work addresses these questions by probing sixteen widely used LLMs and showing that most fail to identify (non-)offensive online language. Our experiments reveal undesirable behavior patterns in the context of offensive speech detection, such as erroneous response generation, over-reliance on profanity, and failure to recognize stereotypes. Our work highlights the need for extensive documentation of model reliability, particularly in terms of the ability to detect offensive language.
Anthology ID:
2024.emnlp-main.1019
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
18340–18357
Language:
URL:
https://aclanthology.org/2024.emnlp-main.1019
DOI:
10.18653/v1/2024.emnlp-main.1019
Bibkey:
Cite (ACL):
Esra Dönmez, Thang Vu, and Agnieszka Falenska. 2024. Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18340–18357, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification (Dönmez et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.1019.pdf