Version 1
Received: 24 July 2024 / Approved: 25 July 2024 / Online: 25 July 2024 (07:29:49 CEST)
How to cite:
Gifu, D.; Silviu-Vasile, C. AI vs. Human: Decoding Text Authenticity with Transformers. Preprints 2024, 2024072014. https://doi.org/10.20944/preprints202407.2014.v1
APA Style
Gifu, D., & Silviu-Vasile, C. (2024). AI vs. Human: Decoding Text Authenticity with Transformers. Preprints. https://doi.org/10.20944/preprints202407.2014.v1
Chicago/Turabian Style
Gifu, D., and Covaci Silviu-Vasile. 2024. "AI vs. Human: Decoding Text Authenticity with Transformers." Preprints. https://doi.org/10.20944/preprints202407.2014.v1
Abstract
As large language models proliferate and blur the line between human-written and machine-generated content, determining text authenticity has become increasingly important. This study investigates three transformer-based language models (BERT, RoBERTa, and DistilBERT) for distinguishing human-written from machine-generated text. We conduct comparative experiments on a corpus that combines human-written text from sources such as Wikipedia, WikiHow, and news articles in several languages with text generated by OpenAI's GPT-2. Our findings indicate that ensemble learning models are more effective than single classifiers on this task. The results are competitive: the transformer-based method achieves an F-score of 0.83 with RoBERTa-large (monolingual) and 0.70 with DistilBERT-base-uncased (multilingual). These results underscore the versatility of transformer-based methods across natural language processing applications and contribute to the development of text authenticity detection systems.
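For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-score can be computed for a binary human-versus-machine classifier. It assumes the balanced F1 variant on a single positive class; the abstract does not state which variant or averaging scheme was used, and the function name `f1_score` and the toy labels are purely illustrative, not taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the `positive` class.

    Assumption: balanced F1 on one class; the abstract does not
    specify the F-score variant used in the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels: 1 = machine-generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a classifier cannot score well by favoring one at the expense of the other, which is why it is a common choice for detection tasks like this one.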
Keywords
large language models; natural language processing; content creation; text authenticity
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.