The emergence of Large Language Models (LLMs) offers a new approach to highly complex text generation tasks such as text style transfer (TST). However, previous studies have neither fully explored the TST capabilities of different LLMs nor applied uniform standards in the human evaluation stage, which makes the results of human evaluation hard to reproduce and less credible. To address these issues, this paper designs a prompt template that guides cutting-edge LLMs to perform effective text style transfer, and carries out an in-depth comparative analysis of various small-scale language models. In the human evaluation stage, this paper eschews the conventional rating system, opting instead for a comparative assessment methodology, referred to as duel-ranking, which determines the relative ranking of models through mutual comparison rather than direct scoring. Detailed evaluation instructions are provided to enhance the reproducibility of this method and to ensure consistency throughout the evaluation process. This human evaluation reveals that GPT-3.5 and GPT-4 perform excellently on TST tasks.
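The abstract does not specify how individual duel outcomes are aggregated into a ranking. The sketch below shows one plausible reading, assuming a simple round-robin win-count tally over pairwise human judgments; the duel records, model names, and tie-breaking rule are illustrative assumptions, not the paper's actual protocol.

```python
from collections import defaultdict

# Hypothetical duel records: (model_a, model_b, winner), where a human
# annotator compared two style-transferred outputs of the same source text
# and picked the better one. These records are invented for illustration.
duels = [
    ("gpt-4", "gpt-3.5", "gpt-4"),
    ("gpt-4", "llama-2-7b", "gpt-4"),
    ("gpt-3.5", "llama-2-7b", "gpt-3.5"),
    ("gpt-3.5", "gpt-4", "gpt-4"),
]

def duel_rank(duels):
    """Aggregate pairwise duel outcomes into a relative ranking.

    Assumed scheme: each win counts one point; models are ranked by
    total wins, with ties broken by fewer losses.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for a, b, winner in duels:
        loser = b if winner == a else a
        wins[winner] += 1
        losses[loser] += 1
    models = set(wins) | set(losses)
    return sorted(models, key=lambda m: (-wins[m], losses[m]))

print(duel_rank(duels))  # ['gpt-4', 'gpt-3.5', 'llama-2-7b']
```

In practice, a pairwise-preference scheme such as Elo or Bradley-Terry could replace the raw win count; the abstract only states that the relative ranking is obtained through mutual comparison instead of direct scoring.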