
Movie Reviews Sentiment Analysis Using BERT

A thesis submitted in partial fulfillment


of the requirements for the degree of
Master of Science in Computer Science

by

Gibson Nkhata
Mzuzu University
Bachelor of Science in Information and Communication Technology, 2018

December 2022
University of Arkansas

This thesis is approved for recommendation to the Graduate Council.



Susan Gauch, Ph.D.


Thesis Director

Justin Zhan, Ph.D.


Committee member

Ukash Nakarmi, Ph.D.


Committee member
Yanjun Pan, Ph.D.
Committee member
ABSTRACT

Sentiment analysis (SA), or opinion mining, is the analysis of emotions and opinions expressed in text. It is one of the active research areas in Natural Language Processing (NLP). Various approaches have been deployed in the literature to address the problem. These techniques devise complex and sophisticated frameworks to attain optimal accuracy, focusing on polarity classification, i.e., binary classification. In this thesis, we fine-tune BERT in a simple but robust approach to movie review sentiment analysis, aiming to provide better accuracy than state-of-the-art (SOTA) methods. We start by conducting sentiment classification for every review, followed by computing the overall sentiment polarity across all the reviews. Both polarity classification and fine-grained classification (multi-scale sentiment distribution) are implemented and tested on benchmark datasets in our work. To optimally adapt BERT for sentiment classification, we concatenate it with a Bidirectional LSTM (BiLSTM) layer. We also implemented and evaluated accuracy improvement techniques, including the Synthetic Minority Over-sampling TEchnique (SMOTE) and NLP Augmenter (NLPAUG), to improve the model for prediction of multi-scale sentiment distribution. We found that including NLPAUG improved accuracy, whereas SMOTE did not work well. Lastly, a heuristic algorithm is applied to compute the overall polarity of predicted reviews from the model output vector. We call our model BERT+BiLSTM-SA, where SA stands for Sentiment Analysis. Our best-performing approach comprises BERT and BiLSTM on binary, three-class, and four-class sentiment classification, and SMOTE augmentation, in addition to BERT and BiLSTM, on five-class sentiment classification. Our approach performs on par with SOTA techniques on both classification settings. For example, on binary classification we obtain 97.67% accuracy, while the best-performing SOTA model, NB-weighted-BON+dv-cosine, has 97.40% accuracy on the popular IMDb dataset. The baseline, Entailment as Few-Shot Learners (EFL), is outperformed on this task by 1.30%. On the other hand, for five-class classification on SST-5, the best SOTA model, RoBERTa+large+Self-explaining, has 55.5% accuracy, while we obtain 59.48% accuracy. We outperform the baseline on this task, BERT-large, by 3.6%.

DEDICATION

This thesis is dedicated to my late mother, Lincy Pyera Nyavizala Mphande. May her soul
continue resting in peace.

ACKNOWLEDGEMENTS

Firstly, I am grateful to my advisor, Dr. S. Gauch, for her valuable help towards the completion of this work. I am also thankful to my former advisor, Dr. J. Zhan, and to Dr. Usman Anjum for their initial support towards this thesis.

I am grateful to the Institute of International Education/Agricultural Transformation Initiative (IIE/ATI) scholarship programme for making it possible for me to study at the University of Arkansas. I also thank the Data Analytics that are Robust and Trusted (DART) project for supporting this work through the University of Arkansas CSCE department.

Last but not least, I thank all who in one way or another contributed to the completion of this thesis; your efforts are not taken for granted.
TABLE OF CONTENTS

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Deep Learning on Sentiment Analysis . . . . . . . . . . . . . . . . . . 5
2.2.2 Deep Learning on Movie Reviews Sentiment Analysis . . . . . . . . . 6
2.3 BERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 BERT and Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 BERT and Movie Reviews Sentiment Analysis . . . . . . . . . . . . . 9

3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 BERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Fine-tuning BERT with BiLSTM . . . . . . . . . . . . . . . . . . . . 13
3.1.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.5 Accuracy Improvement Approaches . . . . . . . . . . . . . . . . . . . 17
3.1.6 Overall polarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1.7 Overview of Our Work . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.1 IMDb movie reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 SST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 MR Movie Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.4 Amazon Product Data dataset . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Experimental settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5.1 Evaluation of Goal 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5.2 Evaluation of Goal 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

A All Publications Submitted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

LIST OF FIGURES

Figure 3.1: Simplified diagram of BERT . . . . . . . . . . . . . . . . . . . . . . . . . 13


Figure 3.2: Fine-tuning part of BERT with BiLSTM . . . . . . . . . . . . . . . . . . 15
Figure 3.3: Binary Tree Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Figure 3.4: Overview of our work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

LIST OF TABLES

Table 4.1: Accuracy (%) Comparisons of Models on Benchmark Datasets for Binary
Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.2: Accuracy (%) Comparisons for Three and Four Class Classification on IMDb 31
Table 4.3: Accuracy (%) Comparisons of Models on Benchmark Datasets for Five
Class Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.4: Accuracy (%) of Our Model with Accuracy Improvement Techniques on
SST-5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.5: Overall Polarity Computation on All the Datasets . . . . . . . . . . . . . 33

1 Introduction

Sentiment analysis aims to determine the polarity of emotions such as happiness, sorrow, grief, hatred, anger, and affection, as well as opinions, in texts, reviews, and posts available on many media platforms [1]. Sentiment analysis helps in tracking people’s viewpoints. For example, it is a powerful marketing tool that enables product managers to understand customer emotions in their various marketing campaigns. It is an important factor in social media monitoring, product and brand recognition, customer satisfaction, customer loyalty, advertising and promotion success, and product acceptance. It is among the most popular and valuable tasks in the field of NLP [2]. Sentiment analysis can be conducted as polarity (binary) classification or as fine-grained classification, also called multi-scale sentiment distribution.

Movie reviews are an important means of assessing the performance of a particular movie. Whereas a numerical or star rating quantitatively tells us about the success or failure of a movie, a collection of movie reviews gives us deeper qualitative insight into different aspects of the movie. A textual movie review tells us about the strengths and weaknesses of the movie, and a deeper analysis of a movie review tells us whether the movie generally satisfies the reviewer. We work on movie review sentiment analysis in this study because movie reviews have standard benchmark datasets on which salient, high-quality work has been published, for example in [3].

BERT is a popular pre-trained language representation model and has proven to perform well on many NLP tasks such as named entity recognition, question answering, and text classification [4]. It has been used in information retrieval in [5] to build an efficient ranking model for industry use cases. The pre-trained language model was also successfully utilised in [6] for extractive summarization of text, and it was used for question answering with satisfactory results in [7]. Yang et al. [8] efficiently applied the model to data augmentation, yielding optimal results. BERT has also been used in [9], primarily for sentiment analysis, but the accuracy is not satisfactory.

In this thesis, we fine-tune BERT for sentiment analysis on movie reviews, comparing both binary and fine-grained classification, and, with our best method, achieve accuracy that surpasses state-of-the-art (SOTA) models. Our fine-tuning couples BERT with a Bidirectional LSTM (BiLSTM), and we use the resulting model for binary and fine-grained sentiment classification tasks. To deal with the class imbalance problem in fine-grained classification, we also implement oversampling and data augmentation techniques.
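The oversampling technique evaluated later is SMOTE (see section 3.1.5). As a rough illustration of its core idea, synthesizing new minority-class samples by interpolating between existing samples and their nearest neighbours, here is a minimal NumPy sketch; the function name and parameters are ours, not the thesis's implementation:

```python
import numpy as np

def smote_oversample(X, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class matrix X by
    interpolating between a random sample and one of its k nearest
    neighbours (the basic SMOTE idea)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # Euclidean distances from sample i to all other samples
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        neighbours = np.argsort(d)[:k]     # indices of k nearest neighbours
        j = rng.choice(neighbours)
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack(synthetic)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the minority class's convex hull, which is why SMOTE can still hurt when class regions overlap, as the five-class results later suggest.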

Fine-tuning is a common technique for transfer learning. The target model copies all model designs and their parameters from the source model except the output layer, and fine-tunes these parameters on the target dataset. The main benefit of fine-tuning is that the entire model need not be trained from scratch. Hence, we fine-tune BERT by adding a BiLSTM layer and train the model on movie review sentiment analysis benchmark datasets.
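One plausible PyTorch shape for such a head is sketched below: BERT's token-level hidden states are fed through a BiLSTM whose final state is classified. The hidden sizes, the last-timestep pooling, and the class count are illustrative assumptions, not the thesis's exact configuration, and the BERT encoder is stood in for by precomputed hidden states so the snippet runs standalone:

```python
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    """BiLSTM classification head on top of BERT token embeddings.

    Expects hidden states of shape (batch, seq_len, 768), as produced by
    bert-base; in the full model these would come from a forward pass of
    transformers.BertModel, updated jointly with this head."""

    def __init__(self, bert_dim=768, lstm_dim=256, n_classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(bert_dim, lstm_dim, batch_first=True,
                              bidirectional=True)
        # BiLSTM concatenates forward and backward states: 2 * lstm_dim
        self.classifier = nn.Linear(2 * lstm_dim, n_classes)

    def forward(self, hidden_states):
        out, _ = self.bilstm(hidden_states)
        # use the last timestep's concatenated state as a sequence summary
        return self.classifier(out[:, -1, :])

head = BiLSTMHead()
logits = head(torch.randn(4, 32, 768))  # batch of 4, 32 tokens each
```

During fine-tuning, both the encoder parameters and this head would be optimised on the labelled review data.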

BERT processes input features bidirectionally [4], as does BiLSTM [10]. The primary idea behind bidirectional processing is to present each training sequence forwards and backwards to two separate recurrent nets, both of which are connected to the same output layer [10]. That is, neither BERT nor BiLSTM processes inputs in strict temporal order; their outputs are based on both the previous and the following context.

Following that, we compute an overall polarity on the output vector from BERT+BiLSTM-SA.
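The heuristic itself is detailed in section 3.1.6. Purely as an illustration of what such polarity aggregation might look like (this is our simplified stand-in, not the thesis's algorithm), one can map each predicted class to a score in [-1, 1] and take the sign of the mean:

```python
def overall_polarity(predicted_classes, n_classes=5):
    """Aggregate per-review class predictions into one corpus-level polarity.

    Class 0 is taken as most negative and n_classes - 1 as most positive;
    each class is mapped linearly onto [-1, 1] and the scores are averaged."""
    scores = [2 * c / (n_classes - 1) - 1 for c in predicted_classes]
    mean = sum(scores) / len(scores)
    return "positive" if mean > 0 else "negative" if mean < 0 else "neutral"

overall_polarity([4, 4, 3, 1])  # mean score 0.5 -> "positive"
```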
