by
Gibson Nkhata
Mzuzu University
Bachelor of Science in Information and Communication Technology, 2018
December 2022
University of Arkansas
ABSTRACT
Sentiment analysis (SA), or opinion mining, is the analysis of emotions and opinions from text. It is one of the active research areas in Natural Language Processing (NLP). Various approaches have been deployed in the literature to address the problem. These techniques devise complex and sophisticated frameworks in order to attain optimal accuracy. In this work, we employ BERT in a simple but robust approach for movie reviews sentiment analysis, providing a sentiment classification for every review, followed by computing the overall sentiment polarity across all the reviews. Specifically, we fine-tune BERT by adding a Bidirectional Long Short-Term Memory (BiLSTM) layer. We also implemented and evaluated some accuracy improvement techniques, oversampling (SMOTE) and data augmentation (NLPAUG), and found that including NLPAUG improved accuracy, whereas SMOTE did not work well. Lastly, a heuristic algorithm is applied to compute the overall polarity of the predicted reviews from the model output vector. We call our model BERT+BiLSTM-SA, where SA stands for Sentiment Analysis. Our best-performing approach comprises BERT and BiLSTM and performs on par with SOTA techniques on both binary and fine-grained classification. For example, on binary classification we obtain 97.67% accuracy, while the best-performing SOTA model, NB-weighted-BON+dv-cosine, has 97.40% accuracy on the popular IMDb dataset. The baseline, Entailment as Few-Shot Learners (EFL), is outperformed on this task by 1.30%. On the other hand, for fine-grained classification, the best-performing SOTA model has 55.5% accuracy, while we obtain 59.48% accuracy; we outperform the baseline on this task as well.
DEDICATION
This thesis is dedicated to my late mother, Lincy Pyera Nyavizala Mphande. May her soul
continue resting in peace.
ACKNOWLEDGEMENTS
Firstly, I am grateful to my advisor, Dr. S. Gauch, for her valuable help towards the completion of this work. I am also thankful to my former advisor, Dr. J. Zhan. I am grateful to the IIE/ATI scholarship programme for making it possible for me to study
at the University of Arkansas. I also thank the Data Analytics that are Robust and Trusted (DART) project for supporting this work through the University of Arkansas CSCE department.
Last but not least, I thank all who in one way or another contributed to the completion of this work.

TABLE OF CONTENTS
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Deep Learning on Sentiment Analysis . . . . . . . . . . . . . . . . . . 5
2.2.2 Deep Learning on Movie Reviews Sentiment Analysis . . . . . . . . . 6
2.3 BERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 BERT and Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 BERT and Movie Reviews Sentiment Analysis . . . . . . . . . . . . . 9
3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 BERT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Fine-tuning BERT with BiLSTM . . . . . . . . . . . . . . . . . . . . 13
3.1.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.5 Accuracy Improvement Approaches . . . . . . . . . . . . . . . . . . . 17
3.1.6 Overall polarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.1 IMDb movie reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 SST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 MR Movie Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.4 Amazon Product Data dataset . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Experimental settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5.1 Evaluation of Goal 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5.2 Evaluation of Goal 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
LIST OF FIGURES
LIST OF TABLES
Table 4.1: Accuracy (%) Comparisons of Models on Benchmark Datasets for Binary
Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.2: Accuracy (%) Comparisons for Three and Four Class Classification on IMDb 31
Table 4.3: Accuracy (%) Comparisons of Models on Benchmark Datasets for Five
Class Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.4: Accuracy (%) of Our Model with Accuracy Improvement Techniques on
SST-5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.5: Overall Polarity Computation on All the Datasets . . . . . . . . . . . . . 33
1 Introduction
Sentiment Analysis aims to determine the polarity of emotions, such as happiness, sorrow, grief, hatred, anger, and affection, and of opinions from texts, reviews, and posts, which are available on many media platforms [1]. Sentiment analysis helps in tracking people's viewpoints.
For example, it is a powerful marketing tool that enables product managers to understand customer feedback when it comes to social media monitoring, product and brand recognition, customer satisfaction, customer loyalty, the success of advertising and promotions, and product acceptance. It is among the most popular and valuable tasks in the field of NLP [2]. Sentiment analysis can be conducted as polarity (binary) classification or as fine-grained (multi-class) classification. Whereas providing a numerical or star rating to a movie quantitatively tells us about the success or failure of the movie, a collection of movie reviews is what gives us a deeper qualitative insight into different aspects of the movie. A textual movie review tells us about the strengths and weaknesses of the movie, and a deeper analysis of a movie review tells whether the movie generally satisfies the reviewer. We work on Movie Reviews Sentiment Analysis
in this study because movie reviews have standard benchmark datasets in which both salient and subtle sentiments are expressed. BERT (Bidirectional Encoder Representations from Transformers), a pre-trained language model, has been shown to perform well on many NLP tasks such as named entity recognition, question answering, and text classification [4]. It has been used in information retrieval in [5] to build an efficient
ranking model for industry use cases. The pre-trained language model was also successfully
utilised in [6] for extractive summarization of text and used for question answering with
satisfactory results in [7]. Yang et al. [8] efficiently applied the model in data augmentation, yielding optimal results. BERT has been used primarily for sentiment analysis in [9].
In this paper, we fine-tune BERT for sentiment analysis on movie reviews, comparing
both binary and fine-grained classifications, and achieve, with our best method, accuracy
that surpasses state-of-the-art (SOTA) models. Our fine-tuning couples BERT with Bidirectional LSTM (BiLSTM) and uses the resulting model for binary and fine-grained sentiment classification tasks. To deal with the class imbalance problem in fine-grained classification, we
also implement oversampling and data augmentation techniques.
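As a rough illustration of what these two kinds of techniques look like in practice, the sketch below applies the SMOTE oversampling and NLPAUG text augmentation tools mentioned in the abstract, assuming the imbalanced-learn and nlpaug Python packages; the variable names, placeholder feature matrix, and parameter values are illustrative and not the exact configuration evaluated later in this work.

    # Sketch: data augmentation with NLPAUG and oversampling with SMOTE.
    # Assumes the nlpaug and imbalanced-learn packages (and the NLTK WordNet
    # corpus for SynonymAug); texts, labels, and sizes are placeholders.
    import numpy as np
    import nlpaug.augmenter.word as naw
    from imblearn.over_sampling import SMOTE

    minority_reviews = ["the plot was dull", "barely worth watching"]

    # Text-level augmentation: create synonym-replaced copies of minority-class
    # reviews so that rare sentiment classes contribute more training examples.
    augmenter = naw.SynonymAug(aug_src="wordnet")
    augmented_reviews = augmenter.augment(minority_reviews)

    # Feature-level oversampling: SMOTE interpolates synthetic minority samples
    # in a numeric feature space (e.g. sentence embeddings), not in raw text.
    X = np.random.rand(100, 768)                      # placeholder embeddings
    y = np.repeat(np.arange(5), [40, 30, 15, 10, 5])  # imbalanced 5-class labels
    X_resampled, y_resampled = SMOTE(k_neighbors=3, random_state=0).fit_resample(X, y)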
Fine-tuning is a common technique for transfer learning. The target model copies all model designs with their parameters from the source model, except the output layer, and fine-tunes these parameters on the target dataset. The main benefit of fine-tuning is that there is no need to train the entire model from scratch. Hence, we fine-tune BERT by adding a BiLSTM layer and train the resulting model on movie reviews sentiment analysis benchmark datasets.
BERT processes input features bidirectionally [4], and so does BiLSTM [10]. The primary idea behind bidirectional processing is to present each training sequence forwards and backwards to two separate recurrent nets, both of which are connected to the same output layer [10]. That is, neither BERT nor BiLSTM processes inputs in strict temporal order; their outputs reflect both preceding and following context. Following that, we compute an overall polarity from the BERT+BiLSTM-SA output vector using a heuristic algorithm.
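To make this architecture concrete, the following is a minimal sketch of a BERT encoder with a BiLSTM layer and a new classification head, assuming PyTorch and the HuggingFace transformers library; the hidden size, pooling strategy, and number of classes are illustrative placeholders rather than the exact hyperparameters given in Chapter 3.

    # Minimal sketch: fine-tuning BERT with a BiLSTM head for sentiment
    # classification. Assumes PyTorch and HuggingFace transformers; the hidden
    # size, pooling, and number of classes are illustrative placeholders.
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class BertBiLSTMClassifier(nn.Module):
        def __init__(self, num_classes, lstm_hidden=256):
            super().__init__()
            # Pre-trained encoder; its copied parameters are updated during fine-tuning.
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            # BiLSTM reads the token representations forwards and backwards.
            self.bilstm = nn.LSTM(input_size=self.bert.config.hidden_size,
                                  hidden_size=lstm_hidden,
                                  batch_first=True,
                                  bidirectional=True)
            # New output layer trained from scratch on the target dataset.
            self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

        def forward(self, input_ids, attention_mask):
            tokens = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
            lstm_out, _ = self.bilstm(tokens)
            pooled = lstm_out.mean(dim=1)  # mean-pool over tokens; one simple choice
            return self.classifier(pooled)

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertBiLSTMClassifier(num_classes=2)
    batch = tokenizer(["a beautifully shot, moving film"],
                      return_tensors="pt", padding=True, truncation=True)
    logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)

The per-review predictions produced in this way are then aggregated into an overall polarity, as described in Section 3.1.6.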