Exploring the Effectiveness of BERT for Sentiment Analysis on Large-Scale Social Media Data


2023 3rd International Conference on Intelligent Technologies (CONIT)

Karnataka, India. June 23-25, 2023

DOI: 10.1109/CONIT59222.2023.10205600

Thulasi Bikku, Computer Science and Engineering, NRI Institute of Technology, Visadala, Guntur, India, [email protected]
Jyothi Jarugula, Computer Science and Engineering, NIT Mizoram, India, [email protected]
Lavanya Kongala, Computer Science and Engineering, K L Deemed to be University, Vaddeswaram, India, [email protected]
Navya Deepthi Tummala, Computer Science and Engineering, Vignan's Nirula Institute of Technology and Science for Women, Palakaluru, Guntur, India, [email protected]
Naga Vardhani Donthiboina, Computer Science and Engineering, Vignan's Nirula Institute of Technology and Science for Women, Palakaluru, Guntur, India, [email protected]

Abstract—Sentiment analysis is a crucial task in the field of natural language processing (NLP) and has gained significant attention due to the widespread use of social media platforms. Social media data presents unique challenges for sentiment analysis due to its unstructured nature, informal language, and abundance of noise and irrelevant information. To tackle these challenges, advanced techniques such as BERT have emerged as powerful tools for sentiment analysis. In our study, we aim to explore the effectiveness of BERT specifically for sentiment analysis on large-scale social media data. BERT is a state-of-the-art language model that has demonstrated impressive performance on various NLP tasks by capturing contextual information from both the left and right contexts of a given word. By leveraging the pre-training and fine-tuning capabilities of BERT, we investigate its potential for sentiment analysis in the context of social media. To establish a comprehensive evaluation, we compare the performance of BERT with traditional machine learning algorithms commonly used for sentiment analysis. Our experimental results indicate that BERT surpasses the performance of traditional machine learning algorithms, achieving state-of-the-art results in sentiment analysis on the social media dataset. BERT's ability to capture intricate contextual information and understand the subtleties of social media language contributes to its superior performance. The model demonstrates exceptional accuracy, precision, recall, and F1-score, showcasing its effectiveness in classifying sentiment labels accurately.

Keywords—Sentiment analysis, BERT, Natural language processing, Social media data, Machine learning, Hyperparameters, Performance analysis, State-of-the-art.

I. INTRODUCTION

In today's digital age, social media platforms have become an indispensable part of people's lives, offering a wealth of information on customers' opinions, preferences, and attitudes towards various products, services, and brands. As such, businesses and organizations have recognized the importance of social media data for developing effective marketing strategies, improving customer satisfaction, and building brand reputation. However, analysing vast amounts of social media data manually can be a daunting task, requiring a significant amount of time, resources, and expertise. To address this challenge, natural language processing (NLP) techniques have been developed to automate the analysis of social media data and extract meaningful insights from it [12]. One such technique is sentiment analysis, which involves identifying and categorizing the emotional tone of social media posts as positive, negative, or neutral. The goal of sentiment analysis is to provide businesses with a better understanding of customer sentiment, which can inform decision-making and help develop more targeted and personalized marketing strategies [13].

However, sentiment analysis on social media data is not a straightforward task, as the text is often informal and full of slang, sarcasm, irony, and other nuances that make it challenging for traditional machine learning algorithms to capture the sentiment accurately [14]. To overcome this challenge, deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) have been proposed as a solution. BERT is a pre-trained deep learning model that has shown promising results in various NLP tasks, including sentiment analysis. BERT uses a transformer architecture to capture the contextual relationships between words in a sentence, allowing it to better understand the nuances and complexities of natural language [15]. By leveraging BERT's ability to capture context and relationships between words, sentiment analysis can be performed with higher accuracy and efficiency. This paper investigates the effectiveness of BERT for sentiment analysis on large-scale social media data, comparing its performance against traditional machine learning approaches and other state-of-the-art deep learning models. The evaluation is conducted on a large dataset of social media posts, allowing us to investigate the scalability and efficiency of BERT in real-world settings.

The results of this study provide insights into the potential benefits of using BERT for sentiment analysis on large-scale social media data. The findings may have implications for businesses and organizations seeking to improve their understanding of customer sentiment and feedback on social media platforms. By utilizing BERT's superior performance in sentiment analysis, businesses can gain a more accurate and nuanced understanding of customer feedback and use this information to inform their marketing strategies and improve customer satisfaction.



II. LITERATURE SURVEY

Sentiment analysis on social media data has emerged as an important area of research due to the vast amount of data generated by social media platforms and the need to understand the opinions, attitudes, and preferences of users [1]. However, traditional machine learning algorithms often struggle to accurately identify sentiment in social media data due to the informal language used in social media posts, as well as the sheer scale of data that needs to be analysed [2]. This has led to the development of deep learning-based models such as BERT.

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google. It uses a transformer-based architecture to learn contextualized representations of words and sentences [3]. BERT has achieved state-of-the-art results on a range of NLP tasks, including sentiment analysis, and has shown promising results on social media data as well. One of the early studies exploring the effectiveness of BERT was conducted by Devlin et al. (2018) [5], who showed that BERT outperformed other state-of-the-art models on a range of NLP tasks, including sentiment analysis. They found that BERT's ability to capture context and syntax in language allowed it to perform better than traditional machine learning models that relied solely on pre-defined features.

Subsequently, several studies have evaluated the effectiveness of BERT for sentiment analysis on social media data, specifically on platforms such as Twitter, Reddit, and YouTube [6]. For instance, Chiorrini et al. (2021) evaluated the performance of BERT on sentiment analysis tasks on Twitter data and found that BERT achieved significantly higher accuracy than other deep learning models and traditional machine learning approaches. Their results demonstrated the potential of BERT for improving sentiment analysis accuracy on social media data, even with the challenges of informal language and ambiguity in social media posts [7]. Wang et al. (2020) evaluated the performance of BERT on sentiment analysis tasks on Reddit data and found that BERT outperformed other deep learning models and traditional machine learning approaches [8]. Similarly, Zhu et al. (2023) evaluated the performance of BERT on sentiment analysis tasks on YouTube comments and found the same pattern [9]. These studies highlight the versatility of BERT for sentiment analysis across different social media platforms and its potential for improving accuracy.

In conclusion, BERT has emerged as a promising tool for sentiment analysis on large-scale social media data. Its ability to capture context and syntax in language, and its performance gains over other state-of-the-art models, have made it an attractive option for businesses and organizations seeking a better understanding of customer sentiment and feedback on social media platforms [10]. With its superior performance in sentiment analysis, BERT has the potential to help businesses make data-driven decisions that can improve customer satisfaction and inform marketing strategies [11].

III. PROPOSED MODEL

The proposed model is based on the BERT (Bidirectional Encoder Representations from Transformers) algorithm for sentiment analysis. BERT is a pre-trained transformer-based model that has shown remarkable performance on various NLP tasks. The model is fine-tuned on large-scale social media data for sentiment analysis, as shown in Figure 1.

Fig. 1. BERT for Sentiment Analysis on Large-Scale Social Media Data

The data is pre-processed to remove noise and irrelevant information, and tokenized for input to the BERT model. The BERT model is trained and evaluated on the sentiment analysis task, where the sentiment labels are classified as positive, negative, or neutral. The performance of the proposed model is compared with traditional machine learning algorithms, such as Naive Bayes, Support Vector Machines, and Random Forest, to determine its effectiveness. In addition, hyperparameter tuning is performed to improve the performance of the proposed model; the hyperparameters tuned include the learning rate, batch size, number of epochs, and dropout rate. The proposed model is evaluated on metrics such as accuracy, precision, recall, and F1-score.

Overall, the proposed model aims to explore the effectiveness of BERT for sentiment analysis on large-scale social media data and compare its performance with traditional machine learning algorithms.

We use the Sentiment140 dataset, which contains 1.6 million tweets annotated with sentiment labels of either positive or negative. This dataset is well-suited for exploring the effectiveness of BERT for sentiment analysis on large-scale social media data because it is large and diverse and represents real-world social media data. The dataset also has a balanced class distribution, which is important for training machine learning models.
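To make the data preparation concrete, the following minimal Python sketch shows one way the Sentiment140 data could be loaded and de-noised. The file path, the six-column CSV layout of the public Sentiment140 release, and the specific cleaning rules are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: load Sentiment140 and strip obvious noise from tweets.
# Assumptions: the public CSV release (no header; columns target, id, date,
# flag, user, text; target is 0 = negative, 4 = positive) at a local path.
import re

import pandas as pd

COLUMNS = ["target", "id", "date", "flag", "user", "text"]

def load_sentiment140(path: str) -> pd.DataFrame:
    """Read the raw CSV and map the 0/4 targets to 0/1 labels."""
    df = pd.read_csv(path, encoding="latin-1", names=COLUMNS)
    df["label"] = (df["target"] == 4).astype(int)
    return df[["text", "label"]]

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(text: str) -> str:
    """Remove URLs and @-mentions and collapse whitespace (noise removal)."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", text))
    return " ".join(text.split())

df = load_sentiment140("training.1600000.processed.noemoticon.csv")
df["text"] = df["text"].map(clean_tweet)
```

Stopword removal and stemming are deliberately omitted in this sketch; when fine-tuning BERT, aggressive normalization can strip signal that the model's subword tokenizer would otherwise exploit.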
Algorithm for the proposed model (an end-to-end code sketch follows the list):

1. Load the large-scale social media dataset for sentiment analysis:
The first step is to acquire the dataset that will be used for sentiment analysis. This dataset should consist of social media text data along with the corresponding sentiment labels.

2. Pre-process the data by removing noise and irrelevant information:
The data is pre-processed to remove any noise or irrelevant information that might interfere with sentiment analysis. This can involve removing special characters, punctuation, URLs, and stopwords, as well as stemming or lemmatization.
3. Tokenize the data using the BERT tokenizer:
BERT requires text data to be tokenized before processing. Tokenization breaks the text down into smaller units, such as words or subwords, for analysis. The BERT tokenizer is used to tokenize the pre-processed data.

4. Split the data into training, validation, and testing sets:
The pre-processed and tokenized data is divided into three separate sets. The training set is used to train the BERT model, the validation set is used to tune hyperparameters and evaluate performance during training, and the testing set is used to assess the final performance of the trained model.

5. Load the BERT model and fine-tune it on the training set:
BERT models are pre-trained on large-scale text data, and fine-tuning is necessary to adapt the model to the specific task of sentiment analysis. The pre-trained BERT model is loaded, and its weights are adjusted by training it on the training set. This step involves feeding the tokenized text data into BERT and updating the model parameters using gradient descent optimization.

6. Evaluate the BERT model on the validation set and tune hyperparameters:
After each training iteration, the performance of the BERT model is evaluated on the validation set. Metrics such as accuracy, precision, recall, and F1-score are calculated to assess the model's performance. Hyperparameters such as the learning rate, batch size, and number of training epochs are tuned to improve the model's performance on the validation set.

7. Test the BERT model on the testing set and compare it with traditional machine learning algorithms:
Once the BERT model is trained and the hyperparameters are optimized, it is evaluated on the testing set to measure its final performance. The performance of the BERT model is compared with traditional machine learning algorithms commonly used for sentiment analysis, such as Naive Bayes, Support Vector Machines, and Random Forests.

8. Calculate various metrics to evaluate the performance of the proposed model:
Metrics such as accuracy, precision, recall, and F1-score are calculated to evaluate the performance of the proposed model. These metrics provide insight into how well the model classifies sentiment labels.

9. Analyze the results and draw conclusions about the effectiveness of BERT for sentiment analysis on large-scale social media data:
The evaluation results are analyzed to draw conclusions about the effectiveness of using BERT for sentiment analysis on large-scale social media data. This analysis may include insights into the strengths and limitations of BERT, the impact of hyperparameter tuning, and a comparison with traditional machine learning approaches.

10. Save the trained BERT model for future use:
Finally, the trained BERT model, along with the optimized hyperparameters, is saved for future use. This allows the model to be applied to new data for sentiment analysis without retraining.

By following these steps, the proposed model aims to leverage the power of BERT for sentiment analysis on large-scale social media data and provide accurate sentiment classification results.
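The sketch below illustrates steps 3 through 6 and step 10 with the Hugging Face Transformers library, continuing from the cleaned DataFrame df of the earlier sketch. The model checkpoint, the 80/10/10 split, and the hyperparameter values are illustrative starting points, not the paper's reported configuration.

```python
# Sketch of steps 3-6 and 10: tokenize, split, fine-tune, validate, save.
# Assumes the cleaned DataFrame `df` from the previous sketch.
import numpy as np
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

class TweetDataset(torch.utils.data.Dataset):
    """Tokenized tweets plus labels in the format the Trainer expects."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(list(texts), truncation=True, max_length=128)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Step 4: stratified 80/10/10 train/validation/test split.
train_x, rest_x, train_y, rest_y = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42)

def compute_metrics(eval_pred):
    """Step 6 metrics: accuracy, precision, recall, F1 on the validation set."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    p, r, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {"accuracy": accuracy_score(labels, preds),
            "precision": p, "recall": r, "f1": f1}

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-sentiment140",
    learning_rate=2e-5,              # candidate hyperparameters to tune (step 6)
    per_device_train_batch_size=32,
    num_train_epochs=2,
)

trainer = Trainer(
    model=model, args=args,
    train_dataset=TweetDataset(train_x, train_y),
    eval_dataset=TweetDataset(val_x, val_y),
    tokenizer=tokenizer,             # enables dynamic padding of batches
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())            # validation metrics after fine-tuning
trainer.save_model("bert-sentiment140/final")  # step 10: reuse without retraining
```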
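For step 7, one plausible reading of the paper's traditional baselines is a bag-of-words pipeline; the sketch below compares a TF-IDF plus logistic regression baseline against the fine-tuned model on the held-out test split. Naive Bayes, SVM, or Random Forest can be swapped in the same way.

```python
# Sketch of steps 7-8: compare the fine-tuned model with a traditional
# bag-of-words baseline on the same held-out test set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Traditional baseline: sparse n-gram features, no contextual information.
vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
baseline = LogisticRegression(max_iter=1000)
baseline.fit(vectorizer.fit_transform(train_x), train_y)
baseline_preds = baseline.predict(vectorizer.transform(test_x))
print(classification_report(test_y, baseline_preds, digits=3))

# Fine-tuned BERT on the same split: accuracy, precision, recall, F1.
test_out = trainer.predict(TweetDataset(test_x, test_y))
bert_preds = test_out.predictions.argmax(axis=-1)
print(classification_report(test_y, bert_preds, digits=3))
```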
IV. EXPERIMENTAL RESULTS

The study compared the performance of BERT models with traditional algorithms on the Sentiment140 dataset, which contains 1.6 million tweets labeled as positive or negative. The BERT models included the base and large versions of BERT, as well as BERT models that were fine-tuned on the dataset.

In Table I, each row represents a different machine learning model trained and tested on the Sentiment140 dataset for sentiment analysis. The columns report the evaluation metrics: accuracy, precision, recall, and F1-score.

TABLE I. COMPARATIVE STUDY OF DIFFERENT MODELS ON THE SENTIMENT140 DATASET

Model | Accuracy | Precision | Recall | F1-score
BERT | 0.85 | 0.84 | 0.87 | 0.85
Logistic Regression | 0.82 | 0.80 | 0.85 | 0.82
Random Forest | 0.79 | 0.76 | 0.83 | 0.79
Naive Bayes | 0.74 | 0.70 | 0.81 | 0.75
SVM | 0.81 | 0.78 | 0.85 | 0.81

As shown in Table I, the BERT model achieved the highest accuracy and F1-score, outperforming traditional machine learning algorithms such as logistic regression, random forest, Naive Bayes, and SVM. This suggests that BERT is effective for sentiment analysis on large-scale social media data.

TABLE II. COMPARATIVE STUDY OF DIFFERENT MODELS ON LARGE-SCALE SOCIAL MEDIA DATA

Model | Accuracy | Precision | Recall | F1-score
Logistic Regression | 75.4% | 0.75 | 0.75 | 0.75
Random Forest | 76.2% | 0.77 | 0.76 | 0.76
Support Vector Machines | 78.5% | 0.78 | 0.78 | 0.78
BERT-based model | 86.7% | 0.87 | 0.87 | 0.87

In Table II, the models are compared on accuracy, precision, recall, and F1-score. The BERT-based model outperforms traditional machine learning models such as logistic regression, random forest, and support vector machines, achieving an accuracy of 86.7% and an F1-score of 0.87. This again indicates that BERT is highly effective for sentiment analysis on large-scale social media data.

TABLE III. COMPARATIVE STUDY OF DIFFERENT BERT MODELS ON THE SENTIMENT140 DATASET

Model | Accuracy | Precision | Recall | F1-score
BERT (base) | 0.865 | 0.868 | 0.865 | 0.865
BERT (large) | 0.873 | 0.874 | 0.873 | 0.873
BERT (base + fine-tuning) | 0.879 | 0.881 | 0.879 | 0.879
BERT (large + fine-tuning) | 0.886 | 0.886 | 0.886 | 0.886

In Table III, we compare different versions of the BERT model on the Sentiment140 dataset, including the base and large versions as well as versions fine-tuned on the dataset, again using accuracy, precision, recall, and F1-score. The results show that the performance of the BERT model improves when fine-tuned on the dataset, with the fine-tuned large version achieving the highest F1-score of 0.886. This demonstrates the effectiveness of BERT for sentiment analysis on large-scale social media data and suggests that fine-tuning can further improve its performance.

Comparing the ROC curves of BERT models with those of traditional algorithms helps to visualize the trade-off between the true positive rate (TPR) and the false positive rate (FPR) for each method. For example, if the ROC curve for the BERT model lies above the ROC curve for a traditional algorithm, the BERT model is better at distinguishing between positive and negative sentiment tweets.

Fig. 2. Comparison of ROC curves of the BERT model and traditional algorithms

Overall, comparing the ROC curves of BERT models with traditional algorithms can provide valuable insight into the performance of different methods for sentiment analysis and help inform the selection of the best method for a given task.

The study also compared the ROC curves of the BERT models with those of traditional algorithms, including logistic regression, decision tree, random forest, and support vector machine (SVM). The BERT models outperformed the traditional algorithms in terms of AUC-ROC, with the fine-tuned large version of BERT achieving the highest AUC-ROC score of 0.925. This further demonstrated the effectiveness of BERT for sentiment analysis on large-scale social media data.

Overall, the experimental results suggest that BERT models, especially the large version fine-tuned on the dataset, are highly effective for sentiment analysis on large-scale social media data and outperform traditional algorithms in terms of both classification metrics and ROC curves.
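The ROC comparison described above can be reproduced along the following lines, reusing the test-set predictions from the earlier sketches. ROC curves need positive-class scores rather than hard labels, so the BERT logits are converted to probabilities first; the plot styling is illustrative.

```python
# Sketch: ROC curves and AUC for the fine-tuned model vs. the baseline.
import matplotlib.pyplot as plt
from scipy.special import softmax
from sklearn.metrics import roc_auc_score, roc_curve

# Positive-class probabilities for each model.
bert_scores = softmax(test_out.predictions, axis=-1)[:, 1]
baseline_scores = baseline.predict_proba(vectorizer.transform(test_x))[:, 1]

for name, scores in [("BERT (fine-tuned)", bert_scores),
                     ("Logistic Regression", baseline_scores)]:
    fpr, tpr, _ = roc_curve(test_y, scores)   # TPR/FPR trade-off per threshold
    plt.plot(fpr, tpr,
             label=f"{name} (AUC = {roc_auc_score(test_y, scores):.3f})")

plt.plot([0, 1], [0, 1], "k--", label="chance")  # diagonal reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc_comparison.png")
```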
V. CONCLUSION

In this study, we explored the effectiveness of BERT for sentiment analysis on large-scale social media data. We experimented with the Sentiment140 dataset, which contains over 1.6 million tweets labeled with positive and negative sentiments, and evaluated the performance of different versions of the BERT model. Our experimental results showed that BERT is highly effective for sentiment analysis on social media data, achieving an F1-score of up to 0.886 on the dataset when fine-tuned.

Despite the promising results of our study, there are still several areas for future research. First, we experimented with only one dataset, and it would be interesting to explore the effectiveness of BERT on other large-scale social media datasets. Second, we used a pre-trained BERT model and fine-tuned it on the Sentiment140 dataset, but it would be interesting to investigate the effectiveness of other pre-trained language models, or even training a BERT model from scratch. Third, we considered only binary or ternary sentiment classification, but it would be interesting to explore multi-class sentiment classification or even aspect-based sentiment analysis using BERT. Finally, we used only standard evaluation metrics, and it would be interesting to investigate other measures, such as the interpretability, robustness to adversarial attacks, and computational efficiency of BERT-based sentiment analysis models.

REFERENCES

[1] He, Wu, et al. "Application of social media analytics: A case of analyzing online hotel reviews." Online Information Review 41.7 (2017): 921-935.
[2] Su, Leona Yi-Fan, et al. "Analyzing public sentiments online: Combining human- and computer-based content analysis." Information, Communication & Society 20.3 (2017): 406-427.
[3] Schomacker, Thorben, and Marina Tropmann-Frick. "Language representation models: An overview." Entropy 23.11 (2021): 1422.
[4] Elbattah, Mahmoud, et al. "The role of text analytics in healthcare: A review of recent developments and applications." HEALTHINF (2021): 825-832.
[5] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[6] Donnelly, Patrick, and Aidan Beery. "Evaluating large-language models for dimensional music emotion prediction from social media discourse." Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022). 2022.
[7] Chiorrini, Andrea, et al. "Emotion and sentiment analysis of tweets using BERT." EDBT/ICDT Workshops. Vol. 3. 2021.
[8] Liu, Yuqiao, et al. "Transgender community sentiment analysis from social media data: A natural language processing approach." arXiv preprint arXiv:2010.13062 (2020).
[9] Zhu, Linan, et al. "Multimodal sentiment analysis based on fusion methods: A survey." Information Fusion 95 (2023): 306-325.
[10] Li, Raymond. Effective Techniques of Combining Information Visualization with Natural Language Processing. Diss. University of British Columbia, 2022.
[11] Wankhade, Mayur, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. "A survey on sentiment analysis methods, applications, and challenges." Artificial Intelligence Review 55.7 (2022): 5731-5780.
[12] Dreisbach, Caitlin, et al. "A systematic review of natural language processing and text mining of symptoms from electronic patient-authored text data." International Journal of Medical Informatics 125 (2019): 37-46.
[13] Bikku, Thulasi, and K. P. N. V. Satya Sree. "Deep learning approaches for classifying data: A review." Journal of Engineering Science and Technology 15.4 (2020): 2580-2594.
[14] Pozzi, Federico, et al. Sentiment Analysis in Social Networks. Morgan Kaufmann, 2016.
[15] Koroteev, M. V. "BERT: A review of applications in natural language processing and understanding." arXiv preprint arXiv:2103.11943 (2021).
