Sentiment Analysis of Text and Audio Data IJERTV10IS120009


Published by : International Journal of Engineering Research & Technology (IJERT)

http://www.ijert.org ISSN: 2278-0181


Vol. 10 Issue 12, December-2021

Sentiment Analysis of Text and Audio Data


Dr. Munish Mehta, Kanhav Gupta, Shubhangi Tiwari, Anamika
Department of Computer Applications
National Institute of Technology
Kurukshetra, India

Abstract – Sentiment analysis has grown significantly in the past few years. Textual sentiment analysis is common and popular among researchers. In this paper, we first study the approaches that various researchers have proposed for sentiment analysis of all modalities of data, and then we implement some of those methods. We focus not only on sentiment analysis of text but also discuss audio data. Audio sentiment analysis is an area that researchers are still exploring, and new techniques are being applied to analyze audio data. Here we use deep learning to classify audio into various sentiments. For text, we test basic machine learning models and compare their results.

Keywords – Sentiment Analysis; Natural Language Processing; Feature Extraction; Machine Learning; SVM; CNN; Image Classification.

I. INTRODUCTION
Sentiment analysis is the method of detecting the emotional tenor behind a sequence of words in order to understand the opinions and attitudes expressed within a piece of data. It is estimated that about 80 percent of the data in the world is unstructured. A huge amount of data is produced every day, including e-mails, social media chats, blog posts, reports, surveys and other online documents. Examining, recognizing and understanding this data manually is difficult, expensive and time-consuming. Sentiment analysis helps make sense of this unstructured data by automatically processing it and classifying it into emotional categories.

With the spread of the internet, people have changed the way in which they express opinions: today this happens mostly through blogs, online forums, product review sites and social networking sites, and billions of users express their views online. These platforms let people interact with each other and share their views about any product or service, so social media produces a huge amount of opinion-rich data in the form of posts, comments and reviews. Social media platforms also allow businesses to stay in touch with their customers and keep receiving valuable feedback and suggestions, which enables them to make better decisions. When a person plans to buy something, he or she usually prefers to read or view reviews online to understand the product better. However, the volume of data generated online every day is far too large to be analysed manually, and this is where sentiment analysis comes in.

Opinion mining, or sentiment analysis, is useful for monitoring social sites because it summarizes the sentiment of society on any topic. Extracting valuable information from social data is an exercise used extensively by organizations around the globe. Popular sentiment analysis applications include monitoring social sites, managing customer support and analysing customer responses. Automatic sentiment analysis can be performed on any data source: to categorize survey responses and chats, Twitter and Facebook posts, or to scan e-mails and other documents. All of this is significant information that helps companies make decisions.

With growing demand and advances in sentiment analysis techniques, sentiment analysis is no longer limited to textual data, and researchers are exploring the analysis of other modalities of data. With the increased use of the internet, the enormous amount of information produced is no longer only textual: more and more images and videos are used to convey opinions. There has been a significant amount of research on analyzing textual data, but research on other modalities, including image, speech and video content, has been limited. Multi-modal emotion detection is a recent topic in the domain of sentiment analysis [1]. Multi-modal sentiment analysis also takes into account the audio and visual context of the data, and it can be employed in building virtual assistants, analysing YouTube videos and monitoring depression.

II. RELATED WORK
In this section, we briefly describe some of the approaches that researchers have used for sentiment analysis of text, image and audio data.

A. Classification of Textual Data
In [2], the authors performed sentiment classification on a Twitter dataset. They first used Naive Bayes as a baseline algorithm on unprocessed data, and then preprocessed the dataset and removed stop words in order to compare the results. They then applied other machine learning approaches such as SVM and Maximum Entropy, experimenting with unigrams, bigrams and stop-word removal, and achieved the best results with SVM using unigram or bigram features.
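To make the feature setup of [2] concrete, the sketch below builds unigram and bigram features with optional stop-word removal and trains a multinomial Naive Bayes baseline on them. It is a minimal, standard-library illustration under stated assumptions: the stop-word list, the toy reviews and the helper names (`tokenize`, `ngrams`, `NaiveBayes`) are hypothetical and are not taken from [2]. An SVM or Maximum Entropy classifier would consume the same n-gram features.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (real pipelines use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def tokenize(text, remove_stop_words=True):
    """Lowercase the text, keep alphabetic tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def ngrams(tokens, n_max=2):
    """Return all unigrams plus n-grams up to length n_max, joined by spaces."""
    features = list(tokens)
    for n in range(2, n_max + 1):
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over n-gram counts."""

    def fit(self, docs, labels, n_max=2, remove_stop_words=True):
        self.n_max = n_max
        self.remove_stop_words = remove_stop_words
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            feats = ngrams(tokenize(doc, remove_stop_words), n_max)
            self.feature_counts[label].update(feats)
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        feats = ngrams(tokenize(doc, self.remove_stop_words), self.n_max)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(feature | class); unseen n-grams are skipped
            score = math.log(self.class_counts[label] / total_docs)
            for f in feats:
                if f in self.vocab:
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews, used only to exercise the sketch.
TRAIN_DOCS = [
    "a great movie, I loved it",
    "wonderful acting and a great story",
    "the plot was boring and the acting was bad",
    "not good, a terrible waste of time",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
    print(model.predict("a great story"))   # pos
    print(model.predict("boring and bad"))  # neg
```

Passing `n_max=1` restricts the features to unigrams, and `remove_stop_words=False` keeps stop words, which mirrors on a toy scale the comparisons described above.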

IJERTV10IS120009 www.ijert.org 16
(This work is licensed under a Creative Commons Attribution 4.0 International License.)

In [3], a rule-based classifier called VADER (Valence Aware Dictionary for Sentiment Reasoning) was presented. It is a lexicon-based, rule-based classifier that detects both the polarity and the intensity of the emotion expressed. It was compared with seven well-known lexicons of the time and outperformed all of them on social media data. In [4], the authors presented a way of combining machine learning models with the background knowledge of lexicons for effective sentiment classification, in the form of pooling multinomials. Their hybrid approach uses a Naive Bayes algorithm that combines lexical knowledge with the training of the classification model. The datasets were taken from IBM Lotus blog posts, political posts and movie reviews, and the Linear Pooling model was found to perform well compared with various other approaches.

B. Classification of Audio Data
In [5], the authors proposed using acoustic features extracted from the audio signal to detect the emotional state of the speaker. The speech signal was fed as input to a Voice Activity Detection system, which distinguishes speech from other audio in the signal. The audio was then fed to an ASR model and a speaker discrimination model to identify the content and the speaker identity, and the voices were labeled with different speaker IDs. The voices were converted to text by the Automatic Speech Recognition (ASR) system, and the speaker IDs were then matched with the transcribed text. The text produced by the ASR system for each speaker was a significant feature for predicting the emotion expressed by that speaker.

In [6], end-to-end ASR models were used to combine acoustic features with text for sentiment classification. An RNN-T model performed end-to-end speech recognition, and its output was fed to a sentiment decoder consisting of a Bi-LSTM, an attention model and a softmax classifier. To reduce overfitting, spectrogram augmentation was applied. The authors were able to achieve …

In [7], the authors extracted text and acoustic features separately from the input. They used an HMM to classify the acoustic features and Naive Bayes and SVM for the text features, and produced the final output by combining the results of both classifications. This study showed the importance of both text and acoustic features in identifying sentiment in audio data.

III. TEXT SENTIMENT ANALYSIS
A. Process of text sentiment analysis:

1) Collection of data: The first step is to collect the data on which sentiment classification will be performed.
2) Preprocessing of data: Before any further operations, the data is cleaned and preprocessed. This includes removing punctuation, stop words and other parts of the data that are not needed for further processing.
3) Feature extraction: The preprocessed data is then converted to feature vectors, which are numerical representations of the text that can be fed to different classifiers. Machine learning models work with such numerical vectors rather than with raw text.
4) Training the model: After the text has been converted to feature vectors, the dataset is split into training and testing data. The models are trained on the labeled training data using suitable classification algorithms.

TABLE I. COMPARISON TABLE

Researchers and year | Model used | Type of input data | Results
Kharde, Vishal, and Prof Sonawane, 2016 [2] | Naive Bayes, SVM and Maximum Entropy | Text | SVM with the bi-gram model gave the highest accuracy.
Hutto, Clayton, and Eric Gilbert, 2014 [3] | VADER | Text | VADER outperformed the state of the art on social media data when compared with other lexicon-based models.
Melville, Prem, Wojciech Gryc and Richard D. Lawrence, 2009 [4] | Multinomial Naive Bayes | Text | The Linear Pooling model performed well compared with various other approaches.
Maghilnan, S., and M. Rajesh Kumar, 2017 [5] | ASR models, Naive Bayes, linear SVM and VADER | Audio | Among the ASR models, Bing Speech API gave the highest WRR; among the classification models, VADER was the most effective.
Lu, Zhiyun, et al., 2020 [6] | RNN-T model, Bi-LSTM, attention model and softmax | Audio | The RNN with attention and spectrogram augmentation (specAug) performed considerably better than the other RNN pooling/attention models.
Murarka, Aishwarya, et al., 2017 [7] | HMM, Naive Bayes and SVM | Audio | Better results are obtained when text and acoustic features are combined to produce the final output.


5) Evaluating the model: Finally, the trained models are tested and evaluated on the test dataset.

B. Dataset
We used the IMDB movie review dataset for sentiment analysis of textual data. This dataset consists of 50,000 movie reviews, of which 25,000 are labeled positive and 25,000 negative. We selected this dataset because the two sentiment classes are equally represented.

C. Models
We experimented with different machine learning algorithms: SVM, Naive Bayes, Logistic Regression and KNN.
1) Support Vector Machines: SVM is a non-probabilistic model that represents inputs as points in a multi-dimensional space and draws a hyperplane in that space to separate the classes, so that each class is assigned a separate region. A new input is classified according to the region, that is, the side of the hyperplane, in which it falls.
2) Naïve Bayes: Naïve Bayes uses Bayes' theorem to predict the probability that an input belongs to a particular class. It assumes that the features are independent of each other given the class, so the occurrence of one feature does not affect another. It is a simple probabilistic classifier, yet it performs well in sentiment analysis tasks.
3) Logistic Regression: Logistic regression is based on the sigmoid function, which always outputs a value between 0 and 1 whatever the input. This makes it suitable for classification problems.
4) KNN: KNN compares the input with its K nearest neighbours and assigns the label held by the majority of those neighbours.

IV. VISUAL SENTIMENT ANALYSIS
A. Process of Visual Sentiment Analysis:
1) Preparing the dataset: Our dataset contains greyscale face images of 48×48 pixels. Each image depicts one of seven emotion classes: Angry, Surprise, Happy, Disgust, Fear, Neutral and Sad. The dataset file has two fields, "emotion" and "pixels". The "emotion" field contains a number from 0 to 6 identifying the emotion depicted in the image, and the "pixels" field contains a string of space-separated pixel values for each image in row-major order. Our main task is to train the model to predict the "emotion" field.
2) Pre-processing the given data: Before proceeding further, the data is pre-processed using techniques such as resizing, reshaping, conversion to greyscale and normalisation. Normalisation is done so that training converges as quickly as possible.
3) Splitting the dataset: The dataset is split into two parts, a training set and a testing set. The testing set, also known as the validation set, lets us check whether the model has overfitted to the training set.
4) Building the model using a deep neural network: The DNN consists of two main kinds of layers:
• The hidden layers (feature extraction part)
o convolutions
o pooling
• The classifier part
The following layers are used:
• Convolution layer: This layer uses a set of learnable filters. Each filter detects specific features or patterns in the input image.
• Pooling layer: This layer reduces the spatial size of the representation, which reduces the number of parameters and computations in the network and also helps control overfitting.
• Batch normalization: The layer inputs are standardised, which speeds up the learning process.
• Activation layer: This layer applies an activation function that determines the output value of each neuron.
• Dropout layer: This layer helps prevent the model from overfitting.
• Flatten layer: This layer converts the data to a one-dimensional array before it is passed to the next layer.
• Dense layer: A dense layer is fully connected: every neuron in it is connected to every neuron in the previous layer.
5) Training the model: Training means letting the model learn from the training data over repeated passes (epochs). First, a few hyperparameters are defined, such as the number of epochs, the batch size and the learning rate. We look for the best values by trial and error, repeatedly changing these hyperparameters. To record the performance of the model during training, various callbacks are used. Examples of the callbacks that we use are:
a. EarlyStopping(): stops training when a monitored metric has stopped improving.
b. ModelCheckpoint(): saves the Keras model or model weights at some frequency.
6) Evaluating the new model: We check how well the model is learning patterns from the training dataset by observing how the accuracy changes as the number of epochs increases. For this purpose, we use matplotlib to plot a learning curve.
7) Testing the model: In testing, the model is used to predict the emotions in some images. As in training, the test images are pre-processed before they are passed to the model.
8) Saving the model for further use: The weights and the architecture of the model are saved into a .h5 file and a .json file respectively, so that the model does not need to be retrained every time we need to predict the emotion in an image. From then on, we can simply load the model and make predictions.

B. Dataset
We use the Kaggle_fer2013 dataset, which is freely available on Kaggle. It includes 35,887 images, and we worked on the fer2013.csv file, which contains two fields, emotion and pixels. Another dataset that we use is the RAF_dataset, which contains 15,339 basic images and 3,954 compound images.

V. RESULTS
A. Text Sentiment Analysis
The results obtained with three classifiers, Logistic Regression, SVM and Naïve Bayes, in two settings (with and without stop words) are presented in Table 2.

TABLE 2: RESULTS ACHIEVED THROUGH VARIOUS MODELS

Model | Accuracy with stop words (%) | Accuracy without stop words (%)
Logistic Regression | 89.4 | 89.49
SVM | 89.8 | 89.99
Naïve Bayes | 85.6 | 86.07

As the table shows, removing stop words slightly increased the accuracy of all three models. SVM performed best, ahead of Logistic Regression and Naïve Bayes, while Naïve Bayes showed the largest gain after stop-word removal (0.47 percentage points).

B. Image Sentiment Analysis

TABLE 3: ACCURACY OF THE NETWORKS ON FER2013

Network | Validation | Test
A | 63% | 50%
B | 53% | 46%
C | 63% | 60%
Final | 66% | 63%

VI. CONCLUSION
After going through the work of different authors, we found that all the experiments carried out yield as output either a positive or negative sentiment polarity or emotions such as happiness, love, anger, fear and sadness. We also found that combining different modalities, such as visual and textual, achieves better results than using one at a time. The results drawn from the study of these works are summarised in Table 1. For text classification, lexicon-based approaches tend to execute faster because they do not require training, and they are effective for small volumes of input data. However, they rely entirely on lexicons: if a word cannot be found in the dictionary, it cannot be classified. On the other hand, machine learning models can be used to work with large volumes of data and can perform better, although they require a lot of preprocessing so that the data fits the model. Many researchers have proposed hybrid approaches that use both machine learning models and the background knowledge of lexicons. These models have also shown good results because they combine the best of both methods.

REFERENCES
[1] Cai, Guoyong, et al. "Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis." Chinese Journal of Electronics 29.6 (2020): 1025-1038.
[2] Kharde, Vishal, and Prof Sonawane. "Sentiment analysis of twitter data: a survey of techniques." arXiv preprint arXiv:1601.06971 (2016).
[3] Hutto, Clayton, and Eric Gilbert. "Vader: A parsimonious rule-based model for sentiment analysis of social media text." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 8. No. 1. 2014.
[4] Melville, Prem, Wojciech Gryc, and Richard D. Lawrence. "Sentiment analysis of blogs by combining lexical knowledge with text classification." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 2009.
[5] Maghilnan, S., and M. Rajesh Kumar. "Sentiment analysis on speaker specific speech data." 2017 International Conference on Intelligent Computing and Control (I2C2). IEEE, 2017.
[6] Lu, Zhiyun, et al. "Speech sentiment analysis via pre-trained features from end-to-end asr models." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[7] Murarka, Aishwarya, et al. "Sentiment Analysis of Speech." International Journal of Advanced Research in Computer and Communication Engineering 6.11 (2017): 240-243.
[8] Mehta, Munish, Kanhav Gupta, and Shubhangi Tiwari. "A Review on Sentiment Analysis of Text, Image and Audio Data." 2021 5th International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2021.
