Study of Twitter Sentiment Analysis Using Machine
5 authors, including:
Priyanka Badhani
Motilal Nehru National Institute of Technology
Sentiment analysis is used to extract sentiment from tweets. Its results can be used in many areas, such as analyzing and monitoring changes of sentiment around an event, sentiment regarding a particular brand or the release of a particular product, and public views of government policies.

A lot of research has been done on Twitter data to classify tweets and analyze the results. In this paper we review some of the research in this domain and study how to perform sentiment analysis on Twitter data using Python. The scope of this paper is limited to machine learning models, and we compare the efficiencies of these models with one another.

Performing sentiment analysis on Twitter data is challenging, as mentioned earlier, for the following reasons:

- Limited tweet size: with just 140 characters in hand, users write compact statements, which results in a sparse set of features.
- Use of slang: slang words differ from standard English words, and because slang keeps evolving, an approach built on it can quickly become outdated.
- Twitter features: Twitter allows hashtags, user references and URLs, which require different processing than other words.
- User variety: users express their opinions in a variety of ways; some switch between languages, while others use repeated words or symbols to convey an emotion.
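The Twitter-specific issues listed above are usually handled with simple normalization rules before any model sees the text. The following Python sketch illustrates one possible set of rules; it is not the method of any particular paper. The placeholder tokens `URL` and `AT_USER`, keeping the word behind a hashtag, and squeezing any character repeated three or more times down to two are all hypothetical choices made for illustration.

```python
import re

# Hypothetical normalization rules; the placeholder tokens are arbitrary choices.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def normalize_tweet(text: str) -> str:
    """Replace URLs and user references, unwrap hashtags, squeeze repeats."""
    text = text.lower()
    text = URL_RE.sub("URL", text)         # links carry little sentiment
    text = USER_RE.sub("AT_USER", text)    # @mentions become one shared token
    text = HASHTAG_RE.sub(r"\1", text)     # "#happy" -> "happy"
    text = REPEAT_RE.sub(r"\1\1", text)    # "loooove" -> "loove", "!!!" -> "!!"
    return " ".join(text.split())          # collapse runs of whitespace


print(normalize_tweet("@Alice I loooove this phone!!! #happy http://t.co/x"))
```

Squeezing to two characters rather than one keeps a trace of the emphasis, so "loove" can still be treated as a stronger signal than "love" if a later feature extractor chooses to.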
International Journal of Computer Applications (0975 – 8887)
Volume 165 – No.9, May 2017
All these problems have to be addressed in the pre-processing stage. Apart from these, feature extraction is also difficult, because few features are available and their dimensionality must be reduced.

3. METHODOLOGY
In order to perform sentiment analysis, we first collect data from the desired source (here Twitter). This data undergoes various pre-processing steps that make it more machine-readable than its original form.

- Stemming: replacing words with their roots, which reduces different forms of words with similar meanings to a single form [3]. This helps in reducing the dimensionality of the feature set.
- Special character and digit removal: digits and special characters don't convey any sentiment. Sometimes they are mixed with words, so removing them can help in associating two words that would otherwise be considered different.
- Creating a dictionary to remove unwanted words and punctuation marks from the text [5].
- Expansion of slang and abbreviations [5].
- Spelling correction [5].
- Generating a dictionary for words that are important [7] or for emoticons [2].
- Part of speech (POS) tagging: assigns a tag to each word in the text, classifying it into a category such as noun, verb or adjective. POS taggers are efficient for explicit feature extraction.
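Several of the pre-processing steps above can be sketched in plain Python. The dictionaries `SLANG`, `EMOTICONS` and `STOPWORDS` are tiny hypothetical stand-ins for real resources, and `crude_stem` is a naive suffix stripper used in place of a proper stemmer such as NLTK's `PorterStemmer`. Spelling correction and POS tagging are left out because they need large word lists or trained models (NLTK's `pos_tag` is one option for the latter).

```python
import re

# Hypothetical toy dictionaries for illustration; real systems use curated lists.
SLANG = {"gr8": "great", "u": "you", "luv": "love", "idk": "i do not know"}
EMOTICONS = {":)": "EMO_POS", ":D": "EMO_POS", ":(": "EMO_NEG"}
STOPWORDS = {"a", "an", "the", "is", "this", "i", "you"}
SUFFIXES = ("ing", "ed", "ly", "s")  # crude stand-in for a real stemmer


def crude_stem(word: str) -> str:
    """Strip one common suffix if a stem of at least three letters remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    tokens = []
    for raw in text.split():
        if raw in EMOTICONS:                            # emoticon dictionary
            tokens.append(EMOTICONS[raw])
            continue
        word = re.sub(r"[^a-z0-9]", "", raw.lower())    # special characters
        word = SLANG.get(word, word)                    # slang expansion
        for part in word.split():                       # expansions may be multi-word
            part = re.sub(r"\d", "", part)              # digits
            if part and part not in STOPWORDS:          # unwanted-word dictionary
                tokens.append(crude_stem(part))         # stemming
    return tokens


print(preprocess("I luv the new phones!!! gr8 cameras :)"))
```

Slang is expanded before digits are removed so that tokens such as "gr8" are still recognizable when they are looked up.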
3.4 Sentiment classifiers

- Bayesian logistic regression: selects features and provides optimization for text categorization. It uses a Laplace prior to avoid over-fitting and produces sparse predictive models for text data. The logistic regression estimate has the parametric form:

$$P_{LR}(c \mid d) = \frac{1}{Z(d)} \exp\left( \sum_{i} \lambda_{i} F_{i}(d, c) \right)$$

where d is the tweet (document), c is the class label, Z(d) is a normalization function, λ is a vector of weight parameters for the feature set and F is a binary function that takes as input a feature and a class label. F is triggered when a certain feature exists and the sentiment is hypothesized in a certain way [3].

- Naïve Bayes: a probabilistic classifier with a strong conditional independence assumption between features; despite this assumption, it can still classify well even when features are highly dependent. The probability of each sentiment class is calculated using Bayes' theorem:

$$P(y \mid X) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(X)}$$

where X is a feature vector defined as X = {x_1, x_2, …, x_n} and y is a class label.

- Zimbra et al. [1] propose using the Dynamic Architecture for Artificial Neural Network (DAN2), a machine-learned model with sufficient sensitivity to mild expressions in tweets. They target brand-related sentiment, where mild sentences occur frequently. DAN2 differs from simple neural networks in that the number of hidden layers is not fixed before the model is used. As input is given, knowledge accumulates and learning takes place at each level, and the result is forwarded to the next level. The hidden layers are generated dynamically until a desired level of performance is achieved.

- Case-Based Reasoning (CBR): in this technique, problems that were successfully solved in the past are accessed, and their solutions are retrieved and reused [10]. It doesn't require an explicit domain model, so elicitation becomes a task of gathering case histories, and a CBR system can acquire new knowledge as cases. This makes maintaining large volumes of information easier.

- Maximum Entropy Classifier: this classifier makes no assumptions about the relations between features; it tries to maximize the entropy of the system by computing the conditional distribution of the class labels [9]:

$$P_{ME}(y \mid X, \lambda) = \frac{1}{Z(X)} \exp\left( \sum_{i} \lambda_{i} f_{i}(X, y) \right)$$

where X is the feature vector, y is the class label, Z(X) is the normalization factor and λ_i is the weight coefficient of the feature function f_i, a binary function that equals 1 when its feature is present in X together with a particular class y, and 0 otherwise.

The NLTK library also provides various trainable classifiers (for example, a Naïve Bayes classifier). NLTK is used to create a bag-of-words model, which is a type of unigram model for text. In this model, the number of occurrences of each word is counted. The data acquired can be used to train classifier models. The sentiment of an entire tweet is computed by assigning a subjectivity score to each word using a sentiment lexicon.
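To make the Naïve Bayes formula concrete, here is a minimal multinomial Naïve Bayes in plain Python. It assumes tweets are already tokenized, estimates P(x_i | y) with add-one (Laplace) smoothing (a common choice that the text above does not specify), and compares log-probabilities while dropping P(X), since P(X) is the same for every class. NLTK's `NaiveBayesClassifier` is a ready-made alternative.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}  # word counts per class
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens: list[str]) -> dict[str, float]:
        """log P(y) + sum_i log P(x_i | y); the constant P(X) is omitted."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.prior[c])
            for w in tokens:
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, tokens: list[str]) -> str:
        scores = self.log_posterior(tokens)
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when the product over many words becomes very small.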
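The maximum entropy model, P(y | X) = exp(Σ_i λ_i f_i(X, y)) / Z(X), can be sketched as follows, assuming one binary feature function per (word, class) pair. The weights λ_i are fitted by plain batch gradient ascent on the conditional log-likelihood with a small L2 penalty. This is an illustrative choice: an L2 penalty corresponds to a Gaussian prior, whereas the Laplace prior used by Bayesian logistic regression would correspond to an L1 penalty. The learning rate, penalty and epoch count are arbitrary values.

```python
import math
from collections import defaultdict


def features(tokens, label):
    """Binary feature functions f_i(X, y) that fire: one per (word, class) pair."""
    return [(word, label) for word in set(tokens)]


class MaxEnt:
    """Maximum entropy (multinomial logistic regression) text classifier."""

    def __init__(self, classes, lr=0.1, l2=0.01, epochs=200):
        self.classes = list(classes)
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.weights = {}  # lambda_i, keyed by (word, class) feature

    def prob(self, tokens):
        """P(y | X) = exp(sum_i lambda_i f_i(X, y)) / Z(X) for every class y."""
        scores = {y: sum(self.weights.get(f, 0.0) for f in features(tokens, y))
                  for y in self.classes}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())      # Z(X), the normalization factor
        return {y: e / z for y, e in exps.items()}

    def fit(self, docs, labels):
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for tokens, gold in zip(docs, labels):
                p = self.prob(tokens)
                for f in features(tokens, gold):   # observed feature counts
                    grad[f] += 1.0
                for y in self.classes:             # expected feature counts
                    for f in features(tokens, y):
                        grad[f] -= p[y]
            for f in set(grad) | set(self.weights):  # ascent step with L2 penalty
                w = self.weights.get(f, 0.0)
                self.weights[f] = w + self.lr * (grad[f] - self.l2 * w)
        return self

    def predict(self, tokens):
        p = self.prob(tokens)
        return max(p, key=p.get)
```

The gradient for each weight is the observed count of its feature minus the count expected under the current model, which is the standard maximum entropy training condition.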
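The bag-of-words (unigram) model and lexicon-based scoring can be illustrated without NLTK, using only `collections.Counter`. `LEXICON` is a hypothetical toy lexicon with made-up polarity scores, and summing count-weighted scores and thresholding at zero is one simple aggregation rule among many.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {"love": 1.0, "great": 0.8, "good": 0.5, "bad": -0.7, "hate": -1.0}


def bag_of_words(tokens: list[str]) -> Counter:
    """Unigram model: count how often each word occurs."""
    return Counter(tokens)


def lexicon_sentiment(tokens: list[str]) -> str:
    """Sum lexicon scores weighted by word counts; unknown words score 0."""
    bow = bag_of_words(tokens)
    score = sum(LEXICON.get(word, 0.0) * count for word, count in bow.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment(["love", "love", "bad", "battery"]))
```

The same `Counter` output can also serve as the feature set for a trainable classifier, which is how the bag-of-words data is used for training as described above.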
6. CONCLUSION
Twitter sentiment analysis falls under the category of text and opinion mining. It focuses on analyzing the sentiments of tweets and feeding the data to a machine learning model to train it and then check its accuracy, so that the model can be used in the future according to the results. It comprises steps such as data collection, text pre-processing, sentiment detection, sentiment classification, and training and testing the model. This research topic has evolved during the last decade, with models reaching an accuracy of roughly 85% to 90%. However, it still lacks diversity in the data. It also faces many practical issues with slang and the short forms of words. Many analyzers don't perform well when the number of classes is increased. It also remains untested how accurate a model will be on topics other than the one under consideration. Hence sentiment analysis has considerable scope for development in the future.
IJCATM : www.ijcaonline.org