Refining Word Embeddings with Sentiment Information for Sentiment Analysis
DOI:
https://doi.org/10.13052/jicts2245-800X.1031
Keywords:
Sentiment embeddings, Sentiment analysis, Word embeddings, Sentiment lexicon, Deep learning
Abstract
Deep learning models for Natural Language Processing problems generally rely on pre-trained distributed word representations. However, these representations are usually learned from contextual information alone, which prevents them from capturing all important word characteristics. Sentiment analysis suffers from this limitation because sentiment information is ignored while word embeddings are learned. As a result, two words with similar vectors may have opposite sentiment orientations, which can degrade sentiment analysis performance. This paper introduces a novel model called Continuous Sentiment Contextualized Vectors (CSCV) to address this problem. The proposed model learns a word's sentiment embedding from its surrounding context words. It uses the Continuous Bag-of-Words (CBOW) model to handle the context and sentiment lexicons to identify sentiment. Existing pre-trained vectors are then combined with the obtained sentiment vectors using Principal Component Analysis (PCA) to enhance their quality. The experiments show that: (1) CSCV vectors can be used to enhance any pre-trained word vectors; (2) the resulting vectors strongly alleviate the problem of similar words with opposite polarities; (3) the performance of sentiment classification is improved by applying this approach.
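The combination step can be illustrated with a minimal sketch. The abstract does not say how the pre-trained and sentiment vectors are merged before PCA, so this example assumes simple concatenation followed by a PCA projection computed with NumPy's SVD. The function names `pca_combine` and `cosine`, the toy vocabulary, and all vector values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np


def pca_combine(pretrained, sentiment, n_components):
    """Concatenate two embedding matrices per word, then keep the top
    principal components (assumed combination scheme, not the paper's)."""
    combined = np.hstack([pretrained, sentiment])   # shape: (vocab, d1 + d2)
    centered = combined - combined.mean(axis=0)     # PCA requires centered data
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T           # shape: (vocab, n_components)


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy data (made up): the context-only vectors of "good" and "bad" are
# nearly identical, while the sentiment vectors separate them by polarity.
vocab = ["good", "great", "bad", "awful"]
pretrained = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.9, 0.7, 0.2],
    [0.8, 0.8, 0.1],
])
sentiment = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [-1.0, 0.1],
    [-0.9, 0.2],
])

refined = pca_combine(pretrained, sentiment, n_components=3)

before = cosine(pretrained[0], pretrained[2])   # "good" vs "bad", context only
after = cosine(refined[0], refined[2])          # "good" vs "bad", refined
print(f"good/bad similarity before: {before:.3f}, after: {after:.3f}")
```

On this toy data the similarity between "good" and "bad" falls from close to 1 to a negative value, while "good" and "great" remain similar. That mirrors, in miniature, the problem of similar vectors with opposite polarities that the abstract describes.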