Bimodal Emotion Recognition Model for Minnan Songs
Abstract
1. Introduction
2. Model
2.1. Attention-Based LSTM for Extracting Lyric Features
2.2. CNN Model for Extracting Audio Features
2.3. Multimodal Compact Bilinear Pooling
2.4. Classifier for Determining Song Emotion
3. Dataset and Preprocessing
3.1. Dataset
3.2. Data Preprocessing
4. Experimental Results
4.1. Experiment Settings
4.2. Results
4.2.1. Results for Unimodal Data
4.2.2. Results for Bimodal Data
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Clark, H.R. The Fu of Minnan: A Local Clan in Late Tang and Song China (9th–13th Centuries). J. Econ. Soc. Hist. Orient 1995, 38, 1. [Google Scholar] [CrossRef]
- Chuimei, H. The ceramic boom in Minnan during Song and Yuan times. In The Emporium of the World: Maritime Quanzhou, 1000–1400; Brill: Leiden, The Netherlands, 2001; pp. 237–282. [Google Scholar]
- Lin, M.; Chu, C.C.; Chang, S.L.; Lee, H.L.; Loo, J.H.; Akaza, T.; Juji, T.; Ohashi, J.; Tokunaga, K. The origin of Minnan and Hakka, the so-called Taiwanese, inferred by HLA study. Tissue Antigens 2001, 57, 192–199. [Google Scholar] [CrossRef] [PubMed]
- Ali, S.O.; Peynircioğlu, Z.F. Songs and emotions: Are lyrics and melodies equal partners? Psychol. Music 2006, 34, 511–534. [Google Scholar] [CrossRef]
- Hu, Z.L.; Xu, G.H.; Li, Y.; Liu, Q.M. Study on Choral Music in Minnan Dialect in Fujian and Taiwan. J. Jimei Univ. (Philosophy Soc. Sci.) 2011, 14, 17. [Google Scholar]
- Lihong, C. An Analysis of Folk Songs in Minnan Dialect. J. Jimei Univ. (Philosophy Soc. Sci.) 2011, 30, 136–140. [Google Scholar]
- Perlovsky, L. Cognitive function, origin, and evolution of musical emotions. Music. Sci. 2012, 16, 185–199. [Google Scholar] [CrossRef] [Green Version]
- Li, T.; Ogihara, M. Detecting Emotion in Music. 2003. Available online: https://jscholarship.library.jhu.edu/handle/1774.2/41 (accessed on 2 March 2020).
- Li, T.; Ogihara, M. Toward intelligent music information retrieval. IEEE Trans. Multimed. 2006, 8, 564–574. [Google Scholar]
- Kirandziska, V.; Ackovska, N. Finding important sound features for emotion evaluation classification. In Proceedings of the Eurocon 2013, Zagreb, Croatia, 1–4 July 2013; pp. 1637–1644. [Google Scholar]
- Misron, M.M.; Rosli, N.; Manaf, N.A.; Halim, H.A. Music Emotion Classification (MEC): Exploiting Vocal and Instrumental Sound Features. In Recent Advances on Soft Computing and Data Mining; Springer: Cham, Switzerland, 2014. [Google Scholar]
- Ridoean, J.A.; Sarno, R.; Sunaryo, D.; Wijaya, D.R. Music mood classification using audio power and audio harmonicity based on MPEG-7 audio features and Support Vector Machine. In Proceedings of the 2017 3rd International Conference on Science in Information Technology (ICSITech), Bandung, Indonesia, 25–26 October 2017; pp. 72–76. [Google Scholar]
- Eerola, T.; Lartillot, O.; Toiviainen, P. Prediction of Multidimensional Emotional Ratings in Music from Audio Using Multivariate Regression Models. In Proceedings of the 10th International Society for Music Information Retrieval Conference, Kobe, Japan, 26–30 October 2009; pp. 621–626. [Google Scholar]
- Lartillot, O.; Toiviainen, P. A Matlab toolbox for musical feature extraction from audio. In Proceedings of the International Conference on Digital Audio Effects, Bordeaux, France, 10–15 September 2007; pp. 237–244. [Google Scholar]
- Gómez, E. Tonal Description of Music Audio Signals. Ph.D. Thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2006. [Google Scholar]
- Stoller, D.; Durand, S.; Ewert, S. End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-character Recognition Model. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 181–185. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Sivic, J.; Zisserman, A. Efficient visual search of videos cast as text retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 591–606. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Sebastiani, F. Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 2002, 34, 1–47. [Google Scholar] [CrossRef]
- Hofmann, T. Probabilistic latent semantic indexing. ACM SIGIR Forum 2017, 51, 211–218. [Google Scholar]
- Hu, X.; Downie, J.S.; Ehmann, A.F. Lyric text mining in music mood classification. In Proceedings of the 10th International Society for Music Information Retrieval Conference, Kobe, Japan, 26–30 October 2009; pp. 411–416. [Google Scholar]
- Laurier, C.; Herrera, P. Mood cloud: A real-time music mood visualization tool. In Computer Music Modeling and Retrieval. Sense of Sounds; Springer: Cham, Switzerland, 2008. [Google Scholar]
- Lu, Q.; Chen, X.; Yang, D.; Wang, J. Boosting for Multi-Modal Music Emotion Classification. In Proceedings of the 11th International Society for Music Information Retrieval Conference, Utrecht, The Netherlands, 9–13 August 2010; p. 105. [Google Scholar]
- Meyers, O.C. A mood-based music classification and exploration system. Master’s Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2007. [Google Scholar]
- Yang, D.; Lee, W.S. Disambiguating Music Emotion Using Software Agents. In Proceedings of the 5th International Conference on Music Information Retrieval, Barcelona, Spain, 10–14 October 2004; pp. 218–223. [Google Scholar]
- Yang, Y.H.; Lin, Y.C.; Su, Y.F.; Chen, H.H. A regression approach to music emotion recognition. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 448–457. [Google Scholar] [CrossRef]
- Cheng, H.T.; Yang, Y.H.; Lin, Y.C.; Liao, I.B.; Chen, H.H. Automatic chord recognition for music classification and retrieval. In Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, Hannover, Germany, 23–26 June 2008; pp. 1505–1508. [Google Scholar]
- Dang, T.T.; Shirai, K. Machine learning approaches for mood classification of songs toward music search engine. In Proceedings of the 2009 International Conference on Knowledge and Systems Engineering, Hanoi, Vietnam, 13–17 October 2009; pp. 144–149. [Google Scholar]
- Xia, Y.; Yang, Y.; Zhang, P.; Liu, Y. Lyric-Based Song Sentiment Analysis by Sentiment Vector Space Model. J. Chin. Inf. Process. 2010, 24, 99–104. [Google Scholar]
- Hevner, K. Expression in music: A discussion of experimental studies and theories. Psychol. Rev. 1935, 42, 186. [Google Scholar] [CrossRef]
- Li, J.; Lin, H.; Li, R. Sentiment Vector Space Model Based Music Emotion Tag Prediction. J. Chin. Inf. Process. 2012, 26, 45–51. [Google Scholar]
- Raschka, S. MusicMood: Predicting the mood of music from song lyrics using machine learning. arXiv 2016, arXiv:1611.00138. [Google Scholar]
- Patra, B.G.; Das, D.; Bandyopadhyay, S. Mood classification of hindi songs based on lyrics. In Proceedings of the 12th International Conference on Natural Language Processing; Sharma, D.M., Sangal, R., Sherly, E., Eds.; NLP Association of India: Trivandrum, India, 2015; pp. 261–267. [Google Scholar]
- Miotto, R. Content-based Music Access: Combining Audio Features and Semantic Information for Music Search Engines. Master’s Thesis, Università degli Studi di Pavia, Pavia, Italy, 2011. [Google Scholar]
- Hu, X.; Downie, J.S. Improving mood classification in music digital libraries by combining lyrics and audio. In Proceedings of the 10th Annual Joint Conference on Digital Libraries; Association for Computing Machinery: New York, NY, USA, 2010; pp. 159–168. [Google Scholar]
- Jamdar, A.; Abraham, J.; Khanna, K.; Dubey, R. Emotion analysis of songs based on lyrical and audio features. arXiv 2015, arXiv:1506.05012. [Google Scholar] [CrossRef]
- Lee, C.W.; Song, K.Y.; Jeong, J.; Choi, W.Y. Convolutional attention networks for multimodal emotion recognition from speech and text data. In Proceedings of the First Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), Melbourne, Australia, 15–20 July 2018; pp. 28–34. [Google Scholar]
- Lin, Y.C.; Yang, Y.H.; Chen, H.H.; Liao, I.B.; Ho, Y.C. Exploiting genre for music emotion classification. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, New York, NY, USA, 28 June–3 July 2009; pp. 618–621. [Google Scholar]
- Schuller, B.; Eyben, F.; Rigoll, G. Tango or waltz?: Putting ballroom dance style into tempo detection. EURASIP J. Audio Speech Music Process. 2008, 2008, 846135. [Google Scholar] [CrossRef] [Green Version]
- Schuller, B.; Dorfner, J.; Rigoll, G. Determination of nonprototypical valence and arousal in popular music: Features and performances. EURASIP J. Audio Speech Music Process. 2010, 2010, 735854. [Google Scholar] [CrossRef]
- Charikar, M.; Chen, K.; Farach-Colton, M. Finding Frequent Items in Data Streams. In Automata, Languages and Programming; Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 693–703. [Google Scholar]
- Huang, R. Taiwanese Songs Cultural Communication Studies. Ph.D. Thesis, Huazhong Normal University, Wuhan, China, 2015. [Google Scholar]
- Durrani, N.; Hussain, S. Urdu Word Segmentation. In Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Los Angeles, CA, USA, 2–4 June 2010. [Google Scholar]
- Yang, Y. Discussion on the construction technology and usage of Taiwan Chinese Online Dictionary. In Proceedings of the 2003 3rd International Symposium on Chinese Online Education; Taiwan University: Taipei, Taiwan, 2003; pp. 132–141. [Google Scholar]
- Eyben, F.; Wöllmer, M.; Schuller, B. openSMILE—The Munich Versatile and Fast Open-Source Audio Feature Extractor. In Proceedings of the 18th ACM International Conference on Multimedia, Florence, Italy, 25–29 October 2010. [Google Scholar]
- Matsumoto, K.; Sasayama, M.; Yoshida, M.; Kita, K.; Ren, F. Transfer Learning Based on Utterance Emotion Corpus for Lyric Emotion Estimation. In Proceedings of the 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), Nanjing, China, 23–25 November 2018; pp. 699–703. [Google Scholar] [CrossRef]
- An, Y.; Sun, S.; Wang, S. Naive Bayes classifiers for music emotion classification based on lyrics. In Proceedings of the 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan, China, 24–26 May 2017; pp. 635–638. [Google Scholar] [CrossRef]
- Lim, W.; Jang, D.; Lee, T. Speech emotion recognition using convolutional and Recurrent Neural Networks. In Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Jeju, Korea, 13–16 December 2016; pp. 1–4. [Google Scholar] [CrossRef]
- Fayek, H.M.; Lech, M.; Cavedon, L. Evaluating deep learning architectures for Speech Emotion Recognition. Neural Netw. 2017, 92, 60–68. [Google Scholar] [CrossRef] [PubMed]
- Poria, S.; Chaturvedi, I.; Cambria, E.; Hussain, A. Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 439–448. [Google Scholar] [CrossRef] [Green Version]
- Jeon, B.; Kim, C.; Kim, A.; Kim, D.; Park, J.; Ha, J.W. Music Emotion Recognition via End-to-End Multimodal Neural Networks. 2017. Available online: https://pdfs.semanticscholar.org/ce3b/93d715b16ab9f7b65442d37a9fedcee18071.pdf (accessed on 3 March 2020).
Distribution of the songs across the seven emotion labels, by decade:

Decade | Love | Lovelorn | Inspirational | Lonely | Homesick | Missing someone | Leave | Total |
---|---|---|---|---|---|---|---|---|
1980s | 39 | 68 | 34 | 47 | 28 | 36 | 35 | 287 |
1990s | 45 | 107 | 24 | 33 | 6 | 48 | 17 | 280 |
2000s | 44 | 119 | 33 | 26 | 12 | 46 | 15 | 295 |
2010s | 54 | 99 | 34 | 23 | 22 | 59 | 9 | 300 |
Experimental groups: groups a use lyrics only, groups b audio only, and groups c both. A–D refer to the model components of Section 2 (A: attention-based LSTM; B: CNN audio model; C: multimodal compact bilinear pooling; D: the classifier of Section 2.4); dim denotes the feature dimension, and rows citing [46–51] are published baselines run for comparison.

Experimental Group | Data Source | Feature Extraction Method | Main Parameters | Classifier |
---|---|---|---|---|
a1 | Lyrics | TF-IDF | dim = 297 | SVM |
a2 | Lyrics | LSTM | dim = 64 | SVM |
a3 | Lyrics | LSTM | dim = 64 | D |
a4 | Lyrics | LSTM | dim = 128 | D |
a5 | Lyrics | A | dim = 64 | SVM |
a6 | Lyrics | A | dim = 64 | D |
a7 | Lyrics | A | dim = 128 | D |
a8 | Lyrics | [46] | \ | \ |
a9 | Lyrics | [47] | \ | \ |
b1 | Audio | OpenSMILE | dim = 384 | SVM |
b2 | Audio | OpenSMILE | dim = 1582 | SVM |
b3 | Audio | B | dim = 64, n = 15 | SVM |
b4 | Audio | B | dim = 64, n = 15 | D |
b5 | Audio | B | dim = 64, n = 30 | D |
b6 | Audio | B | dim = 128, n = 30 | D |
b7 | Audio | [48] | \ | \ |
b8 | Audio | [49] | \ | \ |
c1 | Lyrics, Audio | TF-IDF + OpenSMILE | lyric dim = 297, audio dim = 384 | D |
c2 | Lyrics, Audio | LSTM + B | lyric dim = 64, audio dim = 64, n = 30 | D |
c3 | Lyrics, Audio | A + B | lyric dim = 64, audio dim = 64, n = 30 | D |
c4 | Lyrics, Audio | TF-IDF + OpenSMILE + C | lyric dim = 297, audio dim = 384, n = 30 | D |
c5 | Lyrics, Audio | LSTM + B + C | lyric dim = 64, audio dim = 64, n = 30 | D |
c6 | Lyrics, Audio | A + B + C | lyric dim = 64, audio dim = 64, n = 30 | D |
c7 | Lyrics, Audio | [51] | \ | \ |
c8 | Lyrics, Audio | [50] | \ | \ |
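Method C in the table is the multimodal compact bilinear (MCB) pooling of Section 2.3, which approximates the quadratic-size outer product of the lyric and audio feature vectors by count-sketch projection [63] followed by element-wise multiplication in the FFT domain. A minimal NumPy sketch of that operation; the sketch dimension D = 1024 and the random inputs are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

class CountSketch:
    """Feature hashing as in Charikar et al. [63]: each input coordinate
    is scattered into one of D buckets with a random sign."""
    def __init__(self, d, D):
        self.D = D
        self.h = rng.integers(0, D, size=d)       # bucket index per coordinate
        self.s = rng.choice([-1.0, 1.0], size=d)  # random sign per coordinate

    def __call__(self, x):
        y = np.zeros(self.D)
        np.add.at(y, self.h, self.s * x)          # signed scatter-add
        return y

def mcb_fuse(x_lyric, x_audio, sk_l, sk_a):
    # The outer product of the two feature vectors is approximated by the
    # circular convolution of their count sketches, computed via FFT.
    fl = np.fft.rfft(sk_l(x_lyric))
    fa = np.fft.rfft(sk_a(x_audio))
    return np.fft.irfft(fl * fa, n=sk_l.D)

# e.g., a 64-d lyric feature (A) fused with a 64-d audio feature (B)
D = 1024
sk_l, sk_a = CountSketch(64, D), CountSketch(64, D)
fused = mcb_fuse(rng.standard_normal(64), rng.standard_normal(64), sk_l, sk_a)
print(fused.shape)  # (1024,)
```

Note that the hash parameters live in the CountSketch objects: the projections must be sampled once and then reused for every song, not redrawn per call.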
Unimodal results (Section 4.2.1):

Group | Acc 0.5 | Acc 0.6 | Acc 0.7 | Acc 0.8 | Acc 0.9 | Prec 0.5 | Prec 0.6 | Prec 0.7 | Prec 0.8 | Prec 0.9 | Rec 0.5 | Rec 0.6 | Rec 0.7 | Rec 0.8 | Rec 0.9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a1 | 0.22 | 0.23 | 0.25 | 0.27 | 0.30 | 0.12 | 0.12 | 0.15 | 0.16 | 0.16 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 |
a2 | 0.28 | 0.28 | 0.31 | 0.30 | 0.31 | 0.21 | 0.21 | 0.22 | 0.25 | 0.29 | 0.18 | 0.19 | 0.19 | 0.23 | 0.26 |
a3 | 0.31 | 0.31 | 0.34 | 0.35 | 0.35 | 0.23 | 0.23 | 0.25 | 0.32 | 0.32 | 0.20 | 0.21 | 0.21 | 0.27 | 0.30 |
a4 | 0.30 | 0.30 | 0.32 | 0.32 | 0.33 | 0.24 | 0.21 | 0.25 | 0.31 | 0.31 | 0.19 | 0.19 | 0.20 | 0.24 | 0.28 |
a5 | 0.35 | 0.37 | 0.39 | 0.40 | 0.40 | 0.29 | 0.28 | 0.28 | 0.38 | 0.36 | 0.27 | 0.26 | 0.27 | 0.31 | 0.34 |
a6 | 0.37 | 0.37 | 0.40 | 0.41 | 0.45 | 0.26 | 0.29 | 0.32 | 0.36 | 0.38 | 0.26 | 0.26 | 0.28 | 0.30 | 0.35 |
a7 | 0.34 | 0.35 | 0.35 | 0.37 | 0.42 | 0.28 | 0.29 | 0.31 | 0.32 | 0.35 | 0.24 | 0.26 | 0.27 | 0.31 | 0.27 |
a8 | 0.32 | 0.32 | 0.36 | 0.33 | 0.36 | 0.24 | 0.22 | 0.25 | 0.28 | 0.33 | 0.19 | 0.21 | 0.22 | 0.25 | 0.26 |
a9 | 0.20 | 0.20 | 0.22 | 0.23 | 0.24 | 0.12 | 0.11 | 0.13 | 0.14 | 0.14 | 0.11 | 0.12 | 0.13 | 0.12 | 0.14 |
b1 | 0.28 | 0.28 | 0.28 | 0.29 | 0.30 | 0.20 | 0.20 | 0.20 | 0.21 | 0.22 | 0.19 | 0.18 | 0.18 | 0.20 | 0.19 |
b2 | 0.27 | 0.27 | 0.28 | 0.28 | 0.28 | 0.19 | 0.19 | 0.21 | 0.21 | 0.21 | 0.17 | 0.17 | 0.18 | 0.19 | 0.19 |
b3 | 0.38 | 0.42 | 0.44 | 0.49 | 0.52 | 0.34 | 0.38 | 0.41 | 0.45 | 0.48 | 0.29 | 0.31 | 0.32 | 0.36 | 0.41 |
b4 | 0.43 | 0.45 | 0.46 | 0.48 | 0.51 | 0.41 | 0.43 | 0.43 | 0.44 | 0.49 | 0.32 | 0.34 | 0.35 | 0.38 | 0.44 |
b5 | 0.49 | 0.51 | 0.51 | 0.53 | 0.57 | 0.46 | 0.46 | 0.43 | 0.48 | 0.53 | 0.37 | 0.41 | 0.41 | 0.46 | 0.52 |
b6 | 0.54 | 0.56 | 0.58 | 0.60 | 0.62 | 0.49 | 0.51 | 0.55 | 0.54 | 0.58 | 0.41 | 0.44 | 0.47 | 0.51 | 0.56 |
b7 | 0.50 | 0.53 | 0.56 | 0.56 | 0.60 | 0.50 | 0.51 | 0.44 | 0.49 | 0.57 | 0.38 | 0.45 | 0.45 | 0.48 | 0.53 |
b8 | 0.52 | 0.54 | 0.52 | 0.56 | 0.59 | 0.47 | 0.50 | 0.44 | 0.53 | 0.53 | 0.38 | 0.43 | 0.41 | 0.49 | 0.52 |
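For orientation, a minimal scikit-learn sketch of how the three reported metrics could be computed over the seven emotion labels; the macro averaging mode is an assumption (precision and recall running consistently below accuracy, as in the rows above, is typical of macro averaging over an imbalanced label set such as this one, where Lovelorn dominates), not something stated in this excerpt:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Multi-class scores; labels are the seven emotion classes, e.g. 0..6."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging weights the seven classes equally; zero_division=0
        # keeps the score defined when some class is never predicted.
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
    }

print(evaluate([0, 1, 2, 1], [0, 1, 1, 1]))
```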
Bimodal results (Section 4.2.2):

Group | Acc 0.5 | Acc 0.6 | Acc 0.7 | Acc 0.8 | Acc 0.9 | Prec 0.5 | Prec 0.6 | Prec 0.7 | Prec 0.8 | Prec 0.9 | Rec 0.5 | Rec 0.6 | Rec 0.7 | Rec 0.8 | Rec 0.9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c1 | 0.34 | 0.35 | 0.35 | 0.40 | 0.40 | 0.27 | 0.28 | 0.29 | 0.32 | 0.33 | 0.23 | 0.24 | 0.25 | 0.27 | 0.27 |
c2 | 0.71 | 0.73 | 0.72 | 0.75 | 0.77 | 0.65 | 0.68 | 0.70 | 0.72 | 0.74 | 0.64 | 0.68 | 0.70 | 0.73 | 0.73 |
c3 | 0.78 | 0.79 | 0.77 | 0.78 | 0.78 | 0.78 | 0.79 | 0.79 | 0.76 | 0.73 | 0.75 | 0.76 | 0.74 | 0.75 | 0.76 |
c4 | 0.36 | 0.37 | 0.35 | 0.44 | 0.43 | 0.29 | 0.29 | 0.30 | 0.37 | 0.34 | 0.26 | 0.27 | 0.31 | 0.31 | 0.33 |
c5 | 0.76 | 0.76 | 0.78 | 0.79 | 0.78 | 0.76 | 0.79 | 0.76 | 0.79 | 0.81 | 0.70 | 0.72 | 0.74 | 0.76 | 0.77 |
c6 | 0.81 | 0.83 | 0.80 | 0.82 | 0.81 | 0.79 | 0.82 | 0.82 | 0.84 | 0.83 | 0.77 | 0.79 | 0.80 | 0.81 | 0.81 |
c7 | 0.73 | 0.74 | 0.73 | 0.75 | 0.75 | 0.61 | 0.64 | 0.68 | 0.67 | 0.69 | 0.64 | 0.63 | 0.66 | 0.69 | 0.71 |
c8 | 0.68 | 0.69 | 0.70 | 0.71 | 0.71 | 0.58 | 0.59 | 0.58 | 0.63 | 0.65 | 0.60 | 0.63 | 0.64 | 0.67 | 0.67 |
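Configuration c6 above, the best-performing group, chains all three components: attention-LSTM lyric features (A), CNN audio features (B), compact bilinear fusion (C), and the dense classifier (D). The following PyTorch sketch shows one plausible wiring under stated assumptions: the 64-dimensional branch outputs and seven emotion classes come from the tables, while the spectrogram input shape, CNN topology, vocabulary handling, and the exact nn.Bilinear layer standing in for the count-sketch MCB approximation sketched earlier are all illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class BimodalEmotionNet(nn.Module):
    """Hypothetical sketch of group c6: A + B + C feeding classifier D."""
    def __init__(self, vocab_size, d=64, fused=1024, n_classes=7):
        super().__init__()
        # A: attention-based LSTM over lyric word embeddings (Section 2.1)
        self.embed = nn.Embedding(vocab_size, d)
        self.lstm = nn.LSTM(d, d, batch_first=True)
        self.attn = nn.Linear(d, 1)
        # B: small CNN over an audio time-frequency input (Section 2.2);
        # this topology is illustrative, not the paper's
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, d), nn.ReLU(),
        )
        # C: bilinear fusion; an exact nn.Bilinear stands in for the
        # count-sketch MCB approximation (Section 2.3)
        self.fuse = nn.Bilinear(d, d, fused)
        # D: dense classifier over the fused vector (Section 2.4)
        self.cls = nn.Linear(fused, n_classes)

    def forward(self, tokens, spec):
        h, _ = self.lstm(self.embed(tokens))    # (B, T, d)
        w = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        lyric = (w * h).sum(dim=1)              # weighted sum -> (B, d)
        audio = self.cnn(spec)                  # (B, d)
        return self.cls(torch.relu(self.fuse(lyric, audio)))

net = BimodalEmotionNet(vocab_size=5000)
logits = net(torch.randint(0, 5000, (8, 120)),  # 8 songs, 120 lyric tokens
             torch.randn(8, 1, 40, 300))        # 8 spectrogram patches
print(logits.shape)                             # torch.Size([8, 7])
```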