Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM

Atmaja, Bagus Tris; Akagi, Masato

doi:10.1016/j.specom.2020.11.003

arXiv:2210.14495 (cs)

[Submitted on 26 Oct 2022]

Abstract: Automatic speech emotion recognition (SER) by a computer is a critical component for more natural human-machine interaction. As in human-human interaction, the capability to perceive emotion correctly is essential for deciding how to act in a particular situation. One open issue in SER is whether acoustic features need to be combined with other data such as facial expressions, text, and motion capture. This research proposes to combine acoustic and text information by applying a late-fusion approach consisting of two steps. First, deep learning systems are trained separately on acoustic and text features. Second, the predictions from these deep learning systems are fed into a support vector machine (SVM) to predict the final regression score. The task in this research is dimensional emotion modeling, because it can enable a deeper analysis of affective states. Experimental results show that this two-stage, late-fusion approach obtains higher performance than any one-stage processing, with a linear correlation between one-stage and two-stage performance. This late-fusion approach also improves on previous early-fusion results, as measured by the concordance correlation coefficient (CCC).
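The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage-1 "network" outputs are replaced by synthetic noisy predictions, the variable names (`acoustic_pred`, `text_pred`, `fusion`) are hypothetical, and the SVM hyperparameters are arbitrary placeholders. It only shows the shape of the second stage, where an SVM regressor (scikit-learn's `SVR`) takes the two stage-1 predictions as its input features, together with the standard concordance correlation coefficient used for scoring.

```python
import numpy as np
from sklearn.svm import SVR


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population covariance, matching the population variances below.
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


# Synthetic stand-ins for stage-1 outputs (assumption: NOT data from the paper).
rng = np.random.default_rng(0)
n = 400
target = rng.uniform(-1.0, 1.0, size=n)            # labels for one emotion dimension
acoustic_pred = target + rng.normal(0.0, 0.4, n)   # hypothetical acoustic-network output
text_pred = target + rng.normal(0.0, 0.4, n)       # hypothetical text-network output

# Stage 2: stack both predictions into a two-column feature matrix for the SVM.
X = np.column_stack([acoustic_pred, text_pred])
X_train, X_test = X[:300], X[300:]
y_train, y_test = target[:300], target[300:]

# Hyperparameters are illustrative placeholders, not values from the paper.
fusion = SVR(kernel="rbf", C=1.0, epsilon=0.1)
fusion.fit(X_train, y_train)
fused_pred = fusion.predict(X_test)

print("acoustic-only CCC:", round(ccc(y_test, X_test[:, 0]), 3))
print("text-only CCC:    ", round(ccc(y_test, X_test[:, 1]), 3))
print("fused (SVM) CCC:  ", round(ccc(y_test, fused_pred), 3))
```

In this sketch a single regressor handles a single emotion dimension; covering several dimensions would need one such regressor per dimension, which is an assumption of the sketch rather than a detail stated in the abstract.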

Comments:	Published in Speech Communication
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2210.14495 [cs.SD]
	(or arXiv:2210.14495v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2210.14495
Journal reference:	Speech Commun., vol. 126, pp. 9-21, Feb. 2021
Related DOI:	https://doi.org/10.1016/j.specom.2020.11.003

Submission history

From: Bagus Tris Atmaja
[v1] Wed, 26 Oct 2022 05:49:13 UTC (800 KB)
