Audio-Driven Facial Landmark Generation in Violin Performance Using 3DCNN Network with Self Attention Model

TW Lin, CL Liu, L Su - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023 - ieeexplore.ieee.org
In a musical performance, both auditory and visual elements are essential to an outstanding result. Recent research has focused on generating body movements or fingering from performance audio, but audio-driven face generation for music performance remains underexplored. In this paper, we compile a violin soundtrack and facial expression dataset (VSFE) for modeling facial expressions in violin performance. To our knowledge, this is the first dataset mapping violin performance audio to musicians' facial expressions. We then propose a 3D CNN with self-attention and residual blocks for audio-driven facial expression generation. In our experiments, we compare our method against three talking-face-generation baselines. The code and dataset are available on GitHub (https://github.com/kevinlin91/icassp_music2face).
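The abstract names the model's building blocks (a 3D CNN with residual blocks and self-attention) but not their arrangement. Below is a minimal PyTorch sketch of how such components might be composed to map an audio feature volume to facial landmark coordinates; the layer sizes, the spectrogram-volume input shape, and the 68-point landmark output are all illustrative assumptions, not the authors' architecture, for which the linked repository is the reference.

# Hypothetical sketch: a 3D CNN with residual blocks and self-attention
# mapping an audio feature volume to 2D facial landmarks. All shapes and
# layer widths are assumptions for illustration, not the paper's design.
import torch
import torch.nn as nn


class ResidualBlock3D(nn.Module):
    """Two 3D convolutions with a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class SelfAttention3D(nn.Module):
    """Scaled dot-product self-attention over the flattened feature volume."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, d, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, D*H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)    # attend across all positions
        return out.transpose(1, 2).reshape(b, c, d, h, w)


class Audio2Landmark3DCNN(nn.Module):
    """Audio feature volume -> n_landmarks (x, y) coordinates."""

    def __init__(self, n_landmarks: int = 68):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            ResidualBlock3D(32),
            SelfAttention3D(32),
            ResidualBlock3D(32),
            nn.AdaptiveAvgPool3d(1),         # global average pooling
        )
        self.head = nn.Linear(32, n_landmarks * 2)

    def forward(self, audio_volume):
        # audio_volume: (B, 1, T, F, F'), e.g. stacked spectrogram patches
        feat = self.encoder(audio_volume).flatten(1)
        return self.head(feat).view(-1, self.n_landmarks, 2)


if __name__ == "__main__":
    model = Audio2Landmark3DCNN()
    dummy = torch.randn(2, 1, 8, 16, 16)     # assumed input shape
    print(model(dummy).shape)                # torch.Size([2, 68, 2])

Placing the self-attention between residual blocks lets the network mix information across the whole time-frequency volume before the final pooling; that ordering is a design guess here, chosen only to show how the named components can be chained.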