Automated emotional valence estimation in infants with stochastic and strided temporal sampling

M Ning, IO Ertugrul, DS Messinger… - 2023 11th …, 2023 - ieeexplore.ieee.org
We propose the first automated approach to estimating the emotional valence of infants from their facial behavior. We use the state-of-the-art transformer-based video masked autoencoder (VideoMAE), pre-trained on a large video dataset, as a backbone and fine-tune it on two large, well-annotated infant video datasets (SIBSMILE and MODELING). To augment the limited data, we propose a novel video temporal augmentation method called Stochastic and Strided Temporal Sampling (SSTS). We demonstrate the effectiveness of our approach for infant valence estimation by achieving a Concordance Correlation Coefficient (CCC) of 0.671 on SIBSMILE and MODELING. The experiments show that SSTS speeds up training by a factor of 8 while achieving the best valence estimation performance. Lastly, we suggest that face detection and cropping (coarse registration) is a promising alternative to landmark-based registration (i.e., fine registration) in data pre-processing when accurate infant facial landmark detectors are unavailable.
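For readers unfamiliar with the evaluation metric, the following is a minimal sketch of the standard (Lin's) Concordance Correlation Coefficient, which penalizes both weak correlation and differences in mean or scale between predictions and labels. It is not taken from the paper: the function name `concordance_cc` and the use of population (divide-by-N) variance are assumptions made for illustration, and the authors' exact evaluation protocol may differ.

```python
from statistics import fmean


def concordance_cc(pred, target):
    """Lin's CCC between two equal-length sequences of numbers.

    Hypothetical helper for illustration; uses population (1/N) moments.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_t = fmean(target)

    # Population variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])

    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0.0:
        # Both sequences are the same constant; CCC is undefined here.
        return float("nan")
    return 2.0 * cov / denom


if __name__ == "__main__":
    # A constant offset lowers CCC even though Pearson correlation is 1.
    print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

In the usage example, the predictions track the labels perfectly but are shifted by one unit, so the score falls below 1. This sensitivity to bias is why CCC is often preferred over plain correlation for continuous valence ratings.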