Invertible Frowns: Video-to-Video Facial Emotion Translation

Magnusson, Ian; Sankaranarayanan, Aruna; Lippman, Andrew

arXiv:2109.08061 (cs)

[Submitted on 16 Sep 2021 (v1), last revised 22 Oct 2021 (this version, v2)]

Abstract: We present Wav2Lip-Emotion, a video-to-video translation architecture that modifies facial expressions of emotion in videos of speakers. Previous work modifies emotion in images, uses a single image to produce a video with animated emotion, or puppets facial expressions in videos with landmarks from a reference video. However, many use cases, such as modifying an actor's performance in post-production, coaching individuals to be more animated speakers, or touching up emotion in a teleconference, require a video-to-video translation approach. We explore a method to maintain speakers' lip movements, identity, and pose while translating their expressed emotion. Our approach extends an existing multi-modal lip synchronization architecture to modify the speaker's emotion using L1 reconstruction and pre-trained emotion objectives. We also propose a novel automated emotion evaluation approach and corroborate it with a user study. Both evaluations find that we succeed in modifying emotion while maintaining lip synchronization. Visual quality is somewhat diminished, and the model variants trade off greater emotion modification against visual quality. Nevertheless, we demonstrate (1) that facial expressions of emotion can be modified with nothing other than L1 reconstruction and pre-trained emotion objectives and (2) that our automated emotion evaluation approach aligns with human judgements.
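
As a rough illustration of how an L1 reconstruction term and a pre-trained emotion objective might be combined into one training loss, the sketch below uses plain Python on toy values. It is not the authors' implementation: the abstract does not specify the form of the emotion objective or any loss weights, so the cross-entropy term, the `emotion_weight` parameter, and all function names here are assumptions made for illustration only.

```python
import math


def l1_reconstruction(generated, target):
    """Mean absolute difference between two flat lists of pixel values."""
    if len(generated) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def emotion_cross_entropy(logits, target_class):
    """Cross-entropy of emotion-classifier logits against the desired emotion.

    Assumption: the logits come from a frozen, pre-trained emotion classifier
    applied to the generated frame. The classifier itself is not modelled here.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_class]


def combined_loss(generated, target, logits, target_class, emotion_weight=0.5):
    """Weighted sum of the two terms; the weighting scheme is hypothetical."""
    recon = l1_reconstruction(generated, target)
    emotion = emotion_cross_entropy(logits, target_class)
    return (1.0 - emotion_weight) * recon + emotion_weight * emotion


if __name__ == "__main__":
    generated_frame = [0.0, 1.0]   # toy "pixels" of a generated frame
    reference_frame = [1.0, 1.0]   # toy "pixels" of the ground-truth frame
    classifier_logits = [0.0, 0.0] # toy logits for two emotion classes
    print(combined_loss(generated_frame, reference_frame, classifier_logits, 0))
```

In a sketch like this, raising `emotion_weight` pushes the generator toward the target emotion at the expense of pixel-level fidelity, which mirrors the trade-off between emotion modification and visual quality that the abstract reports across model variants.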

Comments:	9 pages, 2 figures, 4 tables, accepted at ADGD @ ACM Multimedia 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2109.08061 [cs.CV]
	(or arXiv:2109.08061v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2109.08061

Submission history

From: Ian Magnusson
[v1] Thu, 16 Sep 2021 15:43:51 UTC (9,592 KB)
[v2] Fri, 22 Oct 2021 15:44:08 UTC (9,592 KB)
