Visual Keyword Spotting with Attention

Prajwal, K R; Momeni, Liliane; Afouras, Triantafyllos; Zisserman, Andrew

arXiv:2110.15957 (cs)

[Submitted on 29 Oct 2021]

Abstract: In this paper, we consider the task of spotting spoken keywords in silent video sequences, also known as visual keyword spotting. To this end, we investigate Transformer-based models that ingest two streams, a visual encoding of the video and a phonetic encoding of the keyword, and output the temporal location of the keyword if present. Our contributions are as follows: (1) We propose a novel architecture, the Transpotter, that uses full cross-modal attention between the visual and phonetic streams; (2) We show through extensive evaluations that our model outperforms the prior state-of-the-art visual keyword spotting and lip reading methods on the challenging LRW, LRS2, and LRS3 datasets by a large margin; (3) We demonstrate the ability of our model to spot words under the extreme conditions of isolated mouthings in sign language videos.
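
The two-stream idea in the abstract can be illustrated with a toy sketch. This is not the authors' Transpotter implementation: it assumes that "full cross-modal attention" can be approximated by a single, parameter-free self-attention pass over the concatenation of video-frame features and keyword-phoneme features, and that the temporal location is read off a per-frame sigmoid score. The function names (`joint_self_attention`, `spot_keyword`), the weight vector `frame_w`, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(video_feats, phoneme_feats):
    """One attention pass over the concatenated sequence (assumed design).

    Every token, visual or phonetic, attends to every other token, so
    each video frame can draw on the keyword's phonemes and vice versa.
    No learned projections are used; this is purely illustrative.
    """
    tokens = video_feats + phoneme_feats
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        weights = softmax([dot(query, key) / scale for key in tokens])
        attended.append(
            [sum(w * value[d] for w, value in zip(weights, tokens))
             for d in range(dim)]
        )
    return attended


def spot_keyword(video_feats, phoneme_feats, frame_w, threshold=0.5):
    """Return (present, frame_index_or_None, per_frame_probabilities).

    frame_w is a hypothetical per-frame scoring vector; in a trained
    model it would be learned rather than supplied by hand.
    """
    attended = joint_self_attention(video_feats, phoneme_feats)
    frame_states = attended[:len(video_feats)]  # keep only the video positions
    probs = [1.0 / (1.0 + math.exp(-dot(h, frame_w))) for h in frame_states]
    best = max(probs)
    present = best >= threshold
    location = probs.index(best) if present else None
    return present, location, probs


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    video = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
    keyword = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
    scorer = [random.gauss(0, 1) for _ in range(dim)]
    print(spot_keyword(video, keyword, scorer))
```

With untrained random features the output carries no meaning; the sketch only shows how a joint sequence lets the per-frame scores depend on both the video and the queried keyword.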

Comments: Appears in: British Machine Vision Conference 2021 (BMVC 2021)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as: arXiv:2110.15957 [cs.CV] (or arXiv:2110.15957v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2110.15957

Submission history

From: Prajwal K R
[v1] Fri, 29 Oct 2021 17:59:04 UTC (2,448 KB)
