Data augmentation techniques for the Video Question Answering task

Falcon, Alex; Lanz, Oswald; Serra, Giuseppe

arXiv:2008.09849 (cs)

[Submitted on 22 Aug 2020]

Abstract: Video Question Answering (VideoQA) is a task that requires a model to analyze and understand the visual content of the input video, the textual content of the question, and the interaction between them in order to produce a meaningful answer. In our work we focus on the Egocentric VideoQA task, which uses first-person videos, because of its potential impact on many different fields, such as social assistance and industrial training. Recently, an Egocentric VideoQA dataset called EgoVQA has been released. Given its small size, models tend to overfit quickly. To alleviate this problem, we propose several augmentation techniques, which give us a +5.5% improvement in final accuracy over the considered baseline.

Comments:	16 pages, 5 figures; to be published in Egocentric Perception, Interaction and Computing (EPIC) Workshop Proceedings, at ECCV 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2008.09849 [cs.CV]
	(or arXiv:2008.09849v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2008.09849

Submission history

From: Alex Falcon
[v1] Sat, 22 Aug 2020 14:34:55 UTC (3,495 KB)
