Dec 17, 2021 · We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment.
In this paper, we propose Align and Prompt: a new video-and-language pre-training framework (ALPRO), which operates on sparsely-sampled video frames and ...
Official PyTorch code for ALPRO. This repository supports pre-training as well as finetuning on ... Requirements: Our implementation is tested on Ubuntu 20.04.1 ...
In this paper, we propose Align and Prompt: a new video-and-language pre-training framework (AlPro), which operates on sparsely-sampled video frames and ...
This paper proposes Align and Prompt: a new video-and-language pre-training framework (AlPro), which operates on sparsely-sampled video frames and achieves ...
In this paper, we propose Align and Prompt: a new video-and-language pre-training framework (AlPro), which operates on sparsely-sampled video frames.
In AL-PRO [30], the authors introduce a video-text contrastive (VTC) loss to align instance-level unimodal video-text features and design a prompt entity module ...
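To make the idea of a video-text contrastive loss concrete, here is a minimal sketch in plain Python. It assumes a symmetric InfoNCE-style formulation over L2-normalized features, where the i-th video and i-th text in a batch form the positive pair. The snippet above does not spell out the formula, so the function names, the toy feature vectors, and the temperature value of 0.07 are all illustrative assumptions rather than ALPRO's actual implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def vtc_loss(video_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss (assumed form); pair i of each list is the positive."""
    v = [l2_normalize(x) for x in video_feats]
    t = [l2_normalize(x) for x in text_feats]
    n = len(v)
    # Similarity matrix: rows index videos, columns index texts.
    sim = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Video-to-text: each row should peak on its diagonal entry.
    v2t = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Text-to-video: each column should peak on its diagonal entry.
    t2v = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (v2t + t2v)


# Toy features: matching pairs give a low loss, swapped pairs a high one.
aligned = vtc_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = vtc_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs the loss is close to zero when each video matches its own text and much larger when the pairs are swapped, which is the behavior an instance-level alignment objective is meant to reward.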
May 31, 2022 · An example of an entity prompt is the short text, “A video of {ENTITY}”, where ENTITY is a noun that appears often in the pre-training corpus.
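The entity-prompt template quoted above can be sketched in a few lines of Python. The snippet only gives the template text and says that ENTITY is a frequent noun in the pre-training corpus; the helper name `build_entity_prompts`, the `top_k` cutoff, and the toy noun list below are hypothetical, and the sketch assumes the nouns have already been extracted from the corpus by some other step.

```python
from collections import Counter

# Template text as quoted in the snippet.
PROMPT_TEMPLATE = "A video of {ENTITY}"


def build_entity_prompts(nouns, top_k=3, template=PROMPT_TEMPLATE):
    """Fill the template with the top_k most frequent nouns (hypothetical helper)."""
    counts = Counter(noun.lower() for noun in nouns)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [template.format(ENTITY=noun) for noun, _ in ranked[:top_k]]


# Toy stand-in for nouns pulled from pre-training captions.
corpus_nouns = ["dog", "guitar", "dog", "kitchen", "guitar", "dog"]
prompts = build_entity_prompts(corpus_nouns, top_k=2)
print(prompts)
```

The frequency cutoff is one simple way to read "a noun that appears often"; the real selection criterion may differ.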
Jun 3, 2022 · ALPRO (ALign and PROmpt) is a novel video-and-language pre-training system that provides a generic yet effective way of learning video-text representations.