Mar 29, 2022 · In this work, we propose a novel AAC system called CLIP-AAC to learn interactive cross-modality representation with both acoustic and textual information.
We now introduce the proposed CLIP-AAC architecture, which consists of three modules: the encoder, the contrastive learning module and the decoder, as ...
This work proposes a novel AAC system called CLIP-AAC to learn interactive cross-modality representation with both acoustic and textual information, ...
Recently, Liu et al. [9] used contrastive learning to improve audio captioning performance in a data-scarce scenario. Chen et al.
Oct 9, 2022 · Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips.
Audio Captioning is the task of describing audio using text. The general approach is to use an audio encoder to encode the audio.
The audio data is encoded into a latent representation and aligned with its corresponding text description. Then a decoder is used to generate the captions.
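The alignment step described above can be illustrated with a symmetric contrastive (InfoNCE-style) loss, as popularized by CLIP. This is a minimal sketch under stated assumptions, not the method of any specific paper listed here: the function names, the temperature value of 0.07, and the toy embeddings are all hypothetical, and a real system would compute the embeddings with trained audio and text encoders.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. the i-th audio clip should match the i-th caption.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # audio_emb and text_emb are equally sized lists of vectors;
    # pair i in one list corresponds to pair i in the other.
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    # Cosine similarity of every audio clip with every caption, scaled.
    logits = [[sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
              for ai in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))

# Toy embeddings (hypothetical): matched pairs point the same way.
audio = [[1.0, 0.0], [0.0, 1.0]]
text_matched = [[2.0, 0.0], [0.0, 3.0]]
text_swapped = [[0.0, 3.0], [2.0, 0.0]]

loss_matched = contrastive_loss(audio, text_matched)
loss_swapped = contrastive_loss(audio, text_swapped)
```

The loss is small when each audio embedding is closest to its own caption and large when the pairs are mismatched, which is what pushes the two modalities into a shared latent space before a decoder generates the caption.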
This repository is a list of papers that focus on audio captioning. The papers are grouped by year of publication.