Dec 15, 2020 · We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for ...
We aim to provide a general and principled framework for distillation based on Wasserstein distance, where both global contrastive learning and local ...
The detailed implementation of the proposed Wasserstein Contrastive Representation Distillation (WCoRD) method is summarized in Algorithm 1 of the paper.
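As a rough illustration of how a dual-form (global, contrastive) term might be implemented, the PyTorch sketch below trains a critic network to score matched teacher-student feature pairs above mismatched ones, in the spirit of the Kantorovich-Rubinstein dual. The critic architecture, hidden size, and the omission of any Lipschitz constraint are assumptions made for brevity, not a reproduction of the paper's Algorithm 1; a primal-form (local feature matching) term is sketched after the next snippet.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scores a (teacher, student) feature pair; stands in for the dual-form critic."""
    def __init__(self, dim_t: int, dim_s: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_t + dim_s, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, f_t: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([f_t, f_s], dim=1)).squeeze(1)

def dual_contrastive_loss(critic: Critic,
                          f_t: torch.Tensor,
                          f_s: torch.Tensor) -> torch.Tensor:
    """Kantorovich-Rubinstein-style objective: matched (positive) pairs should
    score higher than randomly shuffled (negative) pairs. Lipschitz control
    (weight clipping / gradient penalty) is omitted to keep the sketch short."""
    perm = torch.randperm(f_s.size(0), device=f_s.device)
    pos = critic(f_t, f_s).mean()          # matched teacher/student pairs
    neg = critic(f_t, f_s[perm]).mean()    # mismatched pairs
    return neg - pos                       # minimizing widens the score gap

# Usage with random stand-in features: batch of 32, teacher 512-d, student 128-d
critic = Critic(512, 128)
loss = dual_contrastive_loss(critic, torch.randn(32, 512), torch.randn(32, 128))
```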
Wasserstein Contrastive Knowledge Distillation (WCKD) [9] encourages a student (compact) model's features h_s to be distributionally similar to those of an ...
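The requirement that student features h_s be distributionally similar to the teacher's can be illustrated with a primal-form optimal-transport term, here an entropy-regularized (Sinkhorn) cost between two feature batches. The regularization strength, iteration count, and uniform batch weights are illustrative defaults rather than values taken from the cited work.

```python
import math
import torch

def sinkhorn_distance(h_s: torch.Tensor, h_t: torch.Tensor,
                      eps: float = 0.1, n_iters: int = 50) -> torch.Tensor:
    """Entropy-regularized OT cost between two equally weighted feature batches.

    h_s: (n, d) student features, h_t: (m, d) teacher features.
    """
    n, m = h_s.size(0), h_t.size(0)
    cost = torch.cdist(h_s, h_t, p=2) ** 2                 # pairwise squared-L2 cost
    log_K = -cost / eps                                    # Gibbs kernel, log domain
    log_a = torch.full((n,), -math.log(n), device=h_s.device)  # uniform marginals
    log_b = torch.full((m,), -math.log(m), device=h_s.device)
    log_u = torch.zeros(n, device=h_s.device)
    log_v = torch.zeros(m, device=h_s.device)
    for _ in range(n_iters):                               # Sinkhorn updates, log domain
        log_u = log_a - torch.logsumexp(log_K + log_v[None, :], dim=1)
        log_v = log_b - torch.logsumexp(log_K + log_u[:, None], dim=0)
    log_pi = log_u[:, None] + log_K + log_v[None, :]       # log of the transport plan
    return (log_pi.exp() * cost).sum()                     # <plan, cost>

# Example: align a batch of 64 student features with 64 teacher features (both 128-d)
loss = sinkhorn_distance(torch.randn(64, 128), torch.randn(64, 128))
```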
This work proposes a novel way to achieve knowledge distillation: by distilling the knowledge through a quantized space, where the teacher's feature maps are ...
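Only the general idea is recoverable from this snippet, so the sketch below shows one plausible reading: snap each teacher feature vector to its nearest entry in a small codebook and train the student to predict the resulting code indices. The codebook size, nearest-neighbour assignment, and cross-entropy head are assumptions chosen to keep the example short, not the cited work's actual procedure.

```python
import torch
import torch.nn.functional as F

def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Assign each feature vector to the index of its nearest codebook entry.

    features: (n, d), codebook: (k, d) -> returns (n,) long indices.
    """
    return torch.cdist(features, codebook).argmin(dim=1)

def quantized_distillation_loss(student_logits: torch.Tensor,
                                teacher_features: torch.Tensor,
                                codebook: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the student's predicted code distribution and the
    teacher's hard code assignments."""
    codes = quantize(teacher_features, codebook)       # (n,) target indices
    return F.cross_entropy(student_logits, codes)

# Example: 16-entry codebook over 64-d teacher features, batch of 32
codebook = torch.randn(16, 64)
teacher_feats = torch.randn(32, 64)
student_logits = torch.randn(32, 16, requires_grad=True)  # student head over codes
loss = quantized_distillation_loss(student_logits, teacher_feats, codebook)
```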
Dec 19, 2019 · We train a student to capture significantly more information in the teacher's representation of the data. We formulate this objective as contrastive learning.
Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation.
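The contrastive objective can be sketched as an in-batch InfoNCE-style loss in which a student embedding is attracted to the teacher embedding of the same input and repelled from teacher embeddings of other inputs in the batch. The in-batch negatives, shared embedding dimension, and temperature value are simplifying assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def infonce_distillation_loss(f_s: torch.Tensor,
                              f_t: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """In-batch contrastive distillation: the i-th student embedding should match
    the i-th teacher embedding (positive) against all other teacher embeddings
    in the batch (negatives).

    f_s, f_t: (n, d) features of the same n inputs, same dimension d.
    """
    f_s = F.normalize(f_s, dim=1)
    f_t = F.normalize(f_t, dim=1)
    logits = f_s @ f_t.t() / temperature               # (n, n) cosine-similarity logits
    targets = torch.arange(f_s.size(0), device=f_s.device)
    return F.cross_entropy(logits, targets)            # diagonal entries are positives

# Example: distill 128-d teacher embeddings into 128-d student embeddings
loss = infonce_distillation_loss(torch.randn(32, 128), torch.randn(32, 128))
```

In practice the student features would typically pass through a small learned projection head so that the two embedding spaces share a dimension before the similarity is computed.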