An Empirical Study of Self-supervised Learning with Wasserstein Distance

Yamada, Makoto; Takezawa, Yuki; Houry, Guillaume; Dusterwald, Kira Michaela; Sulem, Deborah; Zhao, Han; Tsai, Yao-Hung Hubert

Statistics > Machine Learning

arXiv:2310.10143 (stat)

[Submitted on 16 Oct 2023 (v1), last revised 5 Feb 2024 (this version, v2)]

Abstract: In this study, we investigate self-supervised learning (SSL) with the 1-Wasserstein distance on a tree structure, also known as the Tree-Wasserstein distance (TWD), where TWD is defined as the L1 distance between two tree-embedded vectors. SSL methods often use cosine similarity in the objective function; the Wasserstein distance, by contrast, has not been well studied in this role. Training with the Wasserstein distance is numerically challenging. Thus, this study empirically investigates strategies for optimizing SSL with the Wasserstein distance and identifies a stable training procedure. More specifically, we evaluate combinations of two types of TWD (total variation and ClusterTree) and several probability models, including the softmax function, the ArcFace probability model, and simplicial embedding. We propose a simple yet effective Jeffrey divergence-based regularization method to stabilize optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we find that a simple combination of the softmax function and TWD performs significantly worse than the standard SimCLR. Moreover, a simple combination of TWD and SimSiam fails to train the model. We find that model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training. Finally, we show that an appropriate combination of TWD and probability model outperforms cosine similarity-based representation learning.
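
To make the abstract's definition concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual formulation in which TWD is the edge-weighted L1 distance between the subtree masses of two probability vectors, and it assumes the Jeffrey divergence takes its standard form as the symmetrized KL divergence. The names `tree_wasserstein`, `star_tree`, and `jeffrey_divergence`, the matrix `B`, the weights `w`, and the two-cluster toy tree are all hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def tree_wasserstein(p, q, B, w):
    """TWD as a weighted L1 distance between tree-embedded vectors.

    B is a 0/1 matrix of shape (n_edges, n_leaves): B[e, i] = 1 if leaf i
    lies in the subtree below edge e. w holds the edge weights.
    (Assumed formulation; B and w are illustrative names.)
    """
    return float(np.abs(w * (B @ (p - q))).sum())


def star_tree(n):
    """Root joined directly to n leaves with edge weight 1/2.

    With this tree, TWD reduces to the total variation distance.
    """
    return np.eye(n), np.full(n, 0.5)


def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetrized KL divergence, KL(p||q) + KL(q||p) (assumed definition)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum((p - q) * (np.log(p) - np.log(q))))


# Two toy "views" mapped to probability vectors with the softmax model.
rng = np.random.default_rng(0)
p = softmax(rng.normal(size=4))
q = softmax(rng.normal(size=4))

# Total variation case: star tree.
B_tv, w_tv = star_tree(4)
tv = tree_wasserstein(p, q, B_tv, w_tv)

# Hypothetical two-cluster tree: leaves {0, 1} and {2, 3} each share an
# internal node, adding two edges on top of the four leaf edges.
B_ct = np.vstack([np.eye(4), [[1, 1, 0, 0], [0, 0, 1, 1]]])
w_ct = np.full(6, 0.5)
ct = tree_wasserstein(p, q, B_ct, w_ct)

# Regularization term; how it is weighted against TWD is not stated here.
reg = jeffrey_divergence(p, q)
```

In this toy tree, moving mass between leaves in different clusters costs more than moving it within a cluster, which is the kind of structure a cluster-based tree adds over plain total variation.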

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2310.10143 [stat.ML]
	(or arXiv:2310.10143v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2310.10143

Submission history

From: Makoto Yamada
[v1] Mon, 16 Oct 2023 07:31:30 UTC (1,142 KB)
[v2] Mon, 5 Feb 2024 21:20:28 UTC (1,176 KB)
