May 13, 2022 · In this paper, we study the emergence of heavy-tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of ...
A real-valued random variable X is said to be heavy-tailed if its right tail or its left tail decays more slowly than any exponential ...
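One standard way to make the right-tailed case precise (notation mine, stated as a sketch of the usual definition):

```latex
% X is (right) heavy-tailed if its tail beats every exponential rate:
X \text{ is heavy-tailed} \iff
\limsup_{x \to \infty} e^{\lambda x}\, \Pr(X > x) = \infty
\quad \text{for every } \lambda > 0,
% equivalently, the moment generating function diverges on the right:
\mathbb{E}\!\left[e^{\lambda X}\right] = \infty \quad \text{for all } \lambda > 0.
```

The left-tailed case is analogous with $\Pr(X < -x)$.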
We rigorously prove that this phenomenon is not specific to deep learning and, in fact, can be observed even in surprisingly simple settings: we show that ...
May 16, 2022 · Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', ...
In this paper, we argue that these three seemingly unrelated perspectives for generalization are deeply linked to each other.
Our result shows that even in the simplest setting when the input data is Gaussian without any heavy tail, SGD iterates can lead to a heavy-tailed stationary ...
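The mechanism behind this claim is multiplicative noise: for one-dimensional least squares with Gaussian data, the SGD update is a Kesten-type recursion, and its stationary distribution can be heavy-tailed even though every input is Gaussian. A minimal simulation sketch (the step size `eta` and sample counts are my own illustrative choices, not from the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_iterates(eta=0.5, n_steps=200_000, burn_in=1_000):
    # 1D least squares: loss (a*x - y)^2 / 2 with Gaussian data a, y ~ N(0, 1).
    # SGD update: x <- x - eta * a * (a*x - y) = (1 - eta*a^2) * x + eta*a*y,
    # i.e. a Kesten recursion with a random multiplicative factor (1 - eta*a^2).
    x = 0.0
    samples = []
    for k in range(n_steps):
        a = rng.standard_normal()
        y = rng.standard_normal()
        x = (1.0 - eta * a * a) * x + eta * a * y
        if k >= burn_in:
            samples.append(x)
    return np.array(samples)

samples = sgd_iterates()
# Standardize and inspect how much mass sits far out in the tails;
# a Gaussian would put probability ~5.7e-7 beyond 5 standard deviations.
z = (samples - samples.mean()) / samples.std()
print(np.mean(np.abs(z) > 5))
```

With `eta=0.5` one can check that E[(1 - eta*a^2)^2] < 1 but E[(1 - eta*a^2)^4] > 1, so the stationary distribution exists yet has an infinite fourth moment — the heavy tails arise from the dynamics, not the data.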
Oct 18, 2024 · This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the ...