The Marginal Value of Momentum for Small Learning Rate SGD

Wang, Runzhe; Malladi, Sadhika; Wang, Tianhao; Lyu, Kaifeng; Li, Zhiyuan

arXiv:2307.15196 (cs)

[Submitted on 27 Jul 2023 (v1), last revised 16 Apr 2024 (this version, v2)]

Abstract: Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep learning optimization by reducing the variance of the stochastic gradient update, but previous theoretical analyses do not find momentum to offer any provable acceleration. Theoretical results in this paper clarify the role of momentum in stochastic settings where the learning rate is small and gradient noise is the dominant source of instability, suggesting that SGD with and without momentum behave similarly in the short and long time horizons. Experiments show that momentum indeed has limited benefits for both optimization and generalization in practical training regimes where the optimal learning rate is not very large, including small- to medium-batch training from scratch on ImageNet and fine-tuning language models on downstream tasks.
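The claim that SGD with and without momentum behave similarly at small learning rates can be illustrated on a toy problem. The following is a minimal sketch, not the paper's experiments: it assumes a one-dimensional noisy quadratic f(x) = x^2 / 2, a heavy-ball momentum update, and the common convention of matching the two methods by setting the plain SGD learning rate to gamma / (1 - beta), where gamma is the momentum learning rate. The function names `simulate` and `tail_variance` and all constants are hypothetical choices made for illustration.

```python
import random


def simulate(lr, beta, steps, noise, x0=1.0):
    """Heavy-ball SGD on f(x) = x^2 / 2 with additive gradient noise.

    Setting beta = 0 recovers plain SGD. This is an illustrative toy,
    not the setup used in the paper.
    """
    x, m = x0, 0.0
    xs = []
    for k in range(steps):
        g = x + noise[k]   # stochastic gradient: true gradient x plus noise
        m = beta * m + g   # momentum buffer
        x -= lr * m        # parameter update
        xs.append(x)
    return xs


def tail_variance(xs, burn_in):
    """Empirical variance of the iterates after discarding a burn-in prefix."""
    tail = xs[burn_in:]
    mean = sum(tail) / len(tail)
    return sum((v - mean) ** 2 for v in tail) / len(tail)


rng = random.Random(0)
steps = 200_000
# Both runs share the same noise sequence so that only the optimizer differs.
noise = [rng.gauss(0.0, 1.0) for _ in range(steps)]

eta = 0.01    # assumed small effective learning rate
beta = 0.9    # assumed momentum coefficient

sgd = simulate(eta, 0.0, steps, noise)                   # plain SGD
sgdm = simulate(eta * (1.0 - beta), beta, steps, noise)  # matched heavy-ball SGD

var_sgd = tail_variance(sgd, burn_in=5_000)
var_sgdm = tail_variance(sgdm, burn_in=5_000)
print(f"stationary variance, SGD:  {var_sgd:.5f}")
print(f"stationary variance, SGDM: {var_sgdm:.5f}")
```

On this quadratic, both stationary variances are approximately eta * sigma^2 / 2 when eta is small, so the two printed values should be close. Increasing `eta` makes the two methods diverge, which is consistent with the abstract's restriction to the small learning rate regime.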

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2307.15196 [cs.LG]
	(or arXiv:2307.15196v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.15196

Submission history

From: Runzhe Wang
[v1] Thu, 27 Jul 2023 21:01:26 UTC (135 KB)
[v2] Tue, 16 Apr 2024 03:25:54 UTC (501 KB)
