Learning threshold neurons via the "edge of stability"

Ahn, Kwangjun; Bubeck, Sébastien; Chewi, Sinho; Lee, Yin Tat; Suarez, Felipe; Zhang, Yi

[Submitted on 14 Dec 2022 (v1), last revised 19 Oct 2023 (this version, v2)]

Abstract: Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
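For context, the "edge of stability" is usually framed against the classical stability condition for gradient descent: on a quadratic loss f(x) = (λ/2) x² with curvature (sharpness) λ, gradient descent with step size η multiplies the iterate by (1 - ηλ) at every step, so it converges only when λ < 2/η. Cohen et al. observed that during neural network training the sharpness tends to rise to, and then hover near, this 2/η threshold. The minimal Python sketch below only illustrates the 2/η threshold on a one-dimensional toy quadratic; it is not the paper's two-layer model, and the function name gd_on_quadratic is hypothetical.

```python
def gd_on_quadratic(sharpness, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = (sharpness / 2) * x**2; return the final iterate.

    Toy illustration only (hypothetical helper, not from the paper).
    Each step is x <- x - step_size * sharpness * x = (1 - step_size * sharpness) * x,
    so |x| shrinks exactly when |1 - step_size * sharpness| < 1,
    i.e. when sharpness < 2 / step_size.
    """
    x = x0
    for _ in range(num_steps):
        grad = sharpness * x          # f'(x) for the quadratic
        x = x - step_size * grad      # plain gradient descent step
    return x


if __name__ == "__main__":
    step_size = 0.1  # classical stability threshold: 2 / step_size = 20
    for sharpness in (5.0, 19.0, 21.0):
        # Sharpness below 20 contracts toward 0; above 20 the iterates oscillate and blow up.
        print(sharpness, abs(gd_on_quadratic(sharpness, step_size)))
```

With η = 0.1, sharpness 19 (just below 2/η = 20) still converges, though with oscillating signs, while sharpness 21 (just above) diverges.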

Comments:	31 pages, 13 figures, Published at NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Cite as:	arXiv:2212.07469 [cs.LG]
	(or arXiv:2212.07469v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.07469

Submission history

From: Kwangjun Ahn
[v1] Wed, 14 Dec 2022 19:27:03 UTC (3,505 KB)
[v2] Thu, 19 Oct 2023 12:00:54 UTC (4,109 KB)