Feb 1, 2022 · Abstract: Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization ...
Two popular flat-minima optimization approaches are: 1. Stochastic Weight Averaging (SWA) [48], and 2. Sharpness-Aware Minimization (SAM) [22]. While both ...
This work compares the loss surfaces of the models trained with each method and, through broad benchmarking across computer vision, natural language ...
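For the SWA side of that comparison, below is a minimal sketch of how weight averaging is typically wired up with PyTorch's built-in `torch.optim.swa_utils`; the toy model, synthetic data, and the `swa_start`/`swa_lr` values are illustrative assumptions, not the benchmark setup from the paper.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for a real model and dataset.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,))),
    batch_size=64,
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()
swa_model = AveragedModel(model)          # running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start, epochs = 15, 20                # start averaging late in training

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)   # fold current weights into the average
        swa_scheduler.step()                 # anneal toward the constant SWA learning rate

# Recompute BatchNorm running statistics for the averaged weights
# (a no-op here, since this toy model has no BatchNorm layers).
update_bn(train_loader, swa_model)
```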
Jan 18, 2018 · Is there anything insightful we can say about why traversing the parameter space of a neural network with SGD works so well?
Aug 7, 2024 · From what I understand, gradient descent / backpropagation makes small changes to weights and biases akin to a ball slowly travelling down a ...
The bias of stochastic optimizers towards good minima can be explained by the volume disparity between the attraction basins for good and bad minima. Flat ...
A minimum working example for incorporating WASAM in an image classification pipeline, implemented in PyTorch. Usage (simple option): a single closure-based step ...
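The repository's own WASAM class and exact step signature may differ from what is shown here; as a generic sketch of the two ingredients the name combines, the snippet below implements a closure-based SAM-style step by hand and layers PyTorch's `AveragedModel` on top. The toy model, data, and hyperparameters (`rho`, `swa_start`) are placeholders.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel
from torch.utils.data import DataLoader, TensorDataset


class SAM(torch.optim.Optimizer):
    """Minimal sharpness-aware step: climb to a nearby high-loss point within an
    L2 ball of radius rho, then apply the base optimizer using the gradient
    computed there."""

    def __init__(self, params, base_optimizer_cls, rho=0.05, **kwargs):
        super().__init__(params, dict(rho=rho, **kwargs))
        self.base_optimizer = base_optimizer_cls(self.param_groups, **kwargs)

    @torch.no_grad()
    def step(self, closure):
        with torch.enable_grad():
            closure()                              # gradients at the current weights
        grad_norm = torch.norm(torch.stack([
            p.grad.norm(p=2)
            for group in self.param_groups
            for p in group["params"] if p.grad is not None
        ]), p=2)
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None:
                    continue
                e = p.grad * scale
                p.add_(e)                          # perturb uphill
                self.state[p]["e"] = e
        with torch.enable_grad():
            closure()                              # gradients at the perturbed weights
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.sub_(self.state[p]["e"])     # restore the original weights
        self.base_optimizer.step()                 # descend with the sharpness-aware gradient


# Toy stand-ins for a real image-classification pipeline.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,))),
    batch_size=64,
)
criterion = nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD, rho=0.05, lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)                   # the "weight-averaged" part

swa_start, epochs = 3, 5
for epoch in range(epochs):
    for x, y in loader:
        def closure():
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            return loss
        optimizer.step(closure)                    # single closure-based step
    if epoch >= swa_start:
        swa_model.update_parameters(model)         # average weights over the late epochs
```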
Jul 11, 2023 · Intuitively, the argument goes like this: flat minima should have much larger associated volumes of low-loss surrounding them than sharp minima.
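That volume intuition can be checked with a back-of-the-envelope calculation: for an isotropic quadratic minimum with curvature a, the low-loss set {w : a * ||w||^2 < eps} is a ball of radius sqrt(eps / a), so its volume scales as (eps / a)^(d/2), and the flat-versus-sharp disparity grows exponentially with dimension. A minimal numeric sketch follows; the curvatures, threshold, and dimensions are arbitrary illustrative choices.

```python
import numpy as np

eps = 0.1
a_flat, a_sharp = 0.5, 50.0            # flat minimum vs. sharp minimum

# Monte Carlo check in d = 2: sample a box that contains both low-loss balls
# and compare the hit-count ratio with the analytic (a_sharp / a_flat)^(d/2).
d = 2
rng = np.random.default_rng(0)
box = np.sqrt(eps / a_flat)            # half-width large enough for both balls
samples = rng.uniform(-box, box, size=(1_000_000, d))
sq_norms = (samples ** 2).sum(axis=1)
mc_ratio = (a_flat * sq_norms < eps).mean() / (a_sharp * sq_norms < eps).mean()
print(f"d={d}: Monte Carlo volume ratio {mc_ratio:.1f} "
      f"vs analytic {(a_sharp / a_flat) ** (d / 2):.1f}")

# The disparity is exponential in dimension, which is the heart of the
# "flat minima occupy far more low-loss volume" argument.
for d in (2, 10, 100):
    print(f"d={d:>3}: flat/sharp low-loss volume ratio = "
          f"{(a_sharp / a_flat) ** (d / 2):.2e}")
```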