Jun 13, 2023 · In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show Nesterov's momentum ...
The acceleration of both the Heavy Ball method and Nesterov's momentum has been shown only for shallow ReLU networks (Wang et al., 2021; Liu et al., 2022a) and deep ...
We provide two realizations of the problem class, one of which is deep ReLU networks, making this the first work to prove an accelerated ...
Sep 5, 2022 · In 1983, Nesterov [5] proposed the NAG method and proved that it achieves the optimal convergence rate for convex problems with Lipschitz-continuous gradients.
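For reference, one common parameterization of Nesterov's scheme for a convex objective f with L-Lipschitz gradient, together with one standard statement of its O(1/k²) guarantee; this variant is illustrative and not necessarily the exact scheme of [5]:

```latex
y_k     = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}), \qquad
x_{k+1} = y_k - \frac{1}{L}\,\nabla f(y_k),
\qquad\text{with}\qquad
f(x_k) - f(x^\star) \le \frac{2L\,\lVert x_0 - x^\star \rVert^2}{(k+1)^2}.
```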
Jun 13, 2023 · Current state-of-the-art analyses of the convergence of gradient descent for training neural networks focus on characterizing properties of ...
We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem.
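To make the setting concrete, here is a minimal gradient-descent sketch for the rectangular factorization objective f(U, V) = ½‖A − UVᵀ‖²_F; the shapes, rank, and step size are illustrative assumptions rather than choices from the cited work:

```python
import numpy as np

# Gradient descent on f(U, V) = 0.5 * ||A - U V^T||_F^2.
# Sizes, rank, and step size are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, r = 40, 30, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target
U = 0.01 * rng.standard_normal((m, r))
V = 0.01 * rng.standard_normal((n, r))
eta = 0.01  # step size

for t in range(2000):
    R = U @ V.T - A      # residual
    gU = R @ V           # df/dU = (U V^T - A) V
    gV = R.T @ U         # df/dV = (U V^T - A)^T U
    U -= eta * gU
    V -= eta * gV

print("final loss:", 0.5 * np.linalg.norm(A - U @ V.T) ** 2)
```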
Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorses of first-order gradient methods ...
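The two updates differ only in where the gradient is evaluated; a minimal sketch on a toy quadratic, with an illustrative step size and momentum coefficient:

```python
import numpy as np

def grad(x):
    # Toy objective f(x) = 0.5 * ||x||^2, so grad f(x) = x.
    return x

eta, beta = 0.1, 0.9  # illustrative hyperparameters
x_hb, v_hb = np.ones(3), np.zeros(3)
x_nag, v_nag = np.ones(3), np.zeros(3)

for _ in range(100):
    # Heavy ball: gradient evaluated at the current iterate.
    v_hb = beta * v_hb - eta * grad(x_hb)
    x_hb = x_hb + v_hb

    # Nesterov: gradient evaluated at the look-ahead point x + beta * v.
    v_nag = beta * v_nag - eta * grad(x_nag + beta * v_nag)
    x_nag = x_nag + v_nag

print("HB:", x_hb, "NAG:", x_nag)
```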
This work considers the problem of training a two-layer ReLU neural network under over-parameterization and random initialization, and establishes tighter ...
Nesterov's accelerated gradient (NAG) is one of the most popular accelerated optimizers in the deep learning community and often exhibits improved ...
We analyze the convergence of NAG for two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to ...
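As an illustration of the setting described here, a minimal sketch of a two-layer fully connected ReLU network trained with Nesterov momentum on squared loss; the fixed output layer, widths, data, and hyperparameters are illustrative assumptions, not details from the paper:

```python
import numpy as np

# Two-layer ReLU network f(x) = a^T relu(W x) / sqrt(m), with the output
# layer a fixed (a common simplification in NTK-style analyses) and only
# W trained, here by Nesterov momentum on squared loss.
rng = np.random.default_rng(0)
n, d, m = 20, 5, 256                  # samples, input dim, width (assumed)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W = rng.standard_normal((m, d))       # random initialization
a = rng.choice([-1.0, 1.0], size=m)   # fixed signs for the output layer

def loss_and_grad(W):
    H = X @ W.T                        # (n, m) pre-activations
    pred = np.maximum(H, 0.0) @ a / np.sqrt(m)
    err = pred - y
    # dL/dW_j = (a_j / sqrt(m)) * sum_i err_i * 1[H_ij > 0] * x_i
    G = ((err[:, None] * (H > 0) * a[None, :]).T @ X) / np.sqrt(m)
    return 0.5 * np.sum(err ** 2), G

eta, beta = 0.01, 0.9                  # illustrative hyperparameters
V = np.zeros_like(W)
for t in range(500):
    _, G = loss_and_grad(W + beta * V) # gradient at the look-ahead point
    V = beta * V - eta * G
    W = W + V

print("final loss:", loss_and_grad(W)[0])
```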