On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

Simon Martin, Francis Bach, Giulio Biroli
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3655-3663, 2024.

Abstract

We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve a strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.
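The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes standard Gaussian inputs x ~ N(0, I_d), a network f_W(x) = sum_i (w_i . x)^2 with no extra normalization, and a factor 1/2 in the quadratic cost; the paper's own scaling may differ. Under these assumptions the population risk has a closed form, because E[(x^T A x)^2] = (tr A)^2 + 2 ||A||_F^2 for symmetric A. The function names are hypothetical, and the explicit Euler loop only approximates the continuous gradient flow.

```python
import numpy as np


def population_risk(W, W_star):
    """Assumed risk: 0.5 * E_x[(f_W(x) - f_{W*}(x))^2] for x ~ N(0, I_d).

    Rows of W (m x d) are student vectors, rows of W_star (m* x d) are
    teacher vectors, and f_W(x) = x^T (W^T W) x.
    """
    A = W.T @ W - W_star.T @ W_star  # symmetric d x d difference matrix
    return 0.5 * (np.trace(A) ** 2 + 2.0 * np.sum(A * A))


def risk_gradient(W, W_star):
    """Gradient of population_risk with respect to W (same shape as W)."""
    A = W.T @ W - W_star.T @ W_star
    return 2.0 * np.trace(A) * W + 4.0 * W @ A


def gradient_flow(W0, W_star, dt=0.01, steps=2000):
    """Explicit Euler discretization of dW/dt = -grad R(W) (an approximation)."""
    W = W0.copy()
    for _ in range(steps):
        W = W - dt * risk_gradient(W, W_star)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m_star, m = 8, 2, 4  # illustrative sizes, not taken from the paper
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    W_star = Q[:m_star]                        # orthonormal teacher vectors
    W0 = 0.1 * rng.standard_normal((m, d))     # small random student init
    W_T = gradient_flow(W0, W_star)
    print("initial risk:", population_risk(W0, W_star))
    print("final risk:  ", population_risk(W_T, W_star))
```

Varying `m` relative to `m_star` and `d` in this sketch gives a rough, small-scale way to explore how overparameterization affects recovery; it does not reproduce the high-dimensional limit analyzed in the paper.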

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-martin24a,
  title = {On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions},
  author = {Martin, Simon and Bach, Francis and Biroli, Giulio},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {3655--3663},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/martin24a/martin24a.pdf},
  url = {https://proceedings.mlr.press/v238/martin24a.html},
  abstract = {We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve a strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.}
}
Endnote
%0 Conference Paper
%T On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions
%A Simon Martin
%A Francis Bach
%A Giulio Biroli
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-martin24a
%I PMLR
%P 3655--3663
%U https://proceedings.mlr.press/v238/martin24a.html
%V 238
%X We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve a strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.
APA
Martin, S., Bach, F., & Biroli, G. (2024). On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3655-3663. Available from https://proceedings.mlr.press/v238/martin24a.html.
