On the interplay of network structure and gradient convergence in deep learning

Ithapu, Vamsi K; Ravi, Sathya N; Singh, Vikas

arXiv:1511.05297 (cs)

[Submitted on 17 Nov 2015 (v1), last revised 22 Feb 2017 (this version, v8)]

Abstract: The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and dropout rate) is less clear at this time. An interesting question one may ask is whether the network architecture and input data statistics may guide the choices of learning parameters and vice versa. In this work, we explore the association between such structural, distributional and learnability aspects vis-à-vis their interaction with parameter convergence rates. We present a framework to address these questions based on convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an interesting relationship between feature denoising and dropout. Building upon these results, we obtain a setup that provides systematic guidance regarding the choice of learning parameters and network sizes that achieve a certain level of convergence (in the optimization sense) often mediated by statistical attributes of the inputs. Our results are supported by a set of experimental evaluations as well as independent empirical observations reported by other groups.

Comments: 54th Allerton Conference on Communication, Control and Computing 2016; pp. 488-495
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1511.05297 [cs.LG] (or arXiv:1511.05297v8 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.1511.05297

Submission history

From: Vamsi Ithapu
[v1] Tue, 17 Nov 2015 07:31:56 UTC (181 KB)
[v2] Thu, 19 Nov 2015 21:49:44 UTC (446 KB)
[v3] Thu, 7 Jan 2016 21:47:36 UTC (1,244 KB)
[v4] Mon, 18 Jan 2016 20:17:03 UTC (1,248 KB)
[v5] Tue, 29 Mar 2016 23:16:43 UTC (1,257 KB)
[v6] Mon, 3 Oct 2016 16:21:39 UTC (218 KB)
[v7] Tue, 4 Oct 2016 20:56:42 UTC (218 KB)
[v8] Wed, 22 Feb 2017 17:28:01 UTC (252 KB)
