FitNets: Hints for Thin Deep Nets

Romero, Adriana; Ballas, Nicolas; Kahou, Samira Ebrahimi; Chassang, Antoine; Gatta, Carlo; Bengio, Yoshua

arXiv:1412.6550 (cs)

[Submitted on 19 Dec 2014 (v1), last revised 27 Mar 2015 (this version, v4)]

Abstract: While depth tends to improve network performance, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times fewer parameters outperforms a larger, state-of-the-art teacher network.
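The two training signals named in the abstract, a hint from a teacher hidden layer and the teacher's soft output, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not give the loss formulas, so everything specific here is an assumption made for illustration: the squared-error form of the hint loss, the use of a plain linear map (`W_r`, `b_r`) as the "additional parameters" that lift the thinner student layer to the teacher's width, the cross-entropy form of the soft-target term, and the `temperature` and `lam` values. All function and variable names are hypothetical.

```python
import numpy as np


def hint_loss(student_hidden, teacher_hidden, W_r, b_r):
    """Assumed hint loss: half the mean squared L2 distance between the
    regressed student features and the teacher's hidden features."""
    # The regressor maps the student width to the teacher width.
    predicted = student_hidden @ W_r + b_r
    diff = predicted - teacher_hidden
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional softening temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, lam=0.5):
    """Assumed distillation objective: cross-entropy against the true labels
    plus lam times cross-entropy against the teacher's softened output."""
    n = labels.shape[0]
    p_student = softmax(student_logits)
    hard = -np.mean(np.log(p_student[np.arange(n), labels]))
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = -np.mean(np.sum(p_teacher_soft * np.log(p_student_soft), axis=1))
    return hard + lam * soft


# Toy shapes (hypothetical): the student layer is thinner than the teacher's.
rng = np.random.default_rng(0)
batch, d_student, d_teacher, n_classes = 4, 8, 16, 10

student_hidden = rng.normal(size=(batch, d_student))
teacher_hidden = rng.normal(size=(batch, d_teacher))
W_r = rng.normal(scale=0.1, size=(d_student, d_teacher))  # regressor weights
b_r = np.zeros(d_teacher)                                 # regressor bias

student_logits = rng.normal(size=(batch, n_classes))
teacher_logits = rng.normal(size=(batch, n_classes))
labels = rng.integers(0, n_classes, size=batch)

loss_hint = hint_loss(student_hidden, teacher_hidden, W_r, b_r)
loss_kd = distillation_loss(student_logits, teacher_logits, labels)
print("hint loss:", loss_hint)
print("distillation loss:", loss_kd)
```

In a real training loop the regressor parameters would be learned together with the student's lower layers, and the hidden activations would come from actual networks rather than random arrays; the sketch only shows how the dimensional mismatch between student and teacher layers can be bridged by an extra mapping.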

Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1412.6550 [cs.LG]
	(or arXiv:1412.6550v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1412.6550

Submission history

From: Adriana Romero
[v1] Fri, 19 Dec 2014 22:40:51 UTC (124 KB)
[v2] Fri, 9 Jan 2015 20:56:15 UTC (124 KB)
[v3] Fri, 27 Feb 2015 18:44:36 UTC (126 KB)
[v4] Fri, 27 Mar 2015 11:52:28 UTC (132 KB)
