Improving generalisation of AutoML systems with dynamic fitness evaluations

Evans, Benjamin Patrick; Xue, Bing; Zhang, Mengjie

doi:10.1145/3377930.3389805

arXiv:2001.08842 (cs)

[Submitted on 23 Jan 2020]

Abstract: A common problem machine learning developers face is overfitting, that is, fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but the overfitting problem can persist. In fact, it can become more of a problem as we iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show that we still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations, which approximate repeated k-fold cross-validation at little extra cost over single k-fold and at far lower cost than typical repeated k-fold. The results show that, when time-equated, the proposed fitness function yields a significant improvement over the current state-of-the-art baseline method, which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods and can provide an essentially free boost in generalisation/testing performance.
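
The idea of a fitness evaluation that changes over time can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation: it assumes the k-fold split is re-drawn once per generation of the evolutionary search, so that each individual evaluation still costs a single k-fold while the search as a whole sees many different splits. The names `kfold_indices`, `dynamic_fitness`, `score_fn`, and `base_seed` are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def kfold_indices(n_samples, k, seed):
    """Split range(n_samples) into k disjoint folds using a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def dynamic_fitness(score_fn, n_samples, k, generation, base_seed=0):
    """Mean validation score over a k-fold split specific to `generation`.

    Assumption: the split changes only between generations, so all candidates
    within one generation are compared on the same folds.
    `score_fn(train_idx, val_idx)` is a hypothetical callback that trains a
    candidate pipeline on `train_idx` and returns its score on `val_idx`.
    """
    folds = kfold_indices(n_samples, k, seed=base_seed + generation)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, val_idx))
    return mean(scores)


def toy_score(train_idx, val_idx):
    """Stand-in scorer: returns the fraction of samples held out for validation."""
    return len(val_idx) / (len(train_idx) + len(val_idx))


if __name__ == "__main__":
    for generation in range(3):
        print(generation, dynamic_fitness(toy_score, n_samples=20, k=5,
                                          generation=generation))
```

In a real system `toy_score` would be replaced by code that fits and scores a candidate pipeline. Under this sketch, each call performs exactly k train/validate rounds, the same as fixed single k-fold; only the seed used to draw the folds varies.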

Comments:	19 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:2001.08842 [cs.LG]
	(or arXiv:2001.08842v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2001.08842
Related DOI:	https://doi.org/10.1145/3377930.3389805

Submission history

From: Benjamin Evans [view email]
[v1] Thu, 23 Jan 2020 22:54:54 UTC (925 KB)
