Revisiting Deep Learning Models for Tabular Data

Gorishniy, Yury; Rubachev, Ivan; Khrulkov, Valentin; Babenko, Artem

Computer Science > Machine Learning

arXiv:2106.11959v1 (cs)

[Submitted on 22 Jun 2021 (this version), latest version 26 Oct 2023 (v5)]

Abstract: The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional "shallow" models like Gradient Boosted Decision Trees (GBDT). However, since existing works often use different benchmarks and tuning protocols, it is unclear whether the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other; therefore, it is challenging for practitioners to identify the best deep model.
In this work, we start with a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and report two significant findings. First, we show that the choice between GBDT and DL models depends heavily on the data and that there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline that outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.
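The abstract does not spell out what "ResNet-like" means for tabular inputs. As an illustration only, the sketch below shows one common reading: an input projection, a stack of residual blocks of the form x + Linear(ReLU(Linear(Norm(x)))), and a linear prediction head. It is a minimal pure-Python forward pass under stated assumptions, not the authors' implementation (their code is linked under Comments). All class and function names are hypothetical, the normalization is a simple per-sample layer normalization chosen to keep the example self-contained, weights are random, and dropout and training are omitted.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: (out_features, in_features)


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b for a single input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one vector to zero mean and unit variance (no learned scale/shift).

    Assumption: used here for simplicity; other normalizations are possible.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Uniform init scaled by 1/sqrt(fan_in), a common heuristic.
    bound = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


class ResidualBlock:
    """Hypothetical ResNet-style block: x + Linear(ReLU(Linear(Norm(x))))."""

    def __init__(self, d: int, d_hidden: int, rng: random.Random) -> None:
        self.w1 = random_matrix(d_hidden, d, rng)
        self.b1 = [0.0] * d_hidden
        self.w2 = random_matrix(d, d_hidden, rng)
        self.b2 = [0.0] * d

    def __call__(self, x: Vector) -> Vector:
        h = linear(relu(linear(layer_norm(x), self.w1, self.b1)), self.w2, self.b2)
        # Skip connection: the block adds a learned correction to its input.
        return [xi + hi for xi, hi in zip(x, h)]


class TabularResNet:
    """Hypothetical model: input projection, residual blocks, linear head."""

    def __init__(self, n_features: int, d: int, d_hidden: int,
                 n_blocks: int, n_outputs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_in = random_matrix(d, n_features, rng)
        self.b_in = [0.0] * d
        self.blocks = [ResidualBlock(d, d_hidden, rng) for _ in range(n_blocks)]
        self.w_out = random_matrix(n_outputs, d, rng)
        self.b_out = [0.0] * n_outputs

    def __call__(self, features: Vector) -> Vector:
        x = linear(features, self.w_in, self.b_in)
        for block in self.blocks:
            x = block(x)
        # Final normalization + ReLU before the prediction head.
        return linear(relu(layer_norm(x)), self.w_out, self.b_out)


if __name__ == "__main__":
    model = TabularResNet(n_features=8, d=16, d_hidden=32, n_blocks=2, n_outputs=1)
    # One prediction for one row of 8 numeric features.
    print(model([0.5] * 8))
```

Because each block only adds a correction to its input, a block whose second weight matrix and bias are zero reduces exactly to the identity, which is one reason stacks of residual blocks tend to be easy to train.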

Comments:	Code: this https URL
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2106.11959 [cs.LG]
	(or arXiv:2106.11959v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.11959

Submission history

From: Yury Gorishniy
[v1] Tue, 22 Jun 2021 17:58:10 UTC (1,135 KB)
[v2] Wed, 10 Nov 2021 18:52:23 UTC (2,309 KB)
[v3] Wed, 26 Jul 2023 15:57:25 UTC (1,158 KB)
[v4] Wed, 25 Oct 2023 17:59:45 UTC (1,158 KB)
[v5] Thu, 26 Oct 2023 12:00:03 UTC (1,158 KB)