Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression

Zhang, Jingyi; Sun, Xiaoxiao

Statistics > Machine Learning

arXiv:2107.05834 (stat)

[Submitted on 13 Jul 2021 (v1), last revised 10 Nov 2021 (this version, v2)]

Abstract: The divide-and-conquer method is widely used to compute kernel ridge regression estimates on large-scale data. Unfortunately, when the response variable is highly skewed, divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and produce unacceptable results. To overcome this limitation, we combine a novel response-adaptive partition strategy with the oversampling technique. The proposed algorithm allocates carefully identified informative observations to multiple nodes (local processors). Although oversampling has been widely used to address discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to oversample observations effectively in the dacKRR setting. Furthermore, we show that the proposed estimate has a smaller risk than the classical dacKRR estimate under mild conditions. Our theoretical findings are supported by both simulated and real-data analyses.
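
To make the setting concrete, the following is a minimal sketch of classical dacKRR (fit a kernel ridge regression on each node's subset, then average the local predictions) with an optional set of observations copied to every node. It is not the authors' algorithm: the abstract does not specify the response-adaptive partition or the rule for identifying informative observations, so the top-5% response threshold in the usage note, the RBF kernel, and all function and parameter names (`dac_krr_fit`, `shared_idx`, `lam`, `gamma`) are assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def fit_krr(X, y, lam=1e-2, gamma=1.0):
    # Solve (K + n * lam * I) alpha = y for the dual coefficients.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha


def predict_krr(model, X_new, gamma=1.0):
    X_train, alpha = model
    return rbf_kernel(X_new, X_train, gamma) @ alpha


def dac_krr_fit(X, y, n_nodes=4, shared_idx=None, lam=1e-2, gamma=1.0, seed=0):
    # shared_idx: indices copied to every node (hypothetical oversampling step).
    # With shared_idx=None this reduces to a plain random partition.
    rng = np.random.default_rng(seed)
    shared = np.array([], dtype=int) if shared_idx is None else np.asarray(shared_idx, dtype=int)
    mask = np.ones(X.shape[0], dtype=bool)
    mask[shared] = False
    rest = rng.permutation(np.flatnonzero(mask))
    models = []
    for part in np.array_split(rest, n_nodes):
        local = np.concatenate([part, shared])
        models.append(fit_krr(X[local], y[local], lam, gamma))
    return models


def dac_krr_predict(models, X_new, gamma=1.0):
    # Global estimate: simple average of the local KRR predictions.
    return np.mean([predict_krr(m, X_new, gamma) for m in models], axis=0)
```

As a usage illustration under the same assumptions, one could mark the observations whose response exceeds the empirical 95% quantile, for example `shared = np.flatnonzero(y > np.quantile(y, 0.95))`, and pass them as `shared_idx` so that every local fit sees the sparse upper tail of the response. Whether this particular rule reduces the risk is not claimed here; the paper gives the conditions under which its own estimate does.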

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2107.05834 [stat.ML]
	(or arXiv:2107.05834v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2107.05834

Submission history

From: Jingyi Zhang
[v1] Tue, 13 Jul 2021 04:01:04 UTC (771 KB)
[v2] Wed, 10 Nov 2021 05:12:48 UTC (2,881 KB)
