Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Bertsimas, Dimitris; Cory-Wright, Ryan; Pauphilet, Jean

arXiv:2005.05195v2 (math)

[Submitted on 11 May 2020 (v1), revised 21 Oct 2020 (this version, v2), latest version 25 Aug 2021 (v4)]

Abstract: Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components that are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components for problems with more than hundreds of covariates ($p$ in the 100s). By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method that solves the problem to certifiable optimality at the scale of selecting tens of covariates ($k$ in the 10s) from $p=300$ variables, and provides small bound gaps at larger scales. We also propose two convex relaxations and randomized rounding schemes that provide certifiably near-exact solutions within minutes for $p$ in the 100s or hours for $p$ in the 1,000s. Using real-world financial and medical datasets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
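
To make the problem statement concrete, here is a minimal sketch of what a "certifiably optimal" sparse principal component means on a toy instance. It assumes the standard sparse PCA formulation (maximize $x^\top \Sigma x$ subject to $\|x\|_2 = 1$ and $\|x\|_0 \le k$) and solves it by exhaustively enumerating supports. This is not the cutting-plane method or the relaxations described in the abstract; the function name `sparse_pca_brute_force` and the 4x4 covariance matrix are hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def sparse_pca_brute_force(sigma, k):
    """Return (best_value, best_support) by enumerating every size-k support.

    For a fixed support, the best unit vector is the leading eigenvector of
    the corresponding principal submatrix, so the objective value is that
    submatrix's largest eigenvalue. For a positive semidefinite sigma,
    enlarging a support never lowers this value, so checking supports of
    size exactly k is enough.
    """
    p = sigma.shape[0]
    best_value, best_support = -np.inf, None
    for support in combinations(range(p), k):
        idx = list(support)
        sub = sigma[np.ix_(idx, idx)]
        value = np.linalg.eigvalsh(sub)[-1]  # eigenvalues come back in ascending order
        if value > best_value:
            best_value, best_support = value, support
    return float(best_value), best_support


# Hypothetical toy covariance: features 0 and 1 are strongly correlated.
sigma = np.array([
    [2.0, 1.5, 0.0, 0.0],
    [1.5, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
])

value, support = sparse_pca_brute_force(sigma, k=2)

# Dropping the sparsity constraint gives ordinary PCA, whose leading
# eigenvalue is an upper bound on the sparse objective.
upper_bound = float(np.linalg.eigvalsh(sigma)[-1])
bound_gap = upper_bound - value
```

Enumeration visits every one of the $\binom{p}{k}$ supports, so it is only viable for very small $p$. The gap between a feasible solution's value and an upper bound is what certifies (near) optimality; in this toy instance the dense PCA bound happens to coincide with the best sparse value.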

Comments:	Submitted to JMLR
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
Cite as:	arXiv:2005.05195 [math.OC]
	(or arXiv:2005.05195v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2005.05195

Submission history

From: Ryan Cory-Wright
[v1] Mon, 11 May 2020 15:39:23 UTC (55 KB)
[v2] Wed, 21 Oct 2020 21:11:28 UTC (65 KB)
[v3] Wed, 28 Apr 2021 21:04:39 UTC (109 KB)
[v4] Wed, 25 Aug 2021 15:42:09 UTC (110 KB)
