Simple and Scalable Sparse k-means Clustering via Feature Ranking

Zhang, Zhiyue; Lange, Kenneth; Xu, Jason

Statistics > Machine Learning

arXiv:2002.08541 (stat)

[Submitted on 20 Feb 2020 (v1), last revised 22 Oct 2020 (this version, v2)]

Abstract: Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
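
The abstract does not spell out the algorithm. As a rough illustration of what "sparse k-means via feature ranking" can look like, here is a minimal NumPy sketch built on our own assumptions, not the authors' code. It runs Lloyd-style k-means iterations. After each center update it ranks features by their between-cluster sum of squares, and only the top s features keep cluster-specific centers. The ranking score, the deterministic farthest-first seeding, and the names `sparse_kmeans_feature_ranking` and `farthest_first_init` are all hypothetical choices.

```python
import numpy as np


def farthest_first_init(X, k):
    """Hypothetical deterministic seeding: start at row 0, then repeatedly
    take the row farthest (squared Euclidean) from the centers chosen so far."""
    chosen = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].copy()


def sparse_kmeans_feature_ranking(X, k, s, max_iter=100):
    """Hypothetical sketch (not the paper's implementation) of k-means that
    keeps only the s top-ranked features.

    Assumes the caller already scaled each column of X to unit variance.
    Columns are centered here so that an inactive feature's cluster centers
    can be pinned to the global mean, 0.
    Returns (labels, sorted indices of the s selected features).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    n, p = X.shape
    centers = farthest_first_init(X, k)
    labels = np.full(n, -1)
    selected = np.arange(p)
    for _ in range(max_iter):
        # Assignment step. Every center is 0 on an inactive feature, so that
        # feature adds the same amount to each distance and cannot change the argmin.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels

        # Update step: ordinary cluster means over all features.
        counts = np.bincount(labels, minlength=k)
        centers = np.zeros((k, p))
        for j in range(k):
            if counts[j] > 0:
                centers[j] = X[labels == j].mean(axis=0)

        # Ranking step: on centered data, sum_j n_j * c_jl**2 is how much
        # feature l's within-cluster sum of squares drops when its centers
        # move from the global mean to the cluster means.
        scores = (counts[:, None] * centers ** 2).sum(axis=0)
        selected = np.sort(np.argsort(scores)[::-1][:s])
        inactive = np.ones(p, dtype=bool)
        inactive[selected] = False
        centers[:, inactive] = 0.0
    return labels, selected


if __name__ == "__main__":
    # Toy data: columns 0-2 separate two groups; columns 3-4 are balanced noise.
    X = np.array([
        [1, 1, 1, 1, 1], [1, 1, 1, -1, 1], [1, 1, 1, 1, -1], [1, 1, 1, -1, -1],
        [-1, -1, -1, 1, 1], [-1, -1, -1, -1, 1], [-1, -1, -1, 1, -1], [-1, -1, -1, -1, -1],
    ])
    labels, selected = sparse_kmeans_feature_ranking(X, k=2, s=3)
    print(labels)    # [0 0 0 0 1 1 1 1]
    print(selected)  # [0 1 2]
```

Pinning inactive centers to the global mean is what keeps this cheap. Unselected features then contribute equally to every distance, so each iteration costs no more than a plain k-means step, with one extra sort over the p feature scores.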

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2002.08541 [stat.ML]
	(or arXiv:2002.08541v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2002.08541

Submission history

From: Zhiyue Zhang
[v1] Thu, 20 Feb 2020 02:41:02 UTC (44 KB)
[v2] Thu, 22 Oct 2020 11:28:41 UTC (54 KB)