Improved Coresets for Euclidean $k$-Means

Cohen-Addad, Vincent; Larsen, Kasper Green; Saulpic, David; Schwiegelshohn, Chris; Sheikh-Omar, Omar Ali

Computer Science > Computational Geometry

arXiv:2211.08184 (cs)

[Submitted on 15 Nov 2022 (v1), last revised 16 Nov 2022 (this version, v2)]

Abstract: Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. Arguably the most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that for any candidate solution, the ratio between the coreset cost and the cost of the original instance is within a $(1\pm \varepsilon)$ factor. The current state-of-the-art coreset size is $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. The best known lower bound for both problems is $\Omega(k \varepsilon^{-2})$. In this paper, we improve the upper bounds to $\tilde O(\min(k^{3/2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for $k$-means and $\tilde O(\min(k^{4/3} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for $k$-median. In particular, ours is the first provable bound that breaks through the $k^2$ barrier while retaining an optimal dependency on $\varepsilon$.
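The coreset guarantee stated in the abstract can be made concrete with a small sketch. The code below is illustrative only: it uses uniform sampling with weights $n/m$ as a stand-in, which is not the construction from the paper and carries no worst-case guarantee. The function names (`kmeans_cost`, `uniform_sample_coreset`, `distortion`), the synthetic data, and the sample size are all assumptions made for this example. It also checks the relative error on only a handful of random candidate solutions, whereas a true coreset must satisfy the bound for every set of $k$ centers.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum of weight * squared distance to the closest center."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def uniform_sample_coreset(points, m, rng):
    """Hypothetical stand-in: m uniform samples (with replacement), each weighted n/m.

    This is NOT the paper's construction; it only illustrates a weighted subset.
    """
    n = len(points)
    sample = [rng.choice(points) for _ in range(m)]
    return sample, [n / m] * m


def distortion(points, coreset, weights, centers):
    """Relative error |cost(coreset) - cost(points)| / cost(points) for one solution."""
    full = kmeans_cost(points, [1.0] * len(points), centers)
    small = kmeans_cost(coreset, weights, centers)
    return abs(small - full) / full


rng = random.Random(0)

# Synthetic data: three Gaussian blobs in the plane (an assumption for the demo).
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for _ in range(1000)]

coreset, weights = uniform_sample_coreset(points, 300, rng)

# A few random candidate solutions with k = 3 centers each.
candidates = [[(rng.uniform(-2, 12), rng.uniform(-2, 12)) for _ in range(3)]
              for _ in range(20)]

worst = max(distortion(points, coreset, weights, c) for c in candidates)
print(f"worst relative error over {len(candidates)} candidates: {worst:.3f}")
```

A weighted subset is an $\varepsilon$-coreset when this relative error is at most $\varepsilon$ for all candidate solutions simultaneously; the bounds in the abstract describe how many weighted points are needed to achieve that.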

Subjects:	Computational Geometry (cs.CG); Machine Learning (cs.LG)
Cite as:	arXiv:2211.08184 [cs.CG]
	(or arXiv:2211.08184v2 [cs.CG] for this version)
	https://doi.org/10.48550/arXiv.2211.08184

Submission history

From: Chris Schwiegelshohn
[v1] Tue, 15 Nov 2022 14:47:24 UTC (113 KB)
[v2] Wed, 16 Nov 2022 06:42:40 UTC (113 KB)
