Hierarchical clustering: New bounds and objective

M Rahgoshay, MR Salavatipour - arXiv preprint arXiv:2111.06863, 2021 - arxiv.org
Hierarchical Clustering has been studied and used extensively as a method for analysis of data. More recently, Dasgupta [2016] defined a precise objective function. Given a set of $n$ data points with a weight function $w_{i,j}$ for each two items $i$ and $j$ denoting their similarity/dissimilarity, the goal is to build a recursive (tree-like) partitioning of the data points (items) into successively smaller clusters. He defined the cost of a tree $T$ to be $Cost(T)=\sum_{i,j\in [n]}\big(w_{i,j}\times |T_{i,j}|\big)$, where $T_{i,j}$ is the subtree rooted at the least common ancestor of $i$ and $j$, and presented the first approximation algorithm for such clustering.
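As a concrete illustration of this cost function, here is a minimal Python sketch (not taken from the paper; the nested-tuple tree encoding and the toy weights are assumptions made for illustration) that computes $Cost(T)$ for a small hierarchy over four items:

    from itertools import combinations

    # A hierarchy over items 0..3 encoded as nested pairs: internal nodes
    # are 2-tuples of subtrees, leaves are item indices. (This encoding is
    # an assumption made here for illustration; the paper fixes none.)
    T = ((0, 1), (2, 3))

    def leaves(t):
        """Set of items (leaves) under subtree t."""
        return {t} if isinstance(t, int) else leaves(t[0]) | leaves(t[1])

    def lca_subtree(t, i, j):
        """T_{i,j}: the subtree rooted at the least common ancestor of i and j."""
        for child in t:
            if not isinstance(child, int) and {i, j} <= leaves(child):
                return lca_subtree(child, i, j)
        return t

    def dasgupta_cost(t, w):
        """Cost(T) = sum over pairs {i,j} of w[i,j] * |T_{i,j}|."""
        return sum(w[i, j] * len(leaves(lca_subtree(t, i, j)))
                   for i, j in combinations(sorted(leaves(t)), 2))

    # Toy similarity weights (hypothetical values).
    w = {(0, 1): 3, (2, 3): 3, (0, 2): 1, (0, 3): 1, (1, 2): 1, (1, 3): 1}
    print(dasgupta_cost(T, w))  # 3*2 + 3*2 + 4*(1*4) = 28

Separating the heavy pairs $(0,1)$ and $(2,3)$ only at the bottom of the tree keeps their $|T_{i,j}|$ small; this is exactly what minimizing $Cost(T)$ rewards, since similar items should be split as late as possible.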
Then Moseley and Wang [2017] considered the dual of Dasgupta's objective function for similarity-based weights and showed that both random partitioning and average linkage have approximation ratio $1/3$, which has been improved in a series of works to $0.585$ [Alon et al. 2020]. Later, Cohen-Addad et al. [2019] considered the same objective function as Dasgupta's but for dissimilarity-based metrics, where the objective is maximized rather than minimized. It is shown that both random partitioning and average linkage have ratio $2/3$, which has been only slightly improved to $0.667078$ [Charikar et al. SODA2020].

Our first main result is to consider this dissimilarity objective and present a more delicate algorithm and careful analysis that achieves a $0.71604$ approximation. We also introduce a new objective function for dissimilarity-based clustering. For any tree $T$ and two items $i$ and $j$, let $H_{i,j}$ be the number of common ancestors of $i$ and $j$. Intuitively, items that are similar are expected to remain within the same cluster as deep as possible. So, for dissimilarity-based metrics, we suggest the cost of each tree $T$, which we want to minimize, to be $Cost_H(T)=\sum_{i,j\in [n]}\big(w_{i,j}\times H_{i,j}\big)$. We present a $1.3977$-approximation algorithm for this objective.
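Since the common ancestors of two leaves are exactly the nodes on the path from the root down to their least common ancestor, $H_{i,j}$ is just the depth of the LCA of $i$ and $j$ (with the root counted at depth 1). The following sketch, reusing the toy encoding above (again an illustration with hypothetical dissimilarity values, not the paper's code), computes the proposed cost:

    from itertools import combinations

    # Same nested-pair encoding and toy tree as in the previous sketch.
    T = ((0, 1), (2, 3))

    def leaves(t):
        """Set of items (leaves) under subtree t."""
        return {t} if isinstance(t, int) else leaves(t[0]) | leaves(t[1])

    def common_ancestors(t, i, j, depth=1):
        """H_{i,j}: number of common ancestors of leaves i and j.
        With the root counted at depth 1, this equals the depth of their LCA."""
        for child in t:
            if not isinstance(child, int) and {i, j} <= leaves(child):
                return common_ancestors(child, i, j, depth + 1)
        return depth

    def new_cost(t, w):
        """The proposed objective: sum over pairs of w[i,j] * H_{i,j}, minimized."""
        return sum(w[i, j] * common_ancestors(t, i, j)
                   for i, j in combinations(sorted(leaves(t)), 2))

    # Toy dissimilarity weights: (0,1) and (2,3) are the similar pairs.
    d = {(0, 1): 1, (2, 3): 1, (0, 2): 3, (0, 3): 3, (1, 2): 3, (1, 3): 3}
    print(new_cost(T, d))  # 1*2 + 1*2 + 4*(3*1) = 16

Each pair pays once per common ancestor, so minimizing this cost pushes dissimilar items to split near the root while similar items stay together deep in the tree, matching the intuition stated above.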