New Algorithmic Tools for Distributed Similarity Search and Edge Estimation
Authors
Rashtchian, Cyrus
Abstract
We present several foundational results on computational questions related to similarity search, clustering, and parameter estimation. The problems center around the theme of improving algorithms by utilizing geometric or graphical structure. Some contributions include:
- Improved upper and lower bounds for computing a similarity join under Hamming distance in a simultaneous distributed model. The core of our analysis involves novel connections between similarity joins and extremal graph theory.
- An edge-isoperimetric inequality for powers of the binary hypercube. The insights here help us to develop new similarity join algorithms that are nearly-optimal for a theoretical MapReduce model.
- A distributed clustering algorithm for edit distance, with applications to DNA data storage. By using random structure found in real datasets, we achieve new hashing, embedding, and convergence results for an otherwise challenging clustering problem.
- The first polylogarithmic query algorithm for estimating the number of edges in a graph using a natural graph query. Our randomized, adaptive algorithm uses bipartite independent set queries to quickly learn an unknown graph.
Description
Thesis (Ph.D.)--University of Washington, 2018
Keywords
Clustering, DNA Data Storage, Edge-Isoperimetric, Independent Set, MapReduce, Similarity Search, Computer science, Mathematics