New Algorithmic Tools for Distributed Similarity Search and Edge Estimation

Authors

Rashtchian, Cyrus

Abstract

We present several foundational results on computational questions related to similarity search, clustering, and parameter estimation. The problems center on improving algorithms by exploiting geometric or graph structure. Some contributions include:

- Improved upper and lower bounds for computing a similarity join under Hamming distance in a simultaneous distributed model. The core of our analysis involves novel connections between similarity joins and extremal graph theory.
- An edge-isoperimetric inequality for powers of the binary hypercube. The insights here help us develop new similarity join algorithms that are nearly optimal for a theoretical MapReduce model.
- A distributed clustering algorithm for edit distance, with applications to DNA data storage. By using random structure found in real datasets, we achieve new hashing, embedding, and convergence results for an otherwise challenging clustering problem.
- The first algorithm with polylogarithmic query complexity for estimating the number of edges in a graph using a natural graph query. Our randomized, adaptive algorithm uses bipartite independent set queries to quickly learn an unknown graph.
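To make the first two contributions concrete: a Hamming-distance similarity join over a set of binary vectors asks for every pair whose Hamming distance is at most a threshold r. For a self-join of a set S, the output pairs are exactly the edges of the subgraph induced by S in the r-th power of the hypercube, which is why edge-isoperimetric bounds on hypercube powers constrain the output size. The sketch below is only a naive quadratic baseline under the assumption that vectors are stored as Python integers; it is not the distributed or MapReduce algorithm from the thesis, and the function names `hamming` and `self_similarity_join` are hypothetical.

```python
from itertools import combinations


def hamming(x: int, y: int) -> int:
    """Hamming distance between two bit vectors stored as non-negative ints."""
    # XOR leaves a 1 exactly where the vectors differ; count those bits.
    return bin(x ^ y).count("1")


def self_similarity_join(points: list[int], r: int) -> list[tuple[int, int]]:
    """Naive baseline (hypothetical helper, not the thesis algorithm).

    Returns all unordered index pairs (i, j), i < j, whose vectors are within
    Hamming distance r. Equivalently, these are the edges of the subgraph of
    the r-th hypercube power induced by the points. Uses O(n^2) comparisons.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if hamming(points[i], points[j]) <= r
    ]
```

For example, joining all eight vectors of {0,1}^3 with r = 1 returns the 12 edges of the 3-cube, and with r = 2 it returns 24 pairs (every pair except the 4 antipodal ones). The distributed setting studied in the thesis is motivated precisely by the cost of this all-pairs comparison on large inputs.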

Description

Thesis (Ph.D.)--University of Washington, 2018

Keywords

Clustering, DNA Data Storage, Edge-Isoperimetric, Independent Set, MapReduce, Similarity Search, Computer science, Mathematics
