Efficient and Flexible Search in Large Scale Distributed Systems
Date
2007-05-18T15:58:03Z
Authors
Ahmed, Reaz
Publisher
University of Waterloo
Abstract
Peer-to-peer (P2P) technology has triggered a wide range of
distributed systems beyond simple file-sharing. Distributed XML
databases, distributed computing, server-less web publishing and
networked resource/service sharing are only a few examples. Despite
the diversity in applications, these systems share a common
problem regarding the search and discovery of information. This
commonality stems from the transitory node population and the
volatile information content of the participating nodes. In such a
dynamic environment, users are not expected to have exact
information about the objects available in the system. Rather,
queries are based on partial information, which requires the
search mechanism to be flexible. On the other hand, to scale with
network size, the search mechanism must be bandwidth-efficient.
Since the advent of P2P technology, experts from industry and
academia have proposed a number of search techniques, none of
which provides a satisfactory solution to the conflicting
requirements of search efficiency and flexibility. Structured
search techniques, mostly based on Distributed Hash Tables (DHTs),
are bandwidth-efficient, while semi-structured and unstructured
techniques are flexible. Neither achieves both ends.
This thesis defines the Distributed Pattern Matching (DPM)
problem. The DPM problem is to discover a pattern (i.e., a bit-vector)
using any subset of its 1-bits, under the assumption that the
patterns are distributed across a large population of networked
nodes. The search problem in many distributed systems can be reduced
to the DPM problem.
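
The matching condition in the DPM problem can be illustrated with a minimal sketch. It assumes patterns are stored as Python integers used as bit-vectors, and the node names, the `matches` and `search` helpers, and the sample data are hypothetical. The exhaustive loop over all nodes is only a reference for what counts as a match; it is not the DPMS or Plexus mechanism, both of which aim to avoid visiting every node.

```python
from typing import Dict, List


def matches(pattern: int, query: int) -> bool:
    """Return True if every 1-bit of the query is also set in the pattern."""
    return (pattern & query) == query


def search(nodes: Dict[str, List[int]], query: int) -> Dict[str, List[int]]:
    """Brute-force reference search: check every pattern on every node."""
    hits: Dict[str, List[int]] = {}
    for node_id, patterns in nodes.items():
        found = [p for p in patterns if matches(p, query)]
        if found:
            hits[node_id] = found
    return hits


# Hypothetical population of two nodes, each holding two 6-bit patterns.
nodes = {
    "node-a": [0b101101, 0b010010],
    "node-b": [0b111000, 0b001101],
}

# The query carries only a subset of the 1-bits of the patterns it should find.
result = search(nodes, 0b001100)
for node_id, found in result.items():
    print(node_id, [format(p, "06b") for p in found])
```

With this data the query `001100` matches `101101` on `node-a` and `001101` on `node-b`, because both patterns have the third and fourth bits set.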
This thesis also presents two distinct search mechanisms, named
Distributed Pattern Matching System (DPMS) and Plexus, for solving
the DPM problem. DPMS is a semi-structured, hierarchical
architecture aiming to discover a predefined number of matches by
visiting a small number of nodes. Plexus, on the other hand, is a
structured search mechanism based on the theory of
error-correcting codes (ECC). The design goal behind Plexus is to
discover all the matches by visiting a reasonable number of nodes.
Keywords
Distributed Pattern Matching, DPM, DPMS, Plexus