Efficient and Flexible Search in Large Scale Distributed Systems
MetadataShow full item record
Peer-to-peer (P2P) technology has triggered a wide range of distributed systems beyond simple file-sharing. Distributed XML databases, distributed computing, server-less web publishing and networked resource/service sharing are only a few to name. Despite of the diversity in applications, these systems share a common problem regarding searching and discovery of information. This commonality stems from the transitory nodes population and volatile information content in the participating nodes. In such dynamic environment, users are not expected to have the exact information about the available objects in the system. Rather queries are based on partial information, which requires the search mechanism to be flexible. On the other hand, to scale with network size the search mechanism is required to be bandwidth efficient. Since the advent of P2P technology experts from industry and academia have proposed a number of search techniques - none of which is able to provide satisfactory solution to the conflicting requirements of search efficiency and flexibility. Structured search techniques, mostly Distributed Hash Table (DHT)-based, are bandwidth efficient while semi(un)-structured techniques are flexible. But, neither achieves both ends. This thesis defines the Distributed Pattern Matching (DPM) problem. The DPM problem is to discover a pattern (\ie bit-vector) using any subset of its 1-bits, under the assumption that the patterns are distributed across a large population of networked nodes. Search problem in many distributed systems can be reduced to the DPM problem. This thesis also presents two distinct search mechanisms, named Distributed Pattern Matching System (DPMS) and Plexus, for solving the DPM problem. DPMS is a semi-structured, hierarchical architecture aiming to discover a predefined number of matches by visiting a small number of nodes. Plexus, on the other hand, is a structured search mechanism based on the theory of Error Correcting Code (ECC). The design goal behind Plexus is to discover all the matches by visiting a reasonable number of nodes.