Multi-User File System Search

Date

2007-08-03T15:32:48Z

Authors

Buettcher, Stefan

Publisher

University of Waterloo

Abstract

Information retrieval research usually deals with globally visible, static document collections. Practical applications such as file system search and enterprise search, in contrast, have to cope with highly dynamic text collections and must take user-specific access permissions into account when generating the results to a search query. The goal of this thesis is to close the gap between information retrieval research and the requirements imposed by these real-life applications. The algorithms and data structures presented in this thesis can be used to implement a file system search engine that reacts to changes in the file system by updating its index data in real time. File changes (insertions, deletions, or modifications) are reflected in the search results within a few seconds, even under a very high system workload. The search engine has low main memory consumption. By integrating security restrictions into the query processing logic, as opposed to applying them in a postprocessing step, it produces search results that are guaranteed to be consistent with the access permissions defined by the file system. The techniques proposed in this thesis are evaluated theoretically, based on a Zipfian model of term distribution, and through a large number of experiments involving text collections of non-trivial size, ranging from a few gigabytes to a few hundred gigabytes.
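The abstract contrasts integrating access permissions into query processing with applying them in a postprocessing step. The following is a minimal sketch of one reason the distinction can matter. It rests on assumptions of our own and is not the thesis's implementation: a hypothetical toy in-memory index (`ToyIndex`, `build_index`) and a simple TF-IDF-style scorer with weight `log(1 + N/df)`. If results are ranked first and filtered afterwards, the collection statistics (the number of documents `N` and each term's document frequency `df`) still count files the user cannot read, so the scores of visible files can depend on hidden content. Restricting the candidate set before scoring removes that dependency.

```python
import math
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory inverted index, for illustration only."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = set()                  # all indexed doc ids, readable or not

    def add(self, doc_id, text):
        self.docs.add(doc_id)
        for term in text.lower().split():
            tf = self.postings[term].get(doc_id, 0)
            self.postings[term][doc_id] = tf + 1


def build_index(docs):
    """Build a ToyIndex from a {doc_id: text} mapping."""
    index = ToyIndex()
    for doc_id, text in docs.items():
        index.add(doc_id, text)
    return index


def _score(index, query, candidates):
    """TF-IDF-style scores; N and df are computed over `candidates` only."""
    n = len(candidates)
    scores = defaultdict(float)
    for term in query.lower().split():
        # Keep only postings whose document is in the candidate set.
        plist = {d: tf for d, tf in index.postings.get(term, {}).items()
                 if d in candidates}
        if not plist:
            continue
        idf = math.log(1 + n / len(plist))  # assumed weighting, not the thesis's
        for doc_id, tf in plist.items():
            scores[doc_id] += tf * idf
    return dict(scores)


def search_postfilter(index, query, readable):
    """Rank against the whole collection, then drop unreadable results."""
    scores = _score(index, query, index.docs)
    return {d: s for d, s in scores.items() if d in readable}


def search_secure(index, query, readable):
    """Restrict the collection to readable files before ranking."""
    return _score(index, query, index.docs & readable)


if __name__ == "__main__":
    readable = {"a", "b"}  # the user may not read file "c"
    # Two collections that differ only in the unreadable file "c".
    i1 = build_index({"a": "budget report", "b": "budget layoffs", "c": "budget layoffs"})
    i2 = build_index({"a": "budget report", "b": "budget layoffs", "c": "budget picnic"})
    print(search_postfilter(i1, "layoffs", readable) == search_postfilter(i2, "layoffs", readable))  # False
    print(search_secure(i1, "layoffs", readable) == search_secure(i2, "layoffs", readable))  # True
```

In this toy setting, both search functions return only file `b`, yet the post-filtered score of `b` changes depending on whether the unreadable file `c` contains "layoffs", which reveals something about `c`. The integrated variant yields identical results for both collections. The specific scoring formula is an arbitrary choice for the illustration; the effect comes from computing collection statistics over files the user cannot access.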

Keywords

information retrieval, security, performance, index maintenance