Upper and Lower Bounds for Text Indexing Data Structures

Golynski, Alexander

dc.comment.hidden	I am a bit short on time for making the final submission by Jan 25 (The reason is that I took a new job in the States just after the revisions for my thesis were accepted and was quite busy with moving and first weeks on the job). Would it be possible to review it before Jan 25?	en
dc.contributor.author	Golynski, Alexander
dc.date.accessioned	2008-01-23T14:37:36Z
dc.date.available	2008-01-23T14:37:36Z
dc.date.issued	2008-01-23T14:37:36Z
dc.date.submitted	2007-12-10
dc.description.abstract	The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indices that occupy very little space (FM-indexes) and for wavelet trees. These data structures can also be used to represent labeled trees and posting lists. Labeled trees arise in XML documents, and posting lists in search engines. The main emphasis of this thesis is on lower bounds for time-space tradeoffs for the following problems: the rank/select problem, the problem of representing a string of balanced parentheses, the text retrieval problem, the problem of computing a permutation and its inverse, and the problem of representing a binary relation. These results fall into two groups: lower bounds in the cell probe model and lower bounds in the indexing model. The cell probe model is the most natural and widely accepted framework for studying data structures. In this model, we are concerned with the total space used by a data structure and the total number of memory accesses (probes) it performs, while computation is free of charge. The indexing model imposes an additional restriction on the storage: the object in question must be stored in its raw form together with a small index that facilitates an efficient implementation of a given set of queries, e.g., finding rank, select, a matching parenthesis, or an occurrence of a given pattern in a given text (for the text retrieval problem). We propose a new technique for proving lower bounds in the indexing model and use it to obtain lower bounds for the rank/select problem and the balanced parentheses problem. We also improve the existing techniques of Demaine and Lopez-Ortiz using compression and present stronger lower bounds for the text retrieval problem in the indexing model. The most important result of this thesis is a new technique for cell probe lower bounds. We demonstrate its strength by proving new lower bounds for the problem of representing permutations, the text retrieval problem, and the problem of representing binary relations. (Previously, no non-trivial results were known for these problems.) In addition, we note that the lower bounds for the permutations problem and the binary relations problem are tight for a wide range of parameters, e.g., the running time of queries and the size and density of the relation.	en
dc.identifier.uri	http://hdl.handle.net/10012/3509
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	theoretical computer science	en
dc.subject	data structures	en
dc.subject	lower bounds	en
dc.subject	text indexing	en
dc.subject	rank/select problem	en
dc.subject	representation of permutations	en
dc.subject.program	Computer Science	en
dc.title	Upper and Lower Bounds for Text Indexing Data Structures	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: thesis.pdf
Size: 963.01 KB
Format: Adobe Portable Document Format
Description: dissertation in PDF

License bundle

Name: license.txt
Size: 255 B
Format: Item-specific license agreed to upon submission

Collections

Theses
Computer Science