Upper and Lower Bounds for Text Indexing Data Structures

dc.comment.hidden: I am a bit short on time for making the final submission by Jan 25. (The reason is that I took a new job in the States just after the revisions for my thesis were accepted, and I was quite busy with moving and my first weeks on the job.) Would it be possible to review it before Jan 25? [en]
dc.contributor.author: Golynski, Alexander
dc.date.accessioned: 2008-01-23T14:37:36Z
dc.date.available: 2008-01-23T14:37:36Z
dc.date.issued: 2008-01-23T14:37:36Z
dc.date.submitted: 2007-12-10
dc.description.abstract: The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indices that occupy minute space (FM-indexes) and for wavelet trees. These data structures can also be used to represent labeled trees and posting lists. Labeled trees are applied in XML documents, and posting lists in search engines. The main emphasis of this thesis is on lower bounds for time-space tradeoffs for the following problems: the rank/select problem, the problem of representing a string of balanced parentheses, the text retrieval problem, the problem of computing a permutation and its inverse, and the problem of representing a binary relation. These results are divided into two groups: lower bounds in the cell probe model and lower bounds in the indexing model. The cell probe model is the most natural and widely accepted framework for studying data structures. In this model, we are concerned with the total space used by a data structure and the total number of accesses (probes) it performs to memory, while computation is free of charge. The indexing model imposes an additional restriction on the storage: the object in question must be stored in its raw form together with a small index that facilitates an efficient implementation of a given set of queries, e.g., computing rank or select, finding a matching parenthesis, or finding an occurrence of a given pattern in a given text (for the text retrieval problem). We propose a new technique for proving lower bounds in the indexing model and use it to obtain lower bounds for the rank/select problem and the balanced parentheses problem. We also improve the existing techniques of Demaine and Lopez-Ortiz using compression and present stronger lower bounds for the text retrieval problem in the indexing model. The most important result of this thesis is a new technique for cell probe lower bounds. We demonstrate its strength by proving new lower bounds for the problem of representing permutations, the text retrieval problem, and the problem of representing binary relations. (Previously, no non-trivial results were known for these problems.) In addition, we note that the lower bounds for the permutations problem and the binary relations problem are tight for a wide range of parameters, e.g., the running time of queries and the size and density of the relation. [en]
dc.identifier.uri: http://hdl.handle.net/10012/3509
dc.language.iso: en [en]
dc.pending: false [en]
dc.publisher: University of Waterloo [en]
dc.subject: theoretical computer science [en]
dc.subject: data structures [en]
dc.subject: lower bounds [en]
dc.subject: text indexing [en]
dc.subject: rank/select problem [en]
dc.subject: representation of permutations [en]
dc.subject.program: Computer Science [en]
dc.title: Upper and Lower Bounds for Text Indexing Data Structures [en]
dc.type: Doctoral Thesis [en]
uws-etd.degree: Doctor of Philosophy [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
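
The abstract describes the indexing model as storing the object in raw form together with a small index that speeds up queries such as rank and select. The following is a minimal illustrative sketch of that idea for a bit string, not the construction from the thesis: the class name `RankIndex`, the `block_size` parameter, and the per-block prefix counts are all assumptions chosen for clarity, and the sketch makes no attempt to match the space or time bounds the thesis studies.

```python
from bisect import bisect_left
from typing import List


class RankIndex:
    """Raw bit string plus a small auxiliary index (hypothetical sketch)."""

    def __init__(self, bits: List[int], block_size: int = 8) -> None:
        self.bits = bits              # the object, kept in raw form
        self.block_size = block_size  # assumed tuning parameter
        # prefix[j] = number of 1-bits in bits[0 : j * block_size]
        self.prefix = [0]
        for start in range(0, len(bits), block_size):
            ones = sum(bits[start:start + block_size])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i: int) -> int:
        """Return the number of 1-bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("position out of range")
        block = i // self.block_size
        start = block * self.block_size
        # One index lookup, then a scan of at most block_size raw bits.
        return self.prefix[block] + sum(self.bits[start:i])

    def select1(self, k: int) -> int:
        """Return the position of the k-th 1-bit (k is 1-based)."""
        if not 1 <= k <= self.prefix[-1]:
            raise ValueError("k out of range")
        # Find the block with prefix[block] < k <= prefix[block + 1].
        block = bisect_left(self.prefix, k) - 1
        remaining = k - self.prefix[block]
        start = block * self.block_size
        end = min(start + self.block_size, len(self.bits))
        for pos in range(start, end):
            if self.bits[pos]:
                remaining -= 1
                if remaining == 0:
                    return pos
        raise AssertionError("index inconsistent with raw bits")


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
index = RankIndex(bits, block_size=4)
print(index.rank1(5), index.select1(4))
```

Larger blocks shrink the index but lengthen the scan over the raw bits; tradeoffs of this kind between index size and query time are what the lower bounds in the abstract concern.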

Files

Original bundle

Name: thesis.pdf
Size: 963.01 KB
Format: Adobe Portable Document Format
Description: dissertation in PDF

License bundle

Name: license.txt
Size: 255 B
Format: Item-specific license agreed upon to submission