Show simple item record

dc.contributor.authorClaude, Francisco 18:09:08 (GMT) 18:09:08 (GMT)
dc.description.abstractThe amount of data that people and companies store has grown exponentially over the last few years. Storing this information alone is not enough, because in order to make it useful we need to be able to efficiently search inside it. Furthermore, it is highly valuable to keep the historic data of each document stored, allowing to not only access and search inside the newest version, but also over the whole history of the documents. Grammar-based compression has proven to be very effective for repetitive data, which is the case for versioned documents. In this thesis we present several results on representing textual information and searching in it. In particular, we present text indexes for grammar-based compressed text that support searching for a pattern and extracting substrings of the input text. These are the first general indexes for grammar-based compressed text that support searching in sublinear time. In order to build our indexes, we present new results on representing binary relations in a space-efficient manner, and construction algorithms that use little space to achieve their goal. These two results have a wide range of applications. In particular, the representations for binary relations can be used as a building block for several structures in computer science, such as graphs, inverted indexes, etc. Finally, we present a new index, that uses on grammar-based compression, to solve the document listing problem. This problem deals with representing a collection of texts and searching for the documents that contain a given pattern. In spite of being similar to the classical text indexing problem, this problem has proven to be a challenge when we do not want to pay time proportional to the number of occurrences, but time proportional to the size of the result. Our proposal is designed particularly for versioned text, allowing the storage of a collection of documents with all their historic versions in little space. This is currently the smallest structure for such a purpose in practice.en
dc.publisherUniversity of Waterlooen
dc.subjectdata structuresen
dc.titleSpace-Efficient Data Structures for Information Retrievalen
dc.typeDoctoral Thesisen
dc.subject.programComputer Scienceen of Computer Scienceen
uws-etd.degreeDoctor of Philosophyen

Files in this item


This item appears in the following Collection(s)

Show simple item record


University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages