Cache-Oblivious Searching and Sorting in Multisets

dc.contributor.author: Farzan, Arash (en)
dc.date.accessioned: 2006-08-22T14:25:20Z
dc.date.available: 2006-08-22T14:25:20Z
dc.date.issued: 2004 (en)
dc.date.submitted: 2004 (en)
dc.description.abstract: We study three problems related to searching and sorting in multisets in the cache-oblivious model: finding the most frequent element (the mode), duplicate elimination, and multi-sorting. We aim to minimize the cache complexity (the number of cache misses) of algorithms for these problems in the setting where the cache size and block size are unknown. We start by showing lower bounds in the comparison model. We then present lower bounds in the cache-aware model, which are also lower bounds in the cache-oblivious model. We consider an input multiset of size $N$ with multiplicities $N_1, \ldots, N_k$. The lower bound for the cache complexity of determining the mode is $\Omega\left(\frac{N}{B}\log_{M/B}\frac{N}{fB}\right)$, where $f$ is the frequency of the mode and $M$ and $B$ are the cache size and block size, respectively. The cache complexities of duplicate removal and multi-sorting have lower bounds of $\Omega\left(\frac{N}{B}\log_{M/B}\frac{N}{B} - \sum_{i=1}^{k}\frac{N_i}{B}\log_{M/B}\frac{N_i}{B}\right)$. We present two deterministic approaches to designing algorithms: selection and distribution. The algorithms based on these deterministic approaches differ from the lower bounds by at most an additive term of $\frac{N}{B}\log\log M$. However, since $\log\log M$ is very small in real applications, the gap is tiny. In addition, the ideas behind our deterministic algorithms can be used to design cache-aware algorithms for these problems. The resulting cache-aware algorithms turn out to be simpler than the previously known ones. Another approach to designing algorithms for these problems is the probabilistic approach. In contrast to the deterministic algorithms, our randomized cache-oblivious algorithms are all optimal, and their cache complexities exactly match the lower bounds. All of our algorithms are within a constant factor of optimal in terms of the number of comparisons they perform. (en)
dc.format: application/pdf (en)
dc.format.extent: 327044 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10012/1019
dc.language.iso: en (en)
dc.pending: false (en)
dc.publisher: University of Waterloo (en)
dc.rights: Copyright: 2004, Farzan, Arash. All rights reserved. (en)
dc.subject: Computer Science (en)
dc.subject: Memory Hierarchies (en)
dc.subject: Cache-Oblivious model (en)
dc.subject: Multisets (en)
dc.subject: Determining the mode (en)
dc.subject: Duplicate Elimination (en)
dc.subject: Sorting (en)
dc.title: Cache-Oblivious Searching and Sorting in Multisets (en)
dc.type: Master Thesis (en)
uws-etd.degree: Master of Mathematics (en)
uws-etd.degree.department: School of Computer Science (en)
uws.peerReviewStatus: Unreviewed (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)
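
To give a feel for the lower-bound expressions quoted in the abstract, the following Python sketch evaluates them numerically with the constant factors hidden by the Omega notation ignored. It is only an illustration of the formulas, not an algorithm from the thesis; the function names and the sample values of N, f, M, and B are hypothetical.

```python
import math


def log_base(x, base):
    """Logarithm of x to an arbitrary base."""
    return math.log(x) / math.log(base)


def mode_lower_bound(n, f, M, B):
    """Evaluate (N/B) * log_{M/B}(N / (f*B)), ignoring constant factors.

    n: multiset size, f: frequency of the mode,
    M: cache size, B: block size (illustrative parameters).
    """
    return (n / B) * log_base(n / (f * B), M / B)


def multisort_lower_bound(multiplicities, M, B):
    """Evaluate (N/B) log_{M/B}(N/B) - sum_i (N_i/B) log_{M/B}(N_i/B).

    multiplicities: the values N_1, ..., N_k; N is their sum.
    Constant factors are ignored.
    """
    n = sum(multiplicities)
    total = (n / B) * log_base(n / B, M / B)
    for n_i in multiplicities:
        total -= (n_i / B) * log_base(n_i / B, M / B)
    return total


if __name__ == "__main__":
    M, B = 256, 16  # hypothetical cache and block sizes
    # Four distinct elements, each occurring 1024 times (N = 4096).
    print(multisort_lower_bound([1024] * 4, M, B))
    # Mode with frequency 16 in a multiset of size 4096.
    print(mode_lower_bound(4096, 16, M, B))
```

When all k multiplicities are equal, the second expression simplifies to (N/B) log_{M/B} k, so the bound shrinks as the multiset contains fewer distinct elements and vanishes when every element is the same.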

Files

Original bundle

Name: afarzan2004.pdf
Size: 319.38 KB
Format: Adobe Portable Document Format