Scalable Nearest Neighbor Search with Compact Codes

dc.contributor.authorEghbali, Sepehr
dc.date.accessioned2019-12-19T21:27:17Z
dc.date.available2019-12-19T21:27:17Z
dc.date.issued2019-12-19
dc.date.submitted2019-11-25
dc.description.abstractAn important characteristic of the recent decade is the dramatic growth in the use and generation of data. From collections of images, documents and videos, to genetic data, and to network traffic statistics, modern technologies and cheap storage have made it possible to accumulate huge datasets. But how can we effectively use all this data? The growing sizes of the modern datasets make it crucial to develop new algorithms and tools capable of sifting through this data efficiently. A central computational primitive for analyzing large datasets is the Nearest Neighbor Search problem in which the goal is to preprocess a set of objects, so that later, given a query object, one can find the data object closest to the query. In most situations involving high-dimensional objects, the exhaustive search which compares the query with every item in the dataset has a prohibitive cost both for runtime and memory space. This thesis focuses on the design of algorithms and tools for fast and cost efficient nearest neighbor search. The proposed techniques advocate the use of compressed and discrete codes for representing the neighborhood structure of data in a compact way. Transforming high-dimensional items, such as raw images, into similarity-preserving compact codes has both computational and storage advantages as compact codes can be stored efficiently using only a few bits per data item, and more importantly they can be compared extremely fast using bit-wise or look-up table operators. Motivated by this view, the present work explores two main research directions: 1) finding mappings that better preserve the given notion of similarity while keeping the codes as compressed as possible, and 2) building efficient data structures that support non-exhaustive search among the compact codes. Our large-scale experimental results reported on various benchmarks including datasets upto one billion items, show boost in retrieval performance in comparison to the state-of-the-art.en
dc.identifier.urihttp://hdl.handle.net/10012/15355
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectmachine learningen
dc.subjectcomputer visionen
dc.subjectinformation retrievalen
dc.subjectnearest neighbor searchen
dc.subjectcompressionen
dc.titleScalable Nearest Neighbor Search with Compact Codesen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentElectrical and Computer Engineeringen
uws-etd.degree.disciplineElectrical and Computer Engineeringen
uws-etd.degree.grantorUniversity of Waterlooen
uws.contributor.advisorTahvildari, Ladan
uws.contributor.affiliation1Faculty of Engineeringen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Eghbali_Sepehr.pdf
Size:
2.09 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.08 KB
Format:
Item-specific license agreed upon to submission
Description: