dc.contributor.author: Haghtalab, Nika
dc.date.accessioned: 2013-08-26 14:49:01 (GMT)
dc.date.available: 2013-08-26 14:49:01 (GMT)
dc.date.issued: 2013-08-26T14:49:01Z
dc.date.submitted: 2013-08-08
dc.identifier.uri: http://hdl.handle.net/10012/7742
dc.description.abstract: Clustering, the partitioning of data into groups of similar objects, has a wide range of applications. In many cases, unstructured data makes up a significant part of the input. Attempting to cluster this part of the data, which can be referred to as noise, can disturb the clustering of the remaining domain points. Despite the practical need for a clustering framework that allows a portion of the data to remain unclustered, little research has been done so far in that direction. In this thesis, we take a step towards addressing the issue of clustering in the presence of noise in two parts. First, we develop a platform for clustering that has a cluster devoted to the "noise" points. Second, we examine the problem of "robustness" of clustering algorithms to the addition of noise. In the first part, we develop a formal framework for clustering that has a designated noise cluster. We formalize intuitively desirable input-output properties of clustering algorithms that have a noise cluster. We review some previously known algorithms, introduce new algorithms for this setting, and examine them with respect to the introduced properties. In the second part, we address the problem of robustness of clustering algorithms to the addition of unstructured data. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one that has a noise cluster. We discuss several rigorous measures of robustness and prove performance guarantees for our method with respect to these measures, under the assumption that the noise-free data satisfies some niceness properties and the noise satisfies some mildness properties. We also prove that more straightforward ways of adding robustness to clustering algorithms fail to achieve the above-mentioned guarantees. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Clustering [en]
dc.title: Clustering in the Presence of Noise [en]
dc.type: Master Thesis [en]
dc.pending: false [en]
dc.subject.program: Computer Science [en]
uws-etd.degree.department: School of Computer Science [en]
uws-etd.degree: Master of Mathematics [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]




UWSpace


All items in UWSpace are protected by copyright, with all rights reserved.
