
Approach to Evaluating Clustering Using Classification Labelled Data

dc.contributor.author: Luu, Tuong
dc.date.accessioned: 2011-01-17T20:28:01Z
dc.date.available: 2011-01-17T20:28:01Z
dc.date.issued: 2011-01-17T20:28:01Z
dc.date.submitted: 2010
dc.description.abstract: Cluster analysis has been identified as a core task in data mining, for which many different algorithms have been proposed. This diversity, on one hand, provides us with a wide collection of tools. On the other hand, the profusion of options easily causes confusion. Given a particular task, users do not know which algorithm is suitable, since it is not clear how clustering algorithms should be evaluated. As a consequence, users often select a clustering algorithm in a very ad hoc manner. A major challenge in evaluating clustering algorithms is the scarcity of real data with a "correct" ground truth clustering. This is in stark contrast to the situation for classification tasks, where there are abundant data sets labeled with their correct classifications. As a result, clustering research often relies on labeled data to evaluate and compare the results of clustering algorithms. We present a new perspective on how to use labeled data for evaluating clustering algorithms, and develop an approach for comparing clustering algorithms on the basis of classification labeled data. We then use this approach to support a novel technique for choosing among clustering algorithms when no labels are available. We use these tools to demonstrate that the utility of an algorithm depends on the specific clustering task. Investigating a set of common clustering algorithms, we demonstrate that there are cases where each one of them outputs better clusterings than the others. In contrast to the current trend of looking for a superior clustering algorithm, our findings demonstrate the need for a variety of different clustering algorithms. [en]
dc.identifier.uri: http://hdl.handle.net/10012/5720
dc.language.iso: en [en]
dc.pending: false [en]
dc.publisher: University of Waterloo [en]
dc.subject: clustering [en]
dc.subject: empirical study [en]
dc.subject.program: Computer Science [en]
dc.title: Approach to Evaluating Clustering Using Classification Labelled Data [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle
Name: Luu_Tuong.pdf
Size: 927.92 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 243 B
Format: Item-specific license agreed to upon submission