Evaluating Clusterings by Estimating Clarity

dc.comment.hidden: No permission is required for the work in this thesis that has been already published. [en]
dc.contributor.author: Whissell, John
dc.date.accessioned: 2012-10-12T20:08:05Z
dc.date.available: 2012-10-12T20:08:05Z
dc.date.issued: 2012-10-12T20:08:05Z
dc.date.submitted: 2012
dc.description.abstract: In this thesis I examine clustering evaluation, with a subfocus on text clusterings specifically. The principal work of this thesis is the development, analysis, and testing of a new internal clustering quality measure called informativeness. I begin by reviewing clustering in general. I then review current clustering quality measures, accompanying this with an in-depth discussion of many of the important properties one needs to understand about such measures. This is followed by extensive document clustering experiments that show problems with standard clustering evaluation practices. I then develop informativeness, my new internal clustering quality measure for estimating the clarity of clusterings. I show that informativeness, which uses classification accuracy as a proxy for human assessment of clusterings, is both theoretically sensible and works empirically. I present a generalization of informativeness that leverages external clustering quality measures. I also show its use in a realistic application: email spam filtering. I show that informativeness can be used to select clusterings which lead to superior spam filters when few true labels are available. I conclude this thesis with a discussion of clustering evaluation in general, informativeness, and the directions I believe clustering evaluation research should take in the future. [en]
dc.identifier.uri: http://hdl.handle.net/10012/7103
dc.language.iso: en [en]
dc.pending: false [en]
dc.publisher: University of Waterloo [en]
dc.subject: clustering [en]
dc.subject: evaluating clustering [en]
dc.subject: cluster validation [en]
dc.subject: cluster analysis [en]
dc.subject.program: Computer Science [en]
dc.title: Evaluating Clusterings by Estimating Clarity [en]
dc.type: Doctoral Thesis [en]
uws-etd.degree: Doctor of Philosophy [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Whissell_John.pdf
Size: 1.7 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 253 B
Format: Item-specific license agreed upon to submission