UWSpace is currently experiencing technical difficulties resulting from its recent migration to a new version of its software. These technical issues are not affecting the submission and browse features of the site. UWaterloo community members may continue submitting items to UWSpace. We apologize for the inconvenience, and are actively working to resolve these technical issues.
 

A new framework for clustering

dc.contributor.authorZhou, Wu
dc.date.accessioned2010-04-07T13:18:26Z
dc.date.available2010-04-07T13:18:26Z
dc.date.issued2010-04-07T13:18:26Z
dc.date.submitted2010
dc.description.abstractThe difficulty of clustering and the variety of clustering methods suggest the need for a theoretical study of clustering. Using the idea of a standard statistical framework, we propose a new framework for clustering. For a well-defined clustering goal we assume that the data to be clustered come from an underlying distribution and we aim to find a high-density cluster tree. We regard this tree as a parameter of interest for the underlying distribution. However, it is not obvious how to determine a connected subset in a discrete distribution whose support is located in a Euclidean space. Building a cluster tree for such a distribution is an open problem and presents interesting conceptual and computational challenges. We solve this problem using graph-based approaches and further parameterize clustering using the high-density cluster tree and its extension. Motivated by the connection between clustering outcomes and graphs, we propose a graph family framework. This framework plays an important role in our clustering framework. A direct application of the graph family framework is a new cluster-tree distance measure. This distance measure can be written as an inner product or kernel. It makes our clustering framework able to perform statistical assessment of clustering via simulation. Other applications such as a method for integrating partitions into a cluster tree and methods for cluster tree averaging and bagging are also derived from the graph family framework.en
dc.identifier.urihttp://hdl.handle.net/10012/5055
dc.language.isoenen
dc.pendingfalseen
dc.publisherUniversity of Waterlooen
dc.subjectclusteringen
dc.subjectcluster tree distanceen
dc.subject.programStatisticsen
dc.titleA new framework for clusteringen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentStatistics and Actuarial Scienceen
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Zhou_Wu.pdf
Size:
2.1 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
247 B
Format:
Item-specific license agreed upon to submission
Description: