Contributions to Unsupervised and Semi-Supervised Learning

Pal, David

dc.contributor.author	Pal, David
dc.date.accessioned	2009-05-22T14:03:10Z
dc.date.available	2009-05-22T14:03:10Z
dc.date.issued	2009-05-22T14:03:10Z
dc.date.submitted	2009-05-21
dc.description.abstract	This thesis studies two problems in theoretical machine learning. The first part investigates the statistical stability of clustering algorithms; the second studies the relative advantage of having unlabeled data in classification problems. Clustering stability has been proposed and used as a model selection method for clustering tasks. The idea is to draw two independent samples from a given data set and to cluster each sample separately with the same clustering algorithm and the same parameter settings. If the two resulting clusterings are close in some metric, one concludes that the algorithm and its parameter settings match the data set and that the clusterings obtained are meaningful. We study the asymptotic properties of this method for certain types of cost-minimizing clustering algorithms and relate their asymptotic stability to the number of optimal solutions of the underlying optimization problem. In classification problems, labeled data are often expensive to obtain, whereas unlabeled data are often plentiful and cheap. We study how access to unlabeled data can decrease the amount of labeled data needed in the worst-case sense. We propose an extension of the probably approximately correct (PAC) model in which this question can be naturally studied, and we show that for certain basic tasks access to unlabeled data might, at best, halve the amount of labeled data needed.	en
dc.identifier.uri	http://hdl.handle.net/10012/4445
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	machine learning	en
dc.subject	statistics	en
dc.subject	unsupervised learning	en
dc.subject	semi-supervised learning	en
dc.subject	learning theory	en
dc.subject.program	Computer Science	en
dc.title	Contributions to Unsupervised and Semi-Supervised Learning	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
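The clustering-stability procedure summarized in the abstract (two independent samples, same algorithm and parameters, compare the results) can be illustrated with a small sketch. This is not the thesis's construction; it is a minimal example under our own assumptions: k-means (Lloyd's algorithm with deterministic farthest-first initialization) stands in for a cost-minimizing clustering algorithm, each clustering is extended to the whole data set by nearest-center assignment, and "close in some metric" is taken to mean the fraction of point pairs on which the two clusterings disagree. All function names (`kmeans`, `pair_disagreement`, `stability_distance`) are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    # Index of the center closest to p.
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def mean(group):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def farthest_first(points, k):
    # Deterministic initialization (an assumption of this sketch):
    # start at the first point, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    # Lloyd's algorithm: alternate assignment and mean-update steps.
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group happens to be empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def pair_disagreement(labels_a, labels_b):
    # Fraction of point pairs grouped together by one clustering but apart
    # by the other; invariant to renaming the cluster labels.
    pairs = list(combinations(range(len(labels_a)), 2))
    bad = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return bad / len(pairs)


def stability_distance(data, k, sample_size, seed=0):
    # Draw two independent samples, cluster each with the same k,
    # extend both clusterings to the full data set, and compare them.
    rng = random.Random(seed)
    sample_1 = rng.sample(data, sample_size)
    sample_2 = rng.sample(data, sample_size)
    centers_1 = kmeans(sample_1, k)
    centers_2 = kmeans(sample_2, k)
    labels_1 = [nearest(x, centers_1) for x in data]
    labels_2 = [nearest(x, centers_2) for x in data]
    return pair_disagreement(labels_1, labels_2)
```

A value of 0 means both samples led to the same partition of the data. On three well-separated groups of points one would expect this for k = 3, whereas k = 2 forces a choice of which groups to merge, and when several such choices are equally good the outcome can vary between samples. That link between instability and multiple optimal solutions is the kind of relationship the abstract describes; this toy computation only illustrates the setup and does not reproduce the thesis's asymptotic analysis.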

Files

Original bundle

Name: main.pdf
Size: 471.38 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 246 B
Format: Item-specific license agreed to upon submission

Collections

Theses
Computer Science