Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting

Huang, Jiayuan

Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting

dc.contributor.author	Huang, Jiayuan
dc.date.accessioned	2007-08-20T19:50:10Z
dc.date.available	2007-08-20T19:50:10Z
dc.date.issued	2007-08-20T19:50:10Z
dc.date.submitted	2007
dc.description.abstract	This thesis focuses on two fundamental machine learning problems:unsupervised learning, where no label information is available, and semi-supervised learning, where a small amount of labels are given in addition to unlabeled data. These problems arise in many real word applications, such as Web analysis and bioinformatics,where a large amount of data is available, but no or only a small amount of labeled data exists. Obtaining classification labels in these domains is usually quite difficult because it involves either manual labeling or physical experimentation. This thesis approaches these problems from two perspectives: graph based and distribution based. First, I investigate a series of graph based learning algorithms that are able to exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes in the graph---ultimately communicating information globally to yield effective unsupervised and semi-supervised learning. In particular, I extend existing graph based learning algorithms, currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs and complex networks. These richer graph representations allow one to more naturally capture the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics and social networks. For each of these generalized graph structures I show how information propagation can be characterized by distinct random walk models, and then use this characterization to develop new unsupervised and semi-supervised learning algorithms. Second, I investigate a more statistically oriented approach that explicitly models a learning scenario where the training and test examples come from different distributions. This is a difficult situation for standard statistical learning approaches, since they typically incorporate an assumption that the distributions for training and test sets are similar, if not identical. To achieve good performance in this scenario, I utilize unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space and bypassing the problem of explicit density estimation. The technique can be easily applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shifting between training and test data.	en
dc.identifier.uri	http://hdl.handle.net/10012/3165
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	unsupervised learning	en
dc.subject	semi-supervised learning	en
dc.subject	graph based learning	en
dc.subject	distribution shifting	en
dc.subject.program	Computer Science	en
dc.title	Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Thesis_Jiayuan.pdf
Size:: 1.43 MB
Format:: Adobe Portable Document Format

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 250 B
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Theses
Computer Science