dc.contributor.author: Gweon, Hyukjun
dc.date.accessioned: 2017-08-01 14:35:37 (GMT)
dc.date.available: 2017-08-01 14:35:37 (GMT)
dc.date.issued: 2017-08-01
dc.date.submitted: 2017-07-24
dc.identifier.uri: http://hdl.handle.net/10012/12106
dc.description.abstract: Classification is essential in statistical learning. This thesis deals with three topics in classification: multi-label classification, nonparametric multi-class classification and a special type of text categorization called occupation coding. For each topic, novel approaches are proposed with the goal of high predictive performance. This is empirically demonstrated for each method. In multi-label classification, observations may be associated with multiple classes or labels simultaneously. Generally, correlations exist between labels and taking into account the label correlations is important during the classification process. This thesis proposes an approach based on the nearest neighbor principle that considers neighbors both in the feature (x) and the label (y) space. The proposed method chooses the labelset of a training observation that minimizes a weighted function of the distances in feature and label space. By selecting an entire labelset as the prediction, the method implicitly considers label correlations. In multi-class classification, the well-known k-nearest neighbors method is especially desirable when the response surface exhibits highly local behavior. A novel approach is presented that makes a prediction based on the k-th nearest neighbor from each class. The method not only provides estimates for class posterior probabilities but also converges to the Bayes classifier as the size of the training data increases. Further, the method is extended using the idea of an ensemble. Occupation coding is an important multi-class text categorization problem. Since fully automated classification is challenging, researchers focus more on partially automated coding. Three approaches built on underlying statistical learning methods are proposed to improve the classification accuracy of those methods. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Machine Learning [en]
dc.subject: Multi-label Classification [en]
dc.subject: Non-parametric Classification [en]
dc.subject: Statistical Learning [en]
dc.subject: Classification Methods [en]
dc.title: Statistical Learning Approaches to Some Classification Problems [en]
dc.type: Doctoral Thesis [en]
dc.pending: false
uws-etd.degree.department: Statistics and Actuarial Science [en]
uws-etd.degree.discipline: Statistics [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Doctor of Philosophy [en]
uws.contributor.advisor: Schonlau, Matthias
uws.contributor.advisor: Steiner, Stefan
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
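
The abstract describes a multi-class rule that predicts from the k-th nearest neighbor of each class. The sketch below is one possible reading of that sentence, not the thesis's implementation: it assumes Euclidean distance and predicts the class whose k-th nearest training point is closest to the query. The function names, the fallback for classes with fewer than k points, and the choice of distance are all assumptions made for illustration. The posterior probability estimates and the ensemble extension mentioned in the abstract are not reproduced here.

```python
import numpy as np

def kth_class_distances(X_train, y_train, x, k):
    """Distance from x to its k-th nearest training point within each class.

    Hypothetical helper: Euclidean distance is an assumption, not taken
    from the abstract.
    """
    dists = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.sort(np.linalg.norm(Xc - x, axis=1))
        # Assumption: if a class has fewer than k points, use its farthest point.
        dists[c] = d[min(k, len(d)) - 1]
    return dists

def predict_kth_per_class(X_train, y_train, x, k=1):
    """Predict the class whose k-th nearest neighbor is closest to x."""
    dists = kth_class_distances(X_train, y_train, x, k)
    return min(dists, key=dists.get)
```

With k = 1 this rule reduces to ordinary 1-nearest-neighbor classification; larger k asks each class to show several nearby points before it can win.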

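
For the multi-label method, the abstract says the prediction is the labelset of the training observation that minimizes a weighted function of distances in feature and label space. A minimal sketch of that selection step follows, under several assumptions that the abstract does not state: distances are Euclidean, the weighted function is a convex combination with a hypothetical weight theta, and the query's position in label space is given by a preliminary vector of label probabilities p_hat obtained from some other classifier.

```python
import numpy as np

def predict_labelset(X_train, Y_train, x, p_hat, theta=0.5):
    """Return the training labelset minimizing a weighted feature/label distance.

    X_train: (n, d) features; Y_train: (n, L) binary label matrix.
    x: (d,) query features; p_hat: (L,) preliminary label probabilities
    for the query (assumed to come from a separate base classifier).
    theta: hypothetical weight on the feature-space distance, in [0, 1].
    """
    d_x = np.linalg.norm(X_train - x, axis=1)      # distance in feature space
    d_y = np.linalg.norm(Y_train - p_hat, axis=1)  # distance in label space
    score = theta * d_x + (1.0 - theta) * d_y
    # Copying a whole row keeps the label combination observed in training.
    return Y_train[np.argmin(score)]
```

Because the output is always a labelset that actually occurs in the training data, label correlations are respected implicitly, as the abstract notes.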

All items in UWSpace are protected by copyright, with all rights reserved.
