Statistical Learning Approaches to Some Classification Problems

dc.contributor.author: Gweon, Hyukjun
dc.date.accessioned: 2017-08-01T14:35:37Z
dc.date.available: 2017-08-01T14:35:37Z
dc.date.issued: 2017-08-01
dc.date.submitted: 2017-07-24
dc.description.abstract: Classification is essential in statistical learning. This thesis deals with three topics in classification: multi-label classification, nonparametric multi-class classification, and a special type of text categorization called occupation coding. For each topic, novel approaches are proposed with the goal of high predictive performance, and the performance of each method is demonstrated empirically. In multi-label classification, observations may be associated with multiple classes or labels simultaneously. Labels are generally correlated, and taking these correlations into account is important during classification. This thesis proposes an approach based on the nearest neighbor principle that considers neighbors in both the feature (x) space and the label (y) space. The proposed method chooses the labelset of a training observation that minimizes a weighted function of the distances in feature and label space. By selecting an entire labelset as the prediction, the method implicitly considers label correlations. In multi-class classification, the well-known k-nearest neighbors method is especially desirable when the response surface exhibits highly local behavior. A novel approach is presented that makes a prediction based on the k-th nearest neighbor from each class. The method not only provides estimates of the class posterior probabilities but also converges to the Bayes classifier as the size of the training data increases. Further, the method is extended using the idea of an ensemble. Occupation coding is an important multi-class text categorization problem. Since fully automated classification is challenging, researchers focus more on partially automated coding. Three approaches are proposed that build on underlying statistical learning methods and improve their classification accuracy.
dc.identifier.uri: http://hdl.handle.net/10012/12106
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Machine Learning
dc.subject: Multi-label Classification
dc.subject: Non-parametric Classification
dc.subject: Statistical Learning
dc.subject: Classification Methods
dc.title: Statistical Learning Approaches to Some Classification Problems
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: Statistics and Actuarial Science
uws-etd.degree.discipline: Statistics
uws-etd.degree.grantor: University of Waterloo
uws.contributor.advisor: Schonlau, Matthias
uws.contributor.advisor: Steiner, Stefan
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
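
The multi-class idea in the abstract, predicting from the k-th nearest neighbor of each class, can be illustrated with a minimal Python sketch. The abstract does not state the scoring rule, so the following is an assumption and not the thesis's method: Euclidean distance is used, each class is scored by the inverse of the distance to its k-th nearest training point, and the scores are normalized to sum to one. The function names (kth_neighbor_distances, predict_proba, predict) and the eps parameter are hypothetical.

```python
import math
from collections import defaultdict


def kth_neighbor_distances(train_x, train_y, query, k):
    """Map each class to the distance from `query` to that class's k-th nearest point."""
    by_class = defaultdict(list)
    for x, y in zip(train_x, train_y):
        by_class[y].append(math.dist(x, query))  # Euclidean distance (assumed metric)
    result = {}
    for label, dists in by_class.items():
        dists.sort()
        # Assumption: a class with fewer than k points falls back to its farthest point.
        result[label] = dists[min(k, len(dists)) - 1]
    return result


def predict_proba(train_x, train_y, query, k=1, eps=1e-12):
    """Hypothetical scoring rule: inverse k-th-neighbor distance, normalized to sum to 1."""
    distances = kth_neighbor_distances(train_x, train_y, query, k)
    scores = {label: 1.0 / (d + eps) for label, d in distances.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def predict(train_x, train_y, query, k=1):
    """Return the class with the largest normalized score."""
    proba = predict_proba(train_x, train_y, query, k)
    return max(proba, key=proba.get)


# Toy data: two well-separated classes in the plane.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

With this toy data, predict(train_x, train_y, (0.5, 0.5), k=2) selects the class whose second-nearest training point is closest to the query. Unlike ordinary k-nearest neighbors, every class contributes exactly one distance, so each class receives a nonzero score.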

Files

Original bundle

Name: Gweon_Hyukjun.pdf
Size: 4.28 MB
Format: Adobe Portable Document Format
Description: PhD thesis

License bundle

Name: license.txt
Size: 6.08 KB
Format: Item-specific license agreed upon to submission