Show simple item record

dc.contributor.author: He, Zhoushanyue
dc.date.accessioned: 2021-01-13 16:25:05 (GMT)
dc.date.available: 2021-01-13 16:25:05 (GMT)
dc.date.issued: 2021-01-13
dc.date.submitted: 2021-01-08
dc.identifier.uri: http://hdl.handle.net/10012/16643
dc.description.abstract: Open-ended questions allow participants to answer survey questions without any constraint. Responses to open-ended questions, however, are more difficult to analyze quantitatively than responses to closed-ended questions. In this thesis, I focus on analyzing text responses to open-ended questions in surveys. The thesis comprises three parts: double coding of open-ended questions, prediction of potential coding errors in manual coding, and a comparison between manual coding and automatic coding. Double coding refers to two coders coding the same observations independently; it is often used to assess coders' reliability. I investigate the use of double coding to improve the performance of automatic coding. I find that, when the budget for manual coding is fixed, double coding that involves a more experienced expert coder results in a smaller but cleaner training set than single coding, and improves the predictions of statistical learning models when the coding error rate of the coders exceeds a threshold. When data have already been double coded, double coding always outperforms single coding. In many research projects, only a subset of the data can be double coded due to limited funding. My idea is that researchers can use the double-coded subset to improve the coding quality of the remaining single-coded observations. I therefore propose a model-assisted coding process that predicts the risk of coding errors; high-risk text answers are then double coded. The proposed coding process reduces coding error while retaining the ability to assess inter-coder reliability. Manual coding and automatic coding are the two main approaches to coding responses to open-ended questions, yet their similarities and differences in terms of coding error have not been well studied. I compare the coding errors of human coders and automated coders and find that, despite different error rates, they make similar mistakes.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: text analysis
dc.subject: double coding
dc.subject: manual coding
dc.subject: automated coding
dc.subject: open-ended question
dc.subject: text classification
dc.subject: statistical learning
dc.title: On the Automatic Coding of Text Answers to Open-ended Questions in Surveys
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: Statistics and Actuarial Science
uws-etd.degree.discipline: Statistics
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws-etd.embargo.terms: 0
uws.contributor.advisor: Schonlau, Matthias
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate

