Show simple item record

dc.contributor.authorTerra, Egidioen
dc.date.accessioned2006-08-22 14:29:03 (GMT)
dc.date.available2006-08-22 14:29:03 (GMT)
dc.date.issued2004en
dc.date.submitted2004en
dc.identifier.urihttp://hdl.handle.net/10012/1071
dc.description.abstractUnderstanding interactions among words is fundamental for natural language applications. However, many statistical NLP methods still ignore this important characteristic of language. For example, information retrieval models still assume word independence. This work focuses on the creation of lexical affinity models and their applications to natural language problems. The thesis develops two approaches for computing lexical affinity. In the first, the co-occurrence frequency is the calculated by point estimation. The second uses parametric models for co-occurrence distances. For the point estimation approach, we study several alternative methods for computing the degree of affinity by making use of point estimates for co-occurrence frequency. We propose two new point estimators for co-occurrence and evaluate the measures and the estimation procedures with synonym questions. In our evaluation, synonyms are checked directly by their co-occurrence and also by comparing them indirectly, using other lexical units as supporting evidence. For the parametric approach, we address the creation of lexical affinity models by using two parametric models for distance co-occurrence: an independence model and an affinity model. The independence model is based on the geometric distribution; the affinity model is based on the gamma distribution. Both fit the data by maximizing likelihood. Two measures of affinity are derived from these parametric models and applied to the synonym questions, resulting in the best absolute performance on these questions by a method not trained to the task. We also explore the use of lexical affinity in information retrieval tasks. A new method to score missing terms by using lexical affinities is proposed. In particular, we adapt two probabilistic scoring functions for information retrieval to allow all query terms to be scored. One is a document retrieval method and the other is a passage retrieval method. Our new method, using replacement terms, shows significant improvement over the original methods.en
dc.formatapplication/pdfen
dc.format.extent841350 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2004, Terra, Egidio. All rights reserved.en
dc.subjectComputer Scienceen
dc.subjectlexical affinityen
dc.subjectinformation retrievalen
dc.subjectcomputational linguisticsen
dc.titleLexical Affinities and Language Applicationsen
dc.typeDoctoral Thesisen
dc.pendingfalseen
uws-etd.degree.departmentSchool of Computer Scienceen
uws-etd.degreeDoctor of Philosophyen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages