Semantic Distance in WordNet: A Simplified and Improved Measure of Semantic Relatedness
Date
2006
Authors
Scriver, Aaron
Publisher
University of Waterloo
Abstract
Measures of semantic distance have received a great deal of attention recently in the field of computational lexical semantics. Although techniques for approximating the semantic distance of two concepts have existed for several decades, the introduction of the WordNet lexical database and improvements in corpus analysis have enabled significant improvements in semantic distance measures.

In this study we investigate a special kind of semantic distance, called semantic relatedness. Lexical semantic relatedness measures have proved to be useful for a number of applications, such as word sense disambiguation and real-word spelling error correction. Most relatedness measures rely on the observation that the shortest path between nodes in a semantic network provides a representation of the relationship between two concepts. The strength of relatedness is computed in terms of this path.

This dissertation makes several significant contributions to the study of semantic relatedness. We describe a new measure that calculates semantic relatedness as a function of the shortest path in a semantic network. The proposed measure achieves better results than other standard measures and yet is much simpler than previous models. The proposed measure is shown to achieve a correlation of r = 0.897 with the judgments of human test subjects using a standard benchmark data set, representing the best performance reported in the literature. We also provide a general formal description for a class of semantic distance measures, namely those measures that compute semantic distance from the shortest path in a semantic network. Lastly, we suggest a new methodology for developing path-based semantic distance measures that would limit the possibility of unnecessary complexity in future measures.
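
As a rough illustration of the general idea of a path-based measure, the sketch below finds the shortest path between two concepts in a small semantic network and turns its length into a relatedness score. It is not the measure proposed in the dissertation: the toy network, the function names, and the scoring rule 1 / (1 + path length) are all assumptions made only for this example.

```python
from collections import deque

# Hypothetical toy semantic network (undirected adjacency lists).
# These links are invented for illustration and are not taken from WordNet.
TOY_NETWORK = {
    "car": ["vehicle", "wheel"],
    "bicycle": ["vehicle", "wheel"],
    "vehicle": ["car", "bicycle", "artifact"],
    "wheel": ["car", "bicycle"],
    "artifact": ["vehicle", "fork"],
    "fork": ["artifact"],
}


def shortest_path_length(network, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search visits nodes in order of increasing distance,
    # so the first time the target is reached the path is a shortest one.
    while queue:
        node, dist = queue.popleft()
        for neighbour in network.get(node, []):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_relatedness(network, a, b):
    """Assumed scoring rule: 1 / (1 + path length), or 0.0 when no path exists."""
    length = shortest_path_length(network, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


if __name__ == "__main__":
    for pair in [("car", "bicycle"), ("car", "fork")]:
        print(pair, path_relatedness(TOY_NETWORK, *pair))
```

Under this assumed rule, concepts joined by a shorter path score higher, so "car" and "bicycle" come out as more related than "car" and "fork". A real measure would be defined over the full WordNet graph and could weight the path differently.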
Keywords
Computer Science, relatedness, similarity, distance, lexical, semantic, computational, measure, wordnet