A Semantic Distance of Natural Language Queries Based on Question-Answer Pairs
Date
2014-08-21
Authors
Xiong, Kun
Publisher
University of Waterloo
Abstract
Many Natural Language Processing (NLP) techniques have been applied in the field
of Question Answering (QA) for understanding natural language queries. Practical QA
systems classify a natural language query into vertical domains and determine whether it
is similar to a question with known or latent answers. Current mobile personal assistant
applications process queries that are recognized from voice input or translated from
cross-lingual queries. Theoretically speaking, all these problems rely on an intuitive notion
of semantic distance. However, such a distance is neither definable nor computable. Many
studies attempt to approximate it heuristically, for instance with distances based on
synonym dictionaries. In this paper, we propose a unified algorithm that approximates the
semantic distance using the well-defined theory of information distance. The algorithm
relies on a pre-constructed data structure, semantic clusters, which is built automatically
from 35 million question-answer pairs. Based on this semantic measure of questions, we
implement two practical NLP systems: a question classifier and a translation corrector.
We then conduct a series of comparison experiments on both systems. Experimental results
demonstrate that our distance-based approach produces fewer classification errors than
other academic work. In addition, our translation correction system significantly improves
on the Google translation results.
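In information distance theory, the distance between two strings x and y is max{K(x|y), K(y|x)}, where K is Kolmogorov complexity. Because K itself is uncomputable, practical work has to approximate it. One widely used computable proxy, the normalized compression distance (NCD), replaces K with the output length of a real compressor: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below assumes Python's standard `zlib` module as the compressor C. It is a generic illustration of the information-distance idea only, not the semantic-cluster algorithm described in this abstract, whose details are not given here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9); stands in for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    A compressor-based approximation of the normalized information distance
    max(K(x|y), K(y|x)) / max(K(x), K(y)). This is a generic technique, not
    the semantic-cluster method of the thesis.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    # Compressing the concatenation lets the compressor reuse substrings of x
    # when encoding y, which approximates the conditional complexity K(y|x).
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    q1 = "what is the weather like in toronto today"
    q2 = "what is the weather in toronto today"
    q3 = "how do i install python on windows"
    # Near-duplicate questions should score lower than unrelated ones.
    print(f"ncd(q1, q2) = {ncd(q1, q2):.3f}")
    print(f"ncd(q1, q3) = {ncd(q1, q3):.3f}")
```

Because NCD only sees shared character sequences, it treats paraphrases with different vocabulary as distant, and on very short strings the compressor's fixed overhead makes the values noisy. Both limitations illustrate why a purely syntactic compressor is a weak stand-in for semantic distance between questions.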
Keywords
semantic distance, question classification, question translation, question-answer pairs