Query Similarity for Community Question Answering System Based on Recurrent Encoder Decoder
Date
2017-01-18
Authors
Ye, Borui
Advisor
Li, Ming
Publisher
University of Waterloo
Abstract
Measuring sentence similarity is a fundamental task in natural language processing. Traditionally, similarity is measured at either the word level or the sentence level (for example, through paraphrasing), both of which require many lexical and syntactic resources. To address the shortage of labelled data and Chinese language resources, we propose a novel sentence similarity framework based on a recurrent neural network (RNN) Encoder-Decoder architecture. The RNN is pre-trained on a large set of question-question pairs that are weakly labelled automatically and heuristically. Although these weak labels are less accurate, the pre-training greatly improves the performance of the model, which also outperforms traditional methods. Our proposed model is capable of both classification and candidate ranking. In addition, we release our evaluation dataset, a finely annotated question similarity dataset, which to the best of our knowledge is the first public Chinese dataset for this purpose.
Keywords
question answering, sentence similarity, neural network, Chinese corpus