Query Similarity for Community Question Answering System Based on Recurrent Encoder Decoder

Date

2017-01-18

Authors

Ye, Borui

Advisor

Li, Ming

Publisher

University of Waterloo

Abstract

The measurement of sentence similarity is a fundamental task in natural language processing. Traditionally, similarity is measured at either the word level or the sentence level (e.g., paraphrase identification), using approaches that require extensive lexical and syntactic resources. To address the shortage of labelled data and Chinese language resources, we propose a novel sentence similarity framework based on a recurrent neural network (RNN) Encoder-Decoder architecture. The RNN is pre-trained on a large set of question-question pairs that are weakly labelled automatically using heuristics. Although these weak labels are less accurate, pre-training on them greatly improves the model's performance, and the resulting model also outperforms traditional methods. Our proposed model supports both classification and candidate ranking. In addition, we release our evaluation dataset: a finely annotated question similarity dataset which, to the best of our knowledge, is the first public Chinese dataset for this purpose.
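The abstract describes the model only at a high level. The pure-Python sketch below shows how a single similarity score from an RNN encoder could serve both uses the abstract mentions: classification (thresholding the score) and candidate ranking (sorting by it). It rests on several assumptions that are not stated in the abstract: a GRU cell (the abstract says only "RNN"), questions supplied as already word-segmented token lists (English placeholder tokens are used here; Chinese text would first need word segmentation), and similarity read off as the cosine between the encoder's final hidden states, which is not necessarily how the thesis scores question pairs. Only the encoder half is shown; the decoder, the heuristic weak labelling and the pre-training loop are omitted, and the weights are random, so scores are meaningless until trained. `GRUEncoder`, `similarity`, `classify`, `rank` and the 0.5 threshold are hypothetical names and values.

```python
import math
import random

# Sketch under assumptions: GRU cell, cosine scoring, pre-segmented input.
# Only the encoder half of an encoder-decoder is shown; weights are untrained.
EMB_DIM = 8  # toy sizes, not values from the thesis
HID_DIM = 8


def _rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(m, v):
    # Plain matrix-vector product over Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUEncoder:
    """Hypothetical GRU encoder: maps a list of word tokens to its final hidden state."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        self.emb = {tok: [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for tok in vocab}
        self.unk = [0.0] * EMB_DIM  # out-of-vocabulary tokens map to a zero vector
        # Input (W) and recurrent (U) weights for the update (z), reset (r) and
        # candidate (n) gates; biases are omitted for brevity.
        self.W = {g: _rand_matrix(HID_DIM, EMB_DIM, rng) for g in "zrn"}
        self.U = {g: _rand_matrix(HID_DIM, HID_DIM, rng) for g in "zrn"}

    def encode(self, tokens):
        h = [0.0] * HID_DIM
        for tok in tokens:
            x = self.emb.get(tok, self.unk)
            z = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["z"], x), _matvec(self.U["z"], h))]
            r = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["r"], x), _matvec(self.U["r"], h))]
            rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
            n = [math.tanh(a + b) for a, b in zip(_matvec(self.W["n"], x), _matvec(self.U["n"], rh))]
            # Interpolate between the previous state and the candidate state.
            h = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity(enc, q1, q2):
    """Assumed scoring: cosine between the two questions' final encoder states."""
    return cosine(enc.encode(q1), enc.encode(q2))


def classify(enc, q1, q2, threshold=0.5):
    """Binary 'similar or not' decision by thresholding the score (threshold is arbitrary)."""
    return similarity(enc, q1, q2) >= threshold


def rank(enc, query, candidates):
    """Order candidate questions from most to least similar to the query."""
    return sorted(candidates, key=lambda c: similarity(enc, query, c), reverse=True)


if __name__ == "__main__":
    vocab = ["how", "to", "learn", "programming", "get", "started", "weather", "today"]
    enc = GRUEncoder(vocab, seed=42)
    query = ["how", "to", "learn", "programming"]
    candidates = [["weather", "today"], ["get", "started", "programming"], query]
    for cand in rank(enc, query, candidates):
        print(" ".join(cand), round(similarity(enc, query, cand), 3))
    print(classify(enc, query, ["get", "started", "programming"]))
```

Because one scalar score drives both tasks, classification and ranking share the same trained encoder; in practice the threshold would be tuned on labelled data rather than fixed at 0.5.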

Keywords

question answering, sentence similarity, neural network, Chinese corpus