Query Similarity for Community Question Answering System Based on Recurrent Encoder Decoder
The measurement of sentence similarity is a fundamental task in natural language processing. Traditionally, it is measured at either the word level or the sentence level (such as paraphrasing), both of which require many lexical and syntactic resources. To address the lack of labelled data and Chinese language resources, we propose a novel sentence similarity framework based on a recurrent neural network (RNN) Encoder-Decoder architecture. This RNN is pre-trained on a large set of question-question pairs, which are weakly labelled automatically using heuristics. Although the weak labels are less accurate, the pre-training greatly improves the performance of the model, which also outperforms traditional methods. Our proposed model is capable of both classification and candidate ranking. In addition, we release our evaluation dataset, a finely annotated question similarity dataset, which to the best of our knowledge is the first public dataset for this purpose in Chinese.
Cite this version of the work
Borui Ye (2017). Query Similarity for Community Question Answering System Based on Recurrent Encoder Decoder. UWSpace. http://hdl.handle.net/10012/11201