Li, Ming
Ye, Borui
2017-01-18
2016-12-12
http://hdl.handle.net/10012/11201

The measurement of sentence similarity is a fundamental task in natural language processing. Traditionally, it is measured at either the word level or the sentence level (for example, through paraphrasing), which requires many lexical and syntactic resources. To address the lack of labelled data and Chinese language resources, we propose a novel sentence similarity framework based on a recurrent neural network (RNN) Encoder-Decoder architecture. The RNN is pre-trained on a large set of question-question pairs that are weakly labelled automatically using heuristics. Although these labels are less accurate, the pre-training greatly improves the performance of the model, which also outperforms traditional methods. Our proposed model is capable of both classification and candidate ranking. In addition, we release our evaluation dataset, a finely annotated question similarity dataset that is, to the best of our knowledge, the first public Chinese dataset for this purpose.

en
question answering; sentence similarity; neural network; chinese corpus
Query Similarity for Community Question Answering System Based on Recurrent Encoder Decoder
Master Thesis