Disentanglement of Syntactic Components for Text Generation

Das, Utsav Tushar

Files

Das__Utsav_Tushar.pdf (1.52 MB)

Date

2022-02-18

Authors

Das, Utsav Tushar

Advisor

Vechtomova, Olga

Publisher

University of Waterloo

Abstract

Modelling human-generated text, i.e., natural language data, is an important challenge in artificial intelligence. A good AI system should be able to understand and analyze natural language and to generate fluent, accurate responses. Applications such as machine translation, summarization, and dialog generation all require this ability. This work examines the application of deep neural networks to natural language generation. We explore how graph convolutional networks (GCNs) can be paired with recurrent neural networks (RNNs) for text generation. GCNs can leverage the inherent graph structure of text: sentences can be expressed as dependency trees, and GCNs can incorporate this information to generate sentences in a syntax-aware manner. Modelling sentences with both dependency trees and word representations allows us to disentangle the syntactic components of sentences and to generate sentences that fuse parts of speech from multiple sentences. Our methodology combines the sentence representation from an RNN with that from a GCN, giving the decoder access to syntactic information while it reconstructs a sentence. We explore different ways of separating the syntactic components of a sentence and inspect how generation operates. We report BLEU and perplexity scores to evaluate how well the model incorporates content from multiple sentences based on its syntax. We also observe, qualitatively, how the model generates fluent and coherent sentences while assimilating syntactic components from multiple sentences.
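
The pairing of a GCN over a dependency tree with an RNN sentence representation can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the symmetric normalization of the adjacency matrix, the mean pooling, the concatenation used to combine the two representations, the toy word vectors, and all function names. The RNN output is a hard-coded stand-in, not a trained encoder.

```python
import math

def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops.
    heads[i] is the index of token i's head, or -1 for the root."""
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed, common choice)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_hat X W)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def mean_pool(rows):
    """Average the token vectors into one sentence-level vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

# Toy sentence "She reads books": "reads" (index 1) is the root.
heads = [1, -1, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up word vectors
weight = [[1.0, 0.0], [0.0, 1.0]]                # identity weights, for readability

adj_norm = normalize(dependency_adjacency(heads))
syntax_vec = mean_pool(gcn_layer(adj_norm, features, weight))

rnn_vec = [0.5, -0.5]                  # hypothetical stand-in for an RNN final state
decoder_input = rnn_vec + syntax_vec   # assumed combination: concatenation
```

In this sketch each token's vector is mixed with those of its head and its dependents, so the pooled `syntax_vec` reflects the tree structure, while `rnn_vec` stands for the sequential encoding; a decoder would then condition on `decoder_input`.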