Controlled Generation of Stylized Text Using Semantic and Phonetic Representations

Gudmundsson, Egill Ian

Controlled Generation of Stylized Text Using Semantic and Phonetic Representations

Files

Gudmundsson_Egill.pdf (1.09 MB)

Date

2022-01-21

Authors

Gudmundsson, Egill Ian

Advisor

Vechtomova, Olga

Publisher

University of Waterloo

Abstract

Neural networks are a popular choice of models for the purpose of text generation. Variational autoencoders have been shown to be good at reconstructing text and generating novel text. However, controlling certain aspects of the generated text (e.g., length, semantics, cadence) has proven a more difficult task. The objectives of disentanglement and controlled text generation have thus become areas of interest, with various approaches depending on the aspects we desire to control. In this work we study controllable generation of lyric text based on semantic and phonetic criteria. The phonetic information takes the form of generalized phonetic patterns. A Bag-of-Words Variational Autoencoder (VAE) extracts and models the semantic information, while a phonetic pattern VAE handles the phonetic information. Each uses several regularization techniques for its respective latent space and the information from each is fed to a lyrics decoder to generate novel lyric lines that would satisfy both the Bag-of-Words and phonetic constraints. The experiments show that our model can learn to reconstruct phonetic patterns extracted from text and use them with the Bag-of-Words representations to reconstruct the original lyric lines. Together, the learned representations of phonetic patterns and Bag-of-Words constraints can be used to generate new lyrics.