Large Data-to-Text Generation

Date

2023-05-16

Authors

Sarangian, Varnan

Publisher

University of Waterloo

Abstract

This thesis presents a domain-driven approach to sports game summarization, a specific instance of large data-to-text generation (DTG). We first address the data fidelity issue in the Rotowire dataset by supplementing the existing input records, and we demonstrate larger relative improvements than previously proposed purification schemes. Because this method further increases the total number of input records, we alternatively formulate the task as a multimodal problem (i.e., visual data-to-text), discussing its potential advantages over purely textual approaches and studying its effectiveness for future expansion. We work exclusively with pre-trained end-to-end transformers throughout, which allows us to evaluate the efficacy of sparse attention and multimodal encoder-decoders in DTG and to provide appropriate benchmarks for future work. To automatically evaluate the statistical correctness of generated summaries, we also extend prior work on automatic relation extraction and build an updated pipeline that incorporates a small amount of human-annotated data, which is quickly expanded via data augmentation. By formulating this task in a "text-to-text" fashion, we are able to take advantage of large language models (LLMs) and achieve significantly higher precision and recall than previous methods while tracking three times the number of unique relations. Because human-verified data partitions are incorporated into the training and evaluation process, our updated models are more consistent and reliable.
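
As a rough illustration of how precision and recall over extracted relations might be computed in a "text-to-text" setting, the sketch below parses relation triples from a model's output string and compares them with a gold set. The `entity | type | value` serialization, the function names, and the sample records are assumptions made for this example only; the abstract does not specify the thesis's actual format or pipeline.

```python
from typing import Set, Tuple

# A relation is an (entity, relation type, value) triple.
Relation = Tuple[str, str, str]


def parse_relations(text: str) -> Set[Relation]:
    """Parse 'entity | type | value' triples separated by ';'.

    This serialization is a hypothetical format chosen for illustration.
    Chunks that do not contain exactly three non-empty fields are skipped.
    """
    relations: Set[Relation] = set()
    for chunk in text.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) == 3 and all(parts):
            relations.add((parts[0], parts[1], parts[2]))
    return relations


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted relations against gold ones."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up records, not taken from Rotowire.
model_output = "Player A | PTS | 30; Team X | WINS | 1; malformed chunk"
gold = {
    ("Player A", "PTS", "30"),
    ("Team X", "WINS", "1"),
    ("Team X", "AST", "25"),
}
precision, recall = precision_recall(parse_relations(model_output), gold)
```

Treating relations as sets of exact-match triples keeps the metric simple; a real evaluation would likely need to normalize entity names and number formats before comparing.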
