GRS: Combining Generation and Revision in Unsupervised Sentence Simplification

dc.contributor.author: Dehghan, Mohammad
dc.date.accessioned: 2022-08-30T13:39:04Z
dc.date.available: 2022-08-30T13:39:04Z
dc.date.issued: 2022-08-30
dc.date.submitted: 2022-08-09
dc.description.abstract: Text simplification is a natural language processing task that rewrites a given text to reduce its structural and lexical complexity while preserving the underlying meaning. Existing text simplification approaches can be classified into generative and revision-based methods. Revision-based methods simplify a text iteratively, over multiple steps, through explicit edit operations such as word deletion and lexical substitution. Generative approaches, in contrast, produce a simplified sentence from a complex sentence in one step. Generative models have no explicit edit operations but learn implicit edit operations from data. Revision-based methods are more controllable and interpretable than generative models. On the other hand, generative models can apply more complex edits (such as paraphrasing) than revision-based methods. We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start with an iterative framework in which an input sentence is revised using explicit edit operations such as word deletion, and we add paraphrasing as a new edit operation. This allows us to combine the advantages of generative and revision-based approaches: paraphrasing captures complex edit operations, and applying explicit edit operations iteratively provides controllability and interpretability. We demonstrate the advantages of GRS compared to existing methods. To evaluate our model, we use the Newsela and ASSET datasets, which contain high-quality complex-simple sentence pairs and are commonly used in the literature. The Newsela dataset contains 1,840 news articles re-written for children at five different readability standards. The ASSET dataset comprises 2,359 sentences from English Wikipedia. GRS outperforms all unsupervised methods on the Newsela dataset and bridges the gap between revision-based and generative models on the ASSET dataset. [en]
dc.identifier.uri: http://hdl.handle.net/10012/18673
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/imohammad12/GRS [en]
dc.subject: natural language processing [en]
dc.subject: text simplification [en]
dc.subject: deep learning [en]
dc.subject: artificial intelligence [en]
dc.title: GRS: Combining Generation and Revision in Unsupervised Sentence Simplification [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Golab, Lukasz
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
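
The iterative framework described in the abstract (explicit edit operations applied step by step, with paraphrasing as one more operation) can be illustrated with a minimal sketch. This is not the thesis implementation: the word lists `REMOVABLE` and `PARAPHRASES`, the character-count `score`, and the `min_gain` stopping threshold are all hypothetical stand-ins chosen so the example runs on its own. A real system would rely on learned models for paraphrasing and for scoring simplicity and meaning preservation.

```python
from typing import Iterator, List, Tuple

Tokens = List[str]
Candidate = Tuple[str, Tokens]  # (operation name, edited sentence)

# Hypothetical toy resources; they stand in for learned components.
REMOVABLE = {"really", "very", "basically"}
PARAPHRASES = {"utilize": "use", "approximately": "about"}


def deletions(tokens: Tokens) -> Iterator[Candidate]:
    """Explicit edit: delete one word that the toy list marks as removable."""
    for i, tok in enumerate(tokens):
        if tok in REMOVABLE:
            yield "delete", tokens[:i] + tokens[i + 1:]


def paraphrases(tokens: Tokens) -> Iterator[Candidate]:
    """Paraphrasing as an edit: swap one word using the toy lexicon."""
    for i, tok in enumerate(tokens):
        if tok in PARAPHRASES:
            yield "paraphrase", tokens[:i] + [PARAPHRASES[tok]] + tokens[i + 1:]


def score(tokens: Tokens) -> float:
    """Toy simplicity score (assumption): fewer characters is simpler."""
    return -float(sum(len(t) for t in tokens))


def simplify(sentence: str, min_gain: float = 1.0,
             max_steps: int = 20) -> Tuple[str, List[Tuple[str, str]]]:
    """Greedily apply the best-scoring edit until no edit gains enough."""
    current = sentence.split()
    log: List[Tuple[str, str]] = []  # edit history, kept for interpretability
    for _ in range(max_steps):
        candidates = list(deletions(current)) + list(paraphrases(current))
        if not candidates:
            break
        op, best = max(candidates, key=lambda c: score(c[1]))
        if score(best) - score(current) < min_gain:
            break
        log.append((op, " ".join(best)))
        current = best
    return " ".join(current), log


simplified, history = simplify("we really utilize approximately ten samples")
for op, text in history:
    print(f"{op}: {text}")
print(simplified)
```

Because every step is a named operation recorded in `history`, the sequence of edits can be inspected or restricted, which is the controllability and interpretability argument the abstract makes for revision-based methods.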

Files

Original bundle

Name: Dehghan_Mohammad.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
Description: Master's Thesis

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission