Towards Effectively Testing Sequence-to-Sequence models from White-Box Perspectives

Shao, Hanying

dc.contributor.advisor	Shang, Weiyi
dc.contributor.author	Shao, Hanying
dc.date.accessioned	2024-05-22T16:12:37Z
dc.date.available	2024-05-22T16:12:37Z
dc.date.issued	2024-05-22
dc.date.submitted	2024-05-19
dc.description.abstract	Natural Language Processing (NLP), which encompasses diverse tasks such as machine translation and question answering, has advanced notably in recent years. Despite this progress, NLP systems, including those based on sequence-to-sequence models, confront various challenges. To tackle these, metamorphic testing methods have been employed across different NLP tasks. These methods entail task-specific adjustments at the token or sentence level. For example, in machine translation, this approach might involve replacing a single token in the source sentence to generate variants, whereas in question answering, adjustments might include altering or adding sentences within the question or context. By evaluating the system's responses to these alterations, potential deficiencies in the NLP systems can be identified. Determining the most effective modifications, particularly which tokens or sentences contribute to system instability, is an essential and ongoing aspect of metamorphic testing research. To address this challenge, we introduce two white-box methods to detect sensitive tokens in the source text, that is, tokens whose alteration could trigger errors in sequence-to-sequence models. The first method, termed GRI, leverages GRadient Information to identify these sensitive tokens, while the second method, WALI, utilizes Word ALignment Information to pinpoint the unstable tokens. We assess these approaches using a Transformer-based model for translation and question answering tasks, comparing them against datasets used by benchmark methods. When the white-box approaches are applied to machine translation testing and used to generate test cases, the results show that both GRI and WALI can effectively improve the efficiency of black-box testing strategies for revealing translation bugs.
Specifically, our approaches always outperform state-of-the-art automatic testing approaches in two respects: (1) under a given testing budget (i.e., number of executed test cases), both GRI and WALI reveal a larger number of bugs than the baseline approaches, and (2) given a predefined testing goal (i.e., number of detected bugs), our approaches always require fewer testing resources (i.e., fewer test cases to execute). Additionally, we explore the application of GRI and WALI in test prioritization and evaluate their performance in QA software testing. The results show that GRI can effectively prioritize test cases that are highly likely to reveal bugs and achieves a higher percentage of fault detection given the same execution budget. WALI, on the other hand, exhibits results similar to the baseline approaches, suggesting that while it may not enhance prioritization as significantly as GRI, it maintains a comparable level of effectiveness.	en
dc.identifier.uri	http://hdl.handle.net/10012/20579
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://github.com/conf2024-8888/NMT-Testing	en
dc.subject	Neural network	en
dc.subject	Software Testing	en
dc.subject	Neural machine translation	en
dc.subject	Neural machine translation model testing	en
dc.title	Towards Effectively Testing Sequence-to-Sequence models from White-Box Perspectives	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Shang, Weiyi
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Shao_Hanying.pdf
Size: 4.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Electrical and Computer Engineering