
dc.contributor.author: TIAN, Yongqiang
dc.date.accessioned: 2023-08-01 12:49:29 (GMT)
dc.date.available: 2023-08-01 12:49:29 (GMT)
dc.date.issued: 2023-08-01
dc.date.submitted: 23-07-28
dc.identifier.uri: http://hdl.handle.net/10012/19644
dc.description.abstract: Deep Learning (DL) applications are widely deployed in diverse areas, such as image classification, natural language processing, and autonomous driving systems. Although these applications achieve outstanding accuracy, developers have raised strong concerns about their reliability because the logic of a DL application is a black box to humans. Specifically, DL applications learn their logic during stochastic training and encode it in the high-dimensional weights of DL models. Unlike the source code of conventional software, such weights are infeasible for humans to directly interpret, examine, and validate. As a result, defects in DL applications are hard to detect during software development and may cause catastrophic accidents in safety-critical missions. It is therefore critical to adequately test the reliability of DL applications before they are deployed. This thesis proposes automatic approaches to testing DL applications from the perspective of reliability. It consists of the following three studies.

The first study proposes object-relevancy, a property with which reliable DL-based image classifiers should comply: classification results should be based on the features relevant to the target object in a given image, rather than on irrelevant features such as the background. The study further proposes a metamorphic testing approach and two corresponding metamorphic relations to assess whether this property is violated in image classification. The evaluation shows that the proposed approach can effectively detect unreliable inferences that violate the object-relevancy property, with average precision of 64.1% and 96.4% for the two relations, respectively. A subsequent empirical study reveals that such unreliable inferences are prevalent in the real world and that existing training strategies cannot effectively mitigate this issue.

The second study concentrates on the reliability issues induced by model compression of DL applications. Model compression can significantly reduce the size of Deep Neural Network (DNN) models and thus facilitates the dissemination of sophisticated, sizable DNN models. However, the predictions of a compressed model may deviate from those of its original model, resulting in unreliable DL applications in deployment. To help developers thoroughly understand the impact of model compression, it is essential to test compressed models and find such deviated behaviors before dissemination. This study proposes DFLARE, a novel search-based, black-box testing technique. The evaluation shows that DFLARE consistently outperforms the baseline in both efficacy and efficiency. More importantly, the triggering inputs found by DFLARE can be used to repair up to 48.48% of the deviated behaviors.

The third study focuses on the reliability of DL-based vulnerability detection (DLVD) techniques. DLVD techniques are designed to detect vulnerabilities in source code. However, these techniques may capture only the syntactic patterns of vulnerable code while ignoring its semantic information. As a result, malicious users can easily fool such techniques by manipulating the syntactic patterns of vulnerable code, e.g., by renaming variables. This study proposes a new methodology to evaluate the learning ability of DLVD techniques, i.e., whether a technique can capture semantic information from vulnerable source code and leverage it in detection. Specifically, the approach creates a special dataset in which the vulnerable functions and non-vulnerable ones have almost identical syntactic code patterns but different semantic meanings. If a detection approach cannot capture the semantic difference between the vulnerable and non-vulnerable functions, it will perform poorly on the constructed dataset. Our preliminary results show that two common detection approaches are ineffective at capturing semantic information from source code.
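As a rough, illustrative sketch of the object-relevancy check described in the first study (not the thesis's actual metamorphic relations or implementation), the Python snippet below assumes an object mask, a set of replacement backgrounds, and a classify callable are provided, and flags an inference as unreliable if swapping only the background changes the predicted label.

    import numpy as np

    # Illustrative sketch only: a background-substitution check in the spirit
    # of the object-relevancy property. The object mask, candidate backgrounds,
    # and `classify` callable are assumed inputs, not part of the thesis's code.

    def replace_background(image, object_mask, background):
        """Keep the masked object pixels; take all other pixels from `background`."""
        return np.where(object_mask[..., None], image, background)

    def violates_object_relevancy(image, object_mask, backgrounds, classify):
        """Return True if changing only the background flips the predicted label."""
        original_label = classify(image)
        for background in backgrounds:
            mutated = replace_background(image, object_mask, background)
            if classify(mutated) != original_label:
                return True  # the prediction depended on background features
        return False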
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: software testing
dc.subject: deep learning
dc.title: Assessing the Reliability of Deep Learning Applications
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws-etd.embargo.terms: 0
uws.contributor.advisor: Sun, Chengnian
uws.contributor.advisor: CHEUNG, Shing-Chi
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate

