
dc.contributor.author: TIAN, Yongqiang
dc.date.accessioned: 2023-08-01 12:49:29 (GMT)
dc.date.available: 2023-08-01 12:49:29 (GMT)
dc.date.issued: 2023-08-01
dc.date.submitted: 23-07-28
dc.identifier.uri: http://hdl.handle.net/10012/19644
dc.description.abstract: Deep Learning (DL) applications are widely deployed in diverse areas, such as image classification, natural language processing, and autonomous driving systems. Although these applications achieve outstanding accuracy, developers have raised strong concerns about their reliability because the logic of a DL application is a black box to humans. Specifically, DL applications learn their logic during stochastic training and encode it in the high-dimensional weights of DL models. Unlike the source code of conventional software, such weights are infeasible for humans to directly interpret, examine, and validate. As a result, defects in DL applications are hard to detect during software development and may cause catastrophic accidents in safety-critical missions. It is therefore critical to adequately test the reliability of DL applications before they are deployed. This thesis proposes automatic approaches to testing DL applications from the perspective of reliability. It consists of the following three studies.

The first study proposes object-relevancy, a property with which reliable DL-based image classifiers should comply: classification results should be based on the features relevant to the target object in a given image, rather than on irrelevant features such as the background. The study further proposes a metamorphic testing approach and two corresponding metamorphic relations to assess whether this property is violated in image classification. The evaluation shows that the proposed approach can effectively detect unreliable inferences that violate the object-relevancy property, with average precision of 64.1% and 96.4% for the two relations, respectively. A subsequent empirical study reveals that such unreliable inferences are prevalent in the real world and that existing training strategies cannot effectively mitigate this issue.

The second study concentrates on the reliability issues induced by model compression of DL applications. Model compression can significantly reduce the size of Deep Neural Network (DNN) models and thus facilitates the dissemination of sophisticated, sizable DNN models. However, the predictions of a compressed model may deviate from those of its original model, resulting in unreliable DL applications in deployment. To help developers thoroughly understand the impact of model compression, it is essential to test compressed models and find such deviated behaviors before dissemination. This study proposes DFLARE, a novel search-based, black-box testing technique. The evaluation shows that DFLARE consistently outperforms the baseline in both efficacy and efficiency. More importantly, the triggering inputs found by DFLARE can be used to repair up to 48.48% of the deviated behaviors.

The third study focuses on the reliability of DL-based vulnerability detection (DLVD) techniques. DLVD techniques are designed to detect vulnerabilities in source code. However, these techniques may capture only the syntactic patterns of vulnerable code while ignoring its semantic information. As a result, malicious users can easily fool such techniques by manipulating the syntactic patterns of vulnerable code, e.g., by renaming variables. This study proposes a new methodology to evaluate the learning ability of DLVD techniques, i.e., whether a technique can capture semantic information from vulnerable source code and leverage it in detection. Specifically, the approach creates a special dataset in which the vulnerable functions and non-vulnerable ones have almost identical syntactic code patterns but different semantic meanings. If a detection approach cannot capture the semantic difference between the vulnerable and non-vulnerable functions, it will perform poorly on the constructed dataset. Our preliminary results show that two common detection approaches are ineffective at capturing semantic information from source code.
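As a rough, illustrative sketch of the object-relevancy check described in the first study (not the thesis's actual metamorphic relations or implementation), the Python snippet below assumes an object mask, a set of replacement backgrounds, and a classify callable are provided, and flags an inference as unreliable if swapping only the background changes the predicted label.

    import numpy as np

    # Illustrative sketch only: a background-substitution check in the spirit
    # of the object-relevancy property. The object mask, candidate backgrounds,
    # and `classify` callable are assumed inputs, not part of the thesis's code.

    def replace_background(image, object_mask, background):
        """Keep the masked object pixels; take all other pixels from `background`."""
        return np.where(object_mask[..., None], image, background)

    def violates_object_relevancy(image, object_mask, backgrounds, classify):
        """Return True if changing only the background flips the predicted label."""
        original_label = classify(image)
        for background in backgrounds:
            mutated = replace_background(image, object_mask, background)
            if classify(mutated) != original_label:
                return True  # the prediction depended on background features
        return False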
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: software testing
dc.subject: deep learning
dc.title: Assessing the Reliability of Deep Learning Applications
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws-etd.embargo.terms: 0
uws.contributor.advisor: Sun, Chengnian
uws.contributor.advisor: CHEUNG, Shing-Chi
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate

