A Watermarking-Based Framework for Protecting Deep Image Classifiers Against Adversarial Attacks

dc.contributor.advisor: Yang, En-hui
dc.contributor.author: Sun, Chen
dc.date.accessioned: 2021-09-27T18:09:31Z
dc.date.available: 2021-09-27T18:09:31Z
dc.date.issued: 2021-09-27
dc.date.submitted: 2021-09-19
dc.description.abstract: Although deep learning-based models have achieved tremendous success in image-related tasks, they are known to be vulnerable to adversarial examples: inputs with imperceptible yet carefully crafted perturbations that cause the models to produce incorrect outputs. To distinguish adversarial examples from benign images, this thesis proposes a novel watermarking-based framework for protecting deep image classifiers against adversarial attacks. The proposed framework consists of a watermark encoder, a possible adversary, and a detector followed by the deep image classifier to be protected. At the watermark encoder, an original benign image is watermarked with a secret key by embedding confidential watermark bits into selected DCT coefficients of the original image in JPEG format. The watermarked image may then undergo adversarial attacks. Upon receiving a watermarked and possibly attacked image, the detector accepts it as a benign image and passes it to the subsequent classifier if the embedded watermark bits can be recovered with high precision; otherwise, it rejects the image as an adversarial example. The embedded watermark is further required to be imperceptible and robust to JPEG re-compression with a pre-defined quality threshold. Specific methods of watermarking and detection are also presented. Experiments on a subset of the ImageNet validation dataset show that the proposed framework, together with the presented methods of watermarking and detection, is effective against a wide range of advanced attacks, achieving a near-zero (effective) false negative rate for both static and adaptive FGSM and PGD attacks while guaranteeing a zero false positive rate. In addition, for all tested deep image classifiers (ResNet50V2, MobileNetV2, InceptionV3), the impact of watermarking on classification accuracy is insignificant, with an average degradation of 0.63% in top-1 accuracy and 0.49% in top-5 accuracy.
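The accept/reject rule described in the abstract can be illustrated with a small sketch. The following is a minimal, hypothetical Python example assuming a QIM-style embedding of key-selected watermark bits into one mid-frequency DCT coefficient per 8x8 block; the function names, bit count, embedding strength, and the 0.95 precision threshold are illustrative assumptions, not the thesis's actual design.

import numpy as np

def dct_matrix(n=8):
    """Orthonormal type-II DCT matrix (rows = frequencies, columns = samples)."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

C8 = dct_matrix(8)

def dct2(block):
    return C8 @ block @ C8.T

def idct2(coeffs):
    return C8.T @ coeffs @ C8

def key_positions(key, shape, n_bits):
    """Derive the watermark bits and the 8x8 block positions from the secret key."""
    rng = np.random.default_rng(key)
    h, w = shape
    blocks = [(r, c) for r in range(0, h - 7, 8) for c in range(0, w - 7, 8)]
    chosen = rng.choice(len(blocks), size=n_bits, replace=False)
    bits = rng.integers(0, 2, size=n_bits)
    return [blocks[i] for i in chosen], bits

def embed(img, key, n_bits=64, strength=8.0):
    """Hypothetical embedder: quantize the (3, 4) coefficient of each selected
    block to an even/odd multiple of `strength` to encode one bit (QIM-style)."""
    out = img.astype(float).copy()
    positions, bits = key_positions(key, img.shape, n_bits)
    for (r, c), bit in zip(positions, bits):
        coeffs = dct2(out[r:r + 8, c:c + 8])
        q = np.round(coeffs[3, 4] / strength)
        if int(q) % 2 != bit:
            q += 1
        coeffs[3, 4] = q * strength
        out[r:r + 8, c:c + 8] = idct2(coeffs)
    return np.clip(out, 0, 255)

def bit_recovery_precision(img, key, n_bits=64, strength=8.0):
    """Detector side: re-derive positions and bits from the key and report the
    fraction of embedded bits recovered correctly."""
    positions, bits = key_positions(key, img.shape, n_bits)
    correct = 0
    for (r, c), bit in zip(positions, bits):
        coeffs = dct2(img[r:r + 8, c:c + 8].astype(float))
        recovered = int(np.round(coeffs[3, 4] / strength)) % 2
        correct += int(recovered == bit)
    return correct / n_bits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(224, 224)).astype(float)  # stand-in grayscale image
    watermarked = embed(image, key=1234)
    precision = bit_recovery_precision(watermarked, key=1234)
    # Accept as benign only if the recovery precision clears a threshold
    # (0.95 is an illustrative value, not the thesis's setting).
    print(f"precision = {precision:.2f} ->", "accept" if precision >= 0.95 else "reject")

The intent, per the abstract, is that adversarial perturbations tend to disturb the embedded bits and push the recovery precision below the threshold, while benign images (including those re-compressed by JPEG above the design quality threshold) still pass the check.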
dc.identifier.uri: http://hdl.handle.net/10012/17548
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: adversarial defense
dc.subject: semi-fragile watermarking
dc.subject: JPEG compression
dc.subject: image processing
dc.title: A Watermarking-Based Framework for Protecting Deep Image Classifiers Against Adversarial Attacks
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Yang, En-hui
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Sun_Chen.pdf
Size: 3.01 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission