Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge


Date

2024-01-22

Authors

Fenaux, Lucas

Advisor

Kerschbaum, Florian

Publisher

University of Waterloo

Abstract

Adversarial examples are malicious inputs crafted to cause trained machine learning models to misclassify. Although this type of attack has been studied for close to a decade, the knowledge an adversary needs to mount such attacks has received little study or formalization, yielding a complex body of attack research with hard-to-compare threat models and attacks. We address this in the image classification domain by providing a theoretical framework, inspired by work in order theory, for studying adversary knowledge. We also present an adversarial example game, modeled on cryptographic games, to standardize attack procedures. We survey recent attacks in the image classification domain that showcase the current state of adversarial example research. Combined with our formalization, this survey yields results that both confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and allow us to draw new conclusions about the difficulty of the white-box and transferable threat models; for example, transferable attacks may not be as difficult as previously thought.
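To make the abstract's core object concrete, the sketch below crafts an adversarial example with the classic fast gradient sign method (FGSM), a standard white-box attack from the literature. It is offered purely as an illustration of the concept; it is not the framework or game formalization developed in the thesis, and the function name and parameters are illustrative.

```python
# Minimal FGSM sketch (illustrative only): perturb an input within an
# L-infinity budget so that a trained classifier misclassifies it.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Untargeted FGSM.

    model:   a trained classifier returning logits
    x:       input image tensor of shape (1, C, H, W), values in [0, 1]
    label:   true class index tensor of shape (1,)
    epsilon: maximum per-pixel perturbation (L-infinity budget)
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels valid.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```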

Keywords

machine learning, security, adversarial examples
