Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge
Date
2024-01-22
Authors
Fenaux, Lucas
Advisor
Kerschbaum, Florian
Publisher
University of Waterloo
Abstract
Adversarial examples are malicious inputs supplied to trained machine learning models to trigger a misclassification. Although this type of attack has been studied for close to a decade, we find that adversary knowledge when mounting attacks has received little study or formalization. This has yielded a complex space of attack research with hard-to-compare threat models and attacks. We address this in the image classification domain by providing a theoretical framework, inspired by work in order theory, to study adversary knowledge. We present an adversarial example game, based on cryptographic games, to standardize attack procedures. We survey recent attacks in the image classification domain that showcase the current state of adversarial example research. Together with our formalization, we compile results that both confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and allow us to derive new conclusions about the difficulty associated with the white-box and transferable threat models. For example, transferable attacks might not be as difficult as previously thought.
Keywords
machine learning, security, adversarial examples