Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge

Author: Fenaux, Lucas
Contributor: Kerschbaum, Florian
Institution: University of Waterloo
Year: 2024
Type: Master Thesis
Language: en
Keywords: machine learning; security; adversarial examples
Dates: 2024-01-22; 2024-01-12
Handle: http://hdl.handle.net/10012/20260

Abstract

Adversarial examples are malicious inputs supplied to trained machine learning models to trigger a misclassification. This type of attack has been studied for close to a decade, yet we find a lack of study and formalization of adversary knowledge when mounting attacks. This has yielded a complex space of attack research with hard-to-compare threat models and attacks. We address this in the image classification domain by providing a theoretical framework, inspired by work in order theory, to study adversary knowledge. We present an adversarial example game, based on cryptographic games, to standardize attack procedures. We survey recent attacks in the image classification domain that showcase the current state of adversarial example research. Together with our formalization, we compile results that confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and that allow us to derive new conclusions about the difficulty associated with the white-box and transferable threat models. For example, transferable attacks might not be as difficult as previously thought.