Security and Ownership Verification in Deep Reinforcement Learning

Wang, Shelly

Security and Ownership Verification in Deep Reinforcement Learning

Files

Wang_Shelly.pdf (6.15 MB)

Date

2022-07-15

Authors

Wang, Shelly

Advisor

Asokan, Nadarajah

Publisher

University of Waterloo

Abstract

Deep reinforcement learning (DRL) has seen many successes in complex tasks such as robot manipulation, autonomous driving, and competitive games. However, there are few studies on the security threats against DRL systems. In this thesis, we focus on two security concerns in DRL. The first security concern is adversarial perturbation attacks against DRL agents. % Adversarial perturbation attacks mislead DRL agents into taking sub-optimal actions by applying a small imperceptible perturbation to the states of the environment. Adversarial perturbation attacks mislead DRL agents into taking sub-optimal actions. These attacks apply small imperceptible perturbations to the agent's observations of the environment. Prior work shows that DRL agents are vulnerable to adversarial perturbation attacks. However, prior attacks are difficult to deploy in real-time settings. We show that universal adversarial perturbations (UAPs) are effective in reducing a DRL agent's performance in their tasks and are fast enough to be mounted in real-time. We propose three variants of UAPs. We evaluate the effectiveness of UAPs against different DRL agents (DQN, A2C, and PPO) in three different Atari 2600 games (Pong, Freeway, and Breakout). We show that UAPs can degrade agent performance by 100\%, in some cases even for a perturbation bound as small as $l_{\infty} = 0.01$. We also propose a technique for detecting adversarial perturbation attacks. An effective detection technique can be used in DRL tasks with potentially negative outcomes (such as the agents failing in a task or accumulating negative rewards) by suspending the task before the negative result manifests due to adversarial perturbation attacks. Our experiments found that this detection method works best for Pong with perfect precision and recall against all adversarial perturbation attacks but is less robust for Breakout and Freeway. The second security concern is theft and unauthorized distribution of DRL agents. As DRL agents gain success in complex tasks, there is a growing interest to monetize them. However, the possibility of theft could jeopardize the profitability of deploying these agents. Robust ownership verification techniques can deter malicious parties from stealing these agents, and in the event where theft cannot be prevented, ownership verification techniques can be used to track down and prosecute perpetrators. There are two prior works on ownership verification of DRL agents using watermarks. However, these two techniques require the verifier to deploy the suspected stolen agent in an environment where the verifier has complete control over the environment states. We propose a new fingerprint technique where the verifier compares the percentage of action agreement between the suspect agent and the owner's agent in environments where UAPs are applied. Our experimental results show that there is a significant difference in the percentage of action agreement (up to $50\%$ in some cases) when the suspect agent is a copy of the owner's agent versus when the suspect agent is an independently trained agent.

URI

http://hdl.handle.net/10012/18443

Collections

Theses
Computer Science

Full item page

Security and Ownership Verification in Deep Reinforcement Learning

Files

Date

Authors

Advisor

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

LC Subject Headings

Citation

URI

Collections