4.5 Article

Minimalistic Attacks: How Little It Takes to Fool Deep Reinforcement Learning Policies

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TCDS.2020.2974509

Keywords

Optimization; Neural networks; Games; Perturbation methods; Learning (artificial intelligence); Electronic mail; Analytical models; Adversarial attack; reinforcement learning (RL)

Funding

  1. National Research Foundation, Singapore under its AI Singapore Programme [AISG-RP-2018-004]
  2. Data Science and Artificial Intelligence Research Center at Nanyang Technological University

Abstract

Recent studies have revealed that neural-network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame under white-box policy access, in this article we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: 1) Black-Box Policy Access, where the attacker only has access to the input (state) and output (action probability) of an RL policy; 2) Fractional-State Adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and 3) Tactically Chanced Attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings, and explore their potency on six Atari games by examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that: 1) all policies show significant performance degradation when merely 0.01% of the input state is modified and 2) the policy trained by DQN is totally deceived by perturbing only 1% of the frames.
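To make the three settings concrete, the sketch below illustrates a black-box, fractional-state (single-pixel) adversary in Python. It is not the paper's method: it assumes a hypothetical callable `policy` that maps a grayscale Atari frame (e.g., 84x84) to action probabilities, and it uses plain random search over pixel positions and intensities rather than whatever optimizer the authors actually employ. The attacker only queries the policy's inputs and outputs, matching the black-box setting.

```python
# Minimal sketch of a black-box, single-pixel (fractional-state) adversary.
# Assumptions (illustrative, not from the paper): `policy` is any callable
# returning a vector of action probabilities for a 2-D uint8 state; the
# search keeps the single-pixel change that most reduces the probability
# of the action the clean policy prefers.
import numpy as np

def single_pixel_attack(state, policy, n_trials=400, rng=None):
    """Return a perturbed copy of `state` differing in at most one pixel."""
    rng = np.random.default_rng() if rng is None else rng
    base_probs = policy(state)
    target_action = int(np.argmax(base_probs))   # action the clean policy prefers
    best_state, best_score = state, base_probs[target_action]

    for _ in range(n_trials):
        candidate = state.copy()
        y = rng.integers(candidate.shape[0])      # random pixel row
        x = rng.integers(candidate.shape[1])      # random pixel column
        candidate[y, x] = rng.integers(0, 256)    # random replacement intensity
        score = policy(candidate)[target_action]  # black-box query only
        if score < best_score:                    # keep the most damaging pixel
            best_state, best_score = candidate, score
    return best_state
```

A tactically chanced variant would apply such a perturbation only on a small fraction of frames, for example only when the perturbed and clean action probabilities diverge enough to change the chosen action, rather than attacking every step of the episode.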

