Article

Minimalistic Attacks: How Little It Takes to Fool Deep Reinforcement Learning Policies

Journal

IEEE Transactions on Cognitive and Developmental Systems

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TCDS.2020.2974509

Keywords

Optimization; Neural networks; Games; Perturbation methods; Learning (artificial intelligence); Electronic mail; Analytical models; Adversarial attack; reinforcement learning (RL)

Funding

  1. National Research Foundation, Singapore under its AI Singapore Programme [AISG-RP-2018-004]
  2. Data Science and Artificial Intelligence Research Center at Nanyang Technological University


Summary

Recent studies show that neural-network-based policies can be easily fooled by adversarial examples. This article explores the limits of a model's vulnerability by defining three key settings for minimalistic attacks and testing their potency on six Atari games. The findings reveal that minimal perturbations significantly degrade, and can completely deceive, state-of-the-art policies.

Abstract

Recent studies have revealed that neural-network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame under the assumption of white-box policy access, in this article we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: 1) Black-Box Policy Access, where the attacker only has access to the input (state) and output (action probabilities) of an RL policy; 2) Fractional-State Adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and 3) Tactically Chanced Attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and examine its potency on six Atari games against four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that: 1) all policies suffer significant performance degradation when merely 0.01% of the input state is modified; and 2) the policy trained by DQN is completely deceived when only 1% of the frames are perturbed.
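To make the three settings concrete, the sketch below outlines one way such an attack loop could look. It is an illustrative assumption, not the authors' implementation: policy_fn (a black-box mapping from a frame to action probabilities), env (a Gym-style environment), the plain random search over single pixels, and the top-2 probability-gap rule for picking which frames to attack are all placeholders for the optimization and frame-selection scheme formulated in the paper.

```python
# Minimal sketch (not the authors' code) of an attack that respects the three
# settings: black-box policy access, a fractional-state (single-pixel)
# adversary, and tactically chanced (few-frame) attacks.
import numpy as np


def single_pixel_attack(policy_fn, state, n_queries=400, rng=None):
    """Fractional-State Adversary in its extreme one-pixel form.

    policy_fn: black-box callable, state (H, W) uint8 array -> action
    probabilities (Setting 1: only inputs and outputs, no gradients).
    Returns the single-pixel-perturbed state that most reduces the
    probability of the policy's originally preferred action.
    """
    rng = rng or np.random.default_rng()
    base_probs = policy_fn(state)
    a_star = int(np.argmax(base_probs))          # action the clean policy picks

    best_state, best_prob = state, base_probs[a_star]
    h, w = state.shape
    for _ in range(n_queries):
        y, x = rng.integers(h), rng.integers(w)  # which pixel to touch
        candidate = state.copy()
        candidate[y, x] = rng.integers(256)      # what value to write there
        prob = policy_fn(candidate)[a_star]
        if prob < best_prob:                     # keep the most damaging pixel
            best_state, best_prob = candidate, prob
    return best_state


def run_attacked_episode(env, policy_fn, gap_threshold=0.1):
    """Tactically Chanced Attack: perturb only 'significant' frames.

    As a stand-in criterion (the paper defines its own), a frame is attacked
    only when the gap between the top two action probabilities is small,
    i.e., when a tiny nudge is most likely to flip the chosen action.
    """
    state = env.reset()
    done, total_reward, attacked_frames = False, 0.0, 0
    while not done:
        probs = policy_fn(state)
        top2 = np.sort(probs)[::-1][:2]
        if top2[0] - top2[1] < gap_threshold:    # frame deemed worth attacking
            state = single_pixel_attack(policy_fn, state)
            attacked_frames += 1
        action = int(np.argmax(policy_fn(state)))
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward, attacked_frames
```

Under these assumptions, shrinking gap_threshold drives the fraction of attacked frames toward the 1% regime reported for Breakout, while n_queries trades attack strength against the number of black-box policy evaluations spent per attacked frame.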

Authors

Xinghua Qu; Zhu Sun; Yew-Soon Ong; Abhishek Gupta; Pengfei Wei
