Proceedings Paper

Attack as Defense: Characterizing Adversarial Examples using Robustness

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3460319.3464822

Keywords

Deep learning; neural networks; defense; adversarial examples

Funding

  1. National Natural Science Foundation of China (NSFC) [62072309]
  2. Singapore MOE Tier 1 Project [19-C220-SMU-002]
  3. Guangdong Science and Technology Department [2018B010107004]

Abstract

As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning-based software has been found to be vulnerable to adversarial attacks. Although various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization that distinguishes adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurements do not scale to large networks, we propose a novel defense framework, named attack as defense (A(2)D), that detects adversarial examples by efficiently evaluating an example's robustness. A(2)D uses the cost of attacking an input as a proxy for its robustness and flags less robust examples as adversarial, since less robust examples are easier to attack. Extensive experimental results on MNIST, CIFAR10, and ImageNet show that A(2)D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A(2)D is effective in defending against carefully designed adaptive attacks; e.g., the attack success rate drops to 0% on CIFAR10.
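The detection idea in the abstract — score an input by how costly it is to attack, then flag easy-to-attack inputs as adversarial — can be sketched on a toy linear classifier. This is only an illustrative assumption of the scheme, not the paper's implementation: the model, the `attack_cost` step counter, and the `threshold` are all hypothetical, whereas A(2)D applies full attacks to deep networks.

```python
import numpy as np

# Toy 2-class linear classifier: logits = W @ x + b.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
b = np.zeros(2)

def predict(x):
    return int(np.argmax(W @ x + b))

def attack_cost(x, eps=0.05, max_steps=100):
    """Count PGD-style steps needed to flip the prediction.

    Fewer steps means the input is less robust; this step count stands in
    for the "cost of attacking an input" used for robustness evaluation.
    """
    y = predict(x)
    x_adv = x.copy()
    for step in range(1, max_steps + 1):
        # Gradient of the margin (logit_y - logit_other) w.r.t. x;
        # move against it to push the input over the decision boundary.
        grad = W[y] - W[1 - y]
        x_adv = x_adv - eps * np.sign(grad)
        if predict(x_adv) != y:
            return step
    return max_steps  # never flipped within the budget

def is_adversarial(x, threshold=5):
    # Inputs that fall to the attack quickly are flagged as adversarial.
    return attack_cost(x) < threshold

benign = np.array([2.0, -2.0])          # far from the decision boundary
suspicious = np.array([0.1, -0.1])      # barely over the boundary, easy to flip
print(attack_cost(benign), attack_cost(suspicious))
print(is_adversarial(benign), is_adversarial(suspicious))
```

The benign point needs many more attack steps than the near-boundary one, so only the latter is flagged; the real framework replaces the step counter with the cost of established attacks on the deployed network.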

