Journal
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
Volume 27, Issue 7, Pages 3289-3304
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TVCG.2020.2969185
Keywords
Neurons; Visualization; Data visualization; Feature extraction; Training; Merging; Biological neural networks; Robustness; deep neural networks; adversarial examples; explainable machine learning
Funding
- National Key R&D Program of China [2018YFB1004300, 2017YFA0700904]
- National Natural Science Foundation of China [61936002, 61761136020, 61672308]
- Institute Guo Qiang, Tsinghua University
- NSFC [61620106010, 61621136008]
- Beijing NSF Project [L172037]
- Beijing Academy of Artificial Intelligence (BAAI)
- JP Morgan Faculty Research Program
This research investigates how adversarial examples mislead deep neural networks and proposes a visual analysis method to explain their misclassification. The method compares and analyzes the datapaths of adversarial and normal examples, and a multi-level visualization is designed to show how these datapaths diverge and merge during prediction.
Adversarial examples, generated by adding small but intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) into making incorrect predictions. Although much work has been done on both adversarial attacks and defenses, a fine-grained understanding of adversarial examples is still lacking. To address this issue, we present a visual analysis method to explain why adversarial examples are misclassified. The key is to compare and analyze the datapaths of both the adversarial and normal examples. A datapath is a group of critical neurons along with their connections. We formulate datapath extraction as a subset selection problem and solve it by constructing and training a neural network. A multi-level visualization, consisting of a network-level visualization of data flows, a layer-level visualization of feature maps, and a neuron-level visualization of learned features, has been designed to help investigate how the datapaths of adversarial and normal examples diverge and merge in the prediction process. A quantitative evaluation and a case study were conducted to demonstrate the promise of our method in explaining the misclassification of adversarial examples.
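To make the notion of a datapath concrete: the sketch below is a deliberately simplified stand-in, not the paper's method. The paper solves datapath extraction as a learned subset selection problem; here, for illustration only, critical neurons in a toy one-hidden-layer network are instead ranked greedily by their contribution (activation times outgoing weight) to the predicted class, and the top-k are treated as the datapath at that layer. The network weights, the perturbation, and the function `critical_neurons` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-hidden-layer network (hypothetical stand-in for a real DNN).
W1 = rng.normal(size=(8, 16))   # input -> hidden
W2 = rng.normal(size=(16, 3))   # hidden -> classes

def critical_neurons(x, k=4):
    """Rank hidden neurons by their contribution to the predicted class
    and return the indices of the top-k. A greedy simplification of the
    paper's learned subset selection, for illustration only."""
    h = np.maximum(x @ W1, 0.0)      # ReLU activations
    logits = h @ W2
    c = int(np.argmax(logits))       # predicted class
    contrib = h * W2[:, c]           # per-neuron contribution to class c
    return np.argsort(contrib)[::-1][:k]

x_normal = rng.normal(size=8)
x_adv = x_normal + 0.05 * rng.normal(size=8)  # crude perturbation, not a real attack

path_n = set(critical_neurons(x_normal))
path_a = set(critical_neurons(x_adv))
print("shared critical neurons:", sorted(path_n & path_a))
print("diverging neurons:", sorted(path_n ^ path_a))
```

Comparing the two neuron sets mirrors, at a very small scale, the paper's idea of inspecting where the datapaths of a normal example and its adversarial counterpart overlap and where they split.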