Article

Self-adaptive logit balancing for deep neural network robustness: Defence and detection of adversarial attacks

Journal

Neurocomputing
Volume 531, Pages 180-194

Publisher

Elsevier
DOI: 10.1016/j.neucom.2023.02.013

Keywords

Machine learning security; Adversarial examples; Adversarial robustness; Adversarial attacks detection; Deep neural networks


Summary

This paper proposes a novel defence method that improves the adversarial robustness of DNN classifiers without adversarial training. The method introduces two new loss functions, one to punish overconfidence and one to protect the network from non-targeted attacks. It also presents a new robustness diagram for analysing and visualising a network's robustness against adversarial attacks, and a Log-Softmax-pattern-based method for detecting adversarial attacks.
Abstract

With the widespread application of Deep Neural Networks (DNNs), the safety of DNNs has become a significant issue. The vulnerability of neural networks to adversarial examples deepens concerns about the safety of DNN applications. This paper proposes a novel defence method to improve the adversarial robustness of DNN classifiers without using adversarial training. The method introduces two new loss functions. First, a zero-cross-entropy loss punishes overconfidence and finds the appropriate confidence for different instances. Second, a logit balancing loss protects DNNs from non-targeted attacks by regularising the logit distribution of the incorrect classes. The method achieves adversarial robustness competitive with advanced adversarial training methods. In addition, a novel robustness diagram is proposed to analyse, interpret and visualise the robustness of DNN classifiers against adversarial attacks. Furthermore, a Log-Softmax-pattern-based adversarial attack detection method is proposed. This detection method can distinguish clean inputs from multiple adversarial attacks with a single multi-class MLP. In particular, it is state-of-the-art at identifying white-box gradient-based attacks: it achieves at least 95.5% accuracy in classifying four white-box gradient-based attacks with at most a 0.1% false positive ratio. (c) 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
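The abstract names the two losses but does not give their formulas. The PyTorch sketch below is one plausible reading, not the paper's actual definitions: overconfidence_penalty stands in for the zero-cross-entropy loss by capping the true-class probability, and logit_balancing_penalty pushes the incorrect classes' logits toward a uniform distribution by penalising their variance. The function names, conf_cap, and the alpha/beta weighting are all assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def overconfidence_penalty(logits, targets, conf_cap=0.9):
    # Stand-in for the paper's zero-cross-entropy loss (assumed form):
    # penalise true-class probability mass above conf_cap.
    probs = F.softmax(logits, dim=1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return F.relu(p_true - conf_cap).mean()

def logit_balancing_penalty(logits, targets):
    # Stand-in for the paper's logit balancing loss (assumed form):
    # penalise the variance of the incorrect classes' logits so that
    # they approach a uniform distribution.
    num_classes = logits.size(1)
    true_mask = F.one_hot(targets, num_classes).bool()
    wrong = logits.masked_fill(true_mask, 0.0)  # drop the true-class logit
    mean_wrong = wrong.sum(dim=1, keepdim=True) / (num_classes - 1)
    sq_dev = ((wrong - mean_wrong) ** 2).masked_fill(true_mask, 0.0)
    return (sq_dev.sum(dim=1) / (num_classes - 1)).mean()

def training_loss(logits, targets, alpha=1.0, beta=1.0):
    # Combined objective: standard cross-entropy plus the two penalties.
    ce = F.cross_entropy(logits, targets)
    return (ce + alpha * overconfidence_penalty(logits, targets)
               + beta * logit_balancing_penalty(logits, targets))

# Smoke test on random data.
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
training_loss(logits, targets).backward()
```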
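The detection method feeds the Log-Softmax output of the protected classifier into a single multi-class MLP. Below is a minimal sketch of such a detector, assuming the MLP reads the sorted Log-Softmax vector and outputs one of k attack classes plus a "clean" class; the layer sizes, the sorting step, and the class layout are assumptions, since the abstract does not specify the architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogSoftmaxPatternDetector(nn.Module):
    # Sketch of a Log-Softmax-pattern detector: one MLP mapping the
    # protected model's Log-Softmax vector to {clean, attack_1..k}.
    # Hidden width and depth are illustrative choices, not the paper's.
    def __init__(self, num_classes: int, num_attack_types: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_attack_types + 1),  # index 0 = clean (a convention assumed here)
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Sorting lets the detector see the shape of the Log-Softmax
        # pattern rather than class identities (an assumption here).
        pattern, _ = torch.sort(F.log_softmax(logits, dim=1),
                                dim=1, descending=True)
        return self.mlp(pattern)

# Usage: classify an input as clean or as one of four attack types
# (e.g. FGSM/PGD-style white-box gradient attacks).
detector = LogSoftmaxPatternDetector(num_classes=10, num_attack_types=4)
logits = torch.randn(8, 10)           # stand-in for the classifier's logits
verdict = detector(logits).argmax(1)  # 0 = clean, 1..4 = attack type
```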
