Article

Con-Detect: Detecting adversarially perturbed natural language inputs to deep classifiers through holistic analysis

Journal

COMPUTERS & SECURITY
Volume 132, Issue -, Pages -

Publisher

ELSEVIER ADVANCED TECHNOLOGY
DOI: 10.1016/j.cose.2023.103367

Keywords

Machine learning security; Adversarial detection; Adversarial machine learning; Secure natural language processing; Adversarial signatures

Deep learning algorithms have shown great performance in various NLP tasks, but they are susceptible to adversarial attacks. This paper proposes an unsupervised detection method for identifying adversarial inputs to NLP classifiers. Experimental results demonstrate that the proposed method can significantly reduce the success rate of different attacks.
Deep Learning (DL) algorithms have achieved remarkable performance in many Natural Language Processing (NLP) tasks such as language-to-language translation, spam filtering, fake-news detection, and reading comprehension. However, research has shown that the adversarial vulnerabilities of deep learning networks manifest themselves when DL is used for NLP tasks. Most mitigation techniques proposed to date are supervised, relying on adversarial retraining to improve robustness, which is impractical. This work introduces a novel, unsupervised methodology for detecting adversarial inputs to NLP classifiers. In summary, we note that minimally perturbing an input to change a model's output, a major strength of adversarial attacks, is also a weakness that leaves unique statistical marks reflected in the cumulative contribution scores of the input. In particular, we show that the cumulative contribution score, called the CF-score, of adversarial inputs is generally greater than that of clean inputs. We thus propose Con-Detect, a Contribution-based Detection method, for detecting adversarial attacks against NLP classifiers. Con-Detect can be deployed with any classifier without retraining it. We experiment with multiple attacks (Text-bugger, Text-fooler, PWWS) on several architectures (MLP, CNN, LSTM, hybrid CNN-RNN, BERT) trained for different classification tasks (IMDB sentiment classification, fake-news classification, AG News topic classification) under different threat models (Con-Detect-blind, Con-Detect-aware, and Con-Detect-adaptive attacks), and show that Con-Detect can reduce the attack success rate (ASR) of different attacks from 100% to as low as 0% in the best cases and to ≈70% in the worst case. Even in the worst case, we note a 100% increase in the required number of queries and a 50% increase in the number of words perturbed, suggesting that Con-Detect is hard to evade.
© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
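The abstract describes the CF-score only at a high level. The short Python sketch below illustrates one plausible instantiation: per-word contributions estimated by leave-one-out probability drops, aggregated over the top-k words, then thresholded. This is an assumption-laden illustration, not the paper's implementation; every name in it (contribution_scores, cf_score, is_adversarial, threshold, k) is hypothetical.

```python
# Minimal sketch of contribution-based detection, in the spirit of Con-Detect.
# Assumptions (mine, not the paper's): per-word contributions come from
# leave-one-out probability drops, and the "CF-score" is approximated as the
# cumulative contribution of the k highest-contributing words.
from typing import Callable, List


def contribution_scores(predict_proba: Callable[[str], float],
                        words: List[str]) -> List[float]:
    """Leave-one-out contribution of each word to the predicted-class probability."""
    base = predict_proba(" ".join(words))
    scores = []
    for i in range(len(words)):
        ablated = words[:i] + words[i + 1:]
        scores.append(base - predict_proba(" ".join(ablated)))
    return scores


def cf_score(scores: List[float], k: int = 5) -> float:
    """Cumulative contribution of the k most influential words (a stand-in
    for the paper's CF-score; the exact definition may differ)."""
    return sum(sorted(scores, reverse=True)[:k])


def is_adversarial(predict_proba: Callable[[str], float],
                   text: str, threshold: float, k: int = 5) -> bool:
    """Flag inputs whose contribution mass is suspiciously concentrated,
    i.e., whose CF-score exceeds a threshold calibrated on clean inputs."""
    return cf_score(contribution_scores(predict_proba, text.split()), k) > threshold


if __name__ == "__main__":
    # Toy stand-in classifier: "positive" probability grows with happy words.
    def toy_predict_proba(text: str) -> float:
        happy = sum(w in {"good", "great", "love"} for w in text.split())
        return min(1.0, 0.2 + 0.25 * happy)

    print(is_adversarial(toy_predict_proba, "I love this great movie", threshold=0.4))
```

Because the method is unsupervised, a deployment along these lines would calibrate the threshold from the CF-score distribution of clean inputs alone, for example as a high percentile of that distribution, rather than from labeled adversarial examples.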

Reviews

Primary rating

4.5
Insufficient ratings

Secondary ratings

Novelty
-
Significance
-
Scientific rigor
-