Privacy Preserving Defense For Black Box Classifiers Against On-Line Adversarial Attacks

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Volume 44, Issue 12, Pages 9503-9520

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPAMI.2021.3125931

Keywords

Training; Perturbation methods; Bayes methods; Uncertainty; Deep learning; Privacy; Data models; Adversarial defense; Bayesian uncertainties; black box defense; ensemble of defenses; image purifiers; knowledge distillation; privacy preserving defense

Funding

NSF [1911197]
Bourns endowment funds
Directorate for Computer & Information Science & Engineering [1911197] (Funding Source: National Science Foundation)
Division of Information & Intelligent Systems [1911197] (Funding Source: National Science Foundation)

Automated Summary

This paper proposes a novel defense strategy that uses an ensemble of iterative adversarial image purifiers to protect black box classifiers from adversarial attacks, and validates the purifiers' performance using Bayesian uncertainties. The approach consistently detects adversarial examples and then purifies or rejects them.

Abstract

Deep learning models have been shown to be vulnerable to adversarial attacks. Adversarial attacks add imperceptible perturbations to an image so that the deep learning model misclassifies it with high confidence. Existing adversarial defenses validate their performance using only the classification accuracy. However, classification accuracy by itself is not a reliable metric for determining whether the resulting image is adversarial-free. This is a foundational problem for online image recognition applications, where the ground truth of the incoming image is not known and hence we can neither compute the accuracy of the classifier nor validate whether the image is adversarial-free. This paper proposes a novel privacy preserving framework for defending black box classifiers from adversarial attacks using an ensemble of iterative adversarial image purifiers whose performance is continuously validated in a loop using Bayesian uncertainties. The proposed approach can convert a single-step black box adversarial defense into an iterative defense, and the paper proposes three novel privacy preserving Knowledge Distillation (KD) approaches that use prior meta-information from various datasets to mimic the performance of the black box classifier. Additionally, this paper proves the existence of an optimal distribution for the purified images that can reach a theoretical lower bound, beyond which the image can no longer be purified. Experimental results on six public benchmark datasets, namely 1) Fashion-MNIST, 2) CIFAR-10, 3) GTSRB, 4) MIO-TCD, 5) Tiny-ImageNet, and 6) MS-Celeb, show that the proposed approach can consistently detect adversarial examples and purify or reject them against a variety of adversarial attacks.
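The purify-validate-reject loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the ensemble is combined or how the Bayesian uncertainty is computed, so the sketch assumes the purifier outputs are averaged and that uncertainty is the predictive entropy over repeated stochastic passes of a surrogate model (standing in for a distilled copy of the black box classifier). All names (`purify_or_reject`, `make_toy_purifier`, `make_toy_surrogate`) and the one-dimensional toy "images" are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Image = List[float]                          # toy stand-in for a flattened image
Purifier = Callable[[Image], Image]
Surrogate = Callable[[Image], List[float]]   # one stochastic pass -> class probabilities


def predictive_entropy(prob_samples: List[List[float]]) -> float:
    """Entropy of the class distribution averaged over stochastic passes."""
    n, k = len(prob_samples), len(prob_samples[0])
    mean = [sum(s[c] for s in prob_samples) / n for c in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def purify_or_reject(
    image: Image,
    purifiers: List[Purifier],
    surrogate: Surrogate,
    threshold: float,
    max_iters: int = 10,
    n_samples: int = 20,
) -> Tuple[Image, bool, int]:
    """Return (image, accepted, purification_steps_used).

    No ground-truth label is needed: the only acceptance signal is the
    surrogate's uncertainty, which is re-checked after every purification step.
    """
    current = list(image)
    for step in range(max_iters + 1):
        samples = [surrogate(current) for _ in range(n_samples)]
        if predictive_entropy(samples) <= threshold:
            return current, True, step       # confident enough: accept
        if step == max_iters:
            break                            # budget exhausted: reject
        # Assumption: combine the ensemble by averaging the purifier outputs.
        outputs = [p(current) for p in purifiers]
        current = [sum(vals) / len(outputs) for vals in zip(*outputs)]
    return current, False, max_iters


def make_toy_purifier(strength: float) -> Purifier:
    """Hypothetical purifier: pulls each pixel toward the nearer of 0 and 1."""
    return lambda img: [x + strength * (round(x) - x) for x in img]


def make_toy_surrogate(seed: int) -> Surrogate:
    """Hypothetical noisy two-class model, uncertain near a mean pixel of 0.5."""
    rng = random.Random(seed)

    def forward(img: Image) -> List[float]:
        m = sum(img) / len(img) + rng.gauss(0.0, 0.05)
        p1 = 1.0 / (1.0 + math.exp(-20.0 * (m - 0.5)))
        return [1.0 - p1, p1]

    return forward


purifiers = [make_toy_purifier(0.4), make_toy_purifier(0.6)]
print(purify_or_reject([0.58] * 4, purifiers, make_toy_surrogate(0), threshold=0.2))
```

In this toy setting an input close to the decision boundary starts with high entropy and is purified until the surrogate becomes confident, while an input that stays uncertain after `max_iters` steps is rejected. The threshold and the iteration budget are illustrative values, not ones taken from the paper.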
