Practical Black-box Attacks on Deep Neural
Networks using Efficient Query Mechanisms
Abstract. Existing black-box attacks on deep neural networks (DNNs)
have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning
models. In this paper, we propose novel Gradient Estimation black-box attacks that do not rely on transferability, designed for adversaries with query access to the target model's class probabilities. We also propose strategies
to decouple the number of queries required to generate each adversarial sample from the dimensionality of the input. An iterative variant of
our attack achieves close to 100% attack success rates for both targeted
and untargeted attacks on DNNs. We carry out a thorough comparative
evaluation of black-box attacks and show that Gradient Estimation attacks achieve attack success rates similar to those of state-of-the-art white-box attacks on the MNIST and CIFAR-10 datasets. We also successfully apply the Gradient Estimation attacks against real-world classifiers hosted
by Clarifai. Further, we evaluate black-box attacks against state-of-the-art defenses based on adversarial training and show that the Gradient Estimation attacks are very effective even against these defenses.
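To make the query-based approach concrete, the following is a minimal NumPy sketch of finite-difference gradient estimation followed by a single FGSM-style perturbation step. It is an illustration under stated assumptions, not the paper's exact algorithm: `query_probs` is a hypothetical placeholder for the target model's probability interface, the cross-entropy loss and the parameters `delta` and `eps` are illustrative choices, and the query-reduction strategies mentioned above are omitted.

```python
import numpy as np

# Hypothetical stand-in for the target model's query interface: in this
# threat model, the adversary can only observe the class probabilities
# the model returns for a chosen input.
def query_probs(x):
    raise NotImplementedError("replace with a call to the target classifier")

def xent_loss(x, y):
    # Cross-entropy loss on the true label y, computed purely from the
    # probabilities returned by queries (no access to model internals).
    p = query_probs(x)
    return -np.log(p[y] + 1e-12)

def estimate_gradient(x, y, delta=1e-3):
    # Two-sided finite-difference estimate of the loss gradient. This
    # naive version spends 2 queries per input dimension; the paper's
    # query-reduction strategies (omitted here) decouple the query
    # count from the input dimensionality.
    grad = np.zeros(x.size)
    flat = x.reshape(-1)
    for i in range(flat.size):
        e = np.zeros_like(flat)
        e[i] = delta
        grad[i] = (xent_loss((flat + e).reshape(x.shape), y)
                   - xent_loss((flat - e).reshape(x.shape), y)) / (2 * delta)
    return grad.reshape(x.shape)

def untargeted_attack(x, y, eps=0.3):
    # Single FGSM-style step along the sign of the estimated gradient,
    # pushing the input toward higher loss on its true label y.
    g = estimate_gradient(x, y)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)  # keep pixels in [0, 1]
```

The iterative variant referenced above would repeat such a step with a smaller step size, projecting back into a bounded neighborhood of the original input after each iteration.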