Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks
Abstract
Deep neural networks are vulnerable to adversarial examples, which can mislead classifiers by adding imperceptible perturbations. An intriguing property of adversarial examples is their good transferability, making black-box attacks feasible in real-world applications. Due to the threat of adversarial attacks, many methods have been proposed to improve model robustness, and several state-of-the-art defenses have been shown to be robust against transferable adversarial examples. In this paper, we propose a translation-invariant attack method to generate more transferable adversarial examples against the defense models. By optimizing a perturbation over an ensemble of translated images, the generated adversarial example is less sensitive to the white-box model being attacked and has better transferability. To improve the efficiency of attacks, we further show that our method can be implemented by convolving the gradient at the untranslated image with a pre-defined kernel. Our method is generally applicable to any gradient-based attack method. Extensive experiments on the ImageNet dataset validate the effectiveness of the proposed method. Our best attack fools eight state-of-the-art defenses at an 82% success rate on average, based only on transferability, demonstrating the insecurity of current defense techniques.
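To make the kernel-convolution trick concrete, the following is a minimal sketch of a single translation-invariant FGSM step in PyTorch. It assumes a Gaussian kernel as the pre-defined kernel and a generic image classifier; the kernel size, sigma, and function names here are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn.functional as F
import numpy as np
from scipy import stats

def gaussian_kernel(kernel_size=15, sigma=3.0):
    """Build a 2-D Gaussian kernel, replicated per RGB channel for a
    depthwise convolution (an assumed choice of pre-defined kernel)."""
    x = np.linspace(-sigma, sigma, kernel_size)
    k1d = stats.norm.pdf(x)
    k2d = np.outer(k1d, k1d)
    k2d /= k2d.sum()
    kernel = torch.tensor(k2d, dtype=torch.float32)
    # Shape (channels, 1, k, k) so F.conv2d with groups=3 filters each channel.
    return kernel.expand(3, 1, kernel_size, kernel_size).clone()

def ti_fgsm_step(model, x, y, eps, kernel):
    """One translation-invariant FGSM step: smooth the input gradient with
    the pre-defined kernel before taking its sign, approximating an average
    of gradients over an ensemble of translated copies of the image."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Convolve the gradient at the untranslated image instead of explicitly
    # shifting the image and recomputing gradients for every translation.
    grad = F.conv2d(grad, kernel, padding=kernel.shape[-1] // 2, groups=3)
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()

Since the smoothing only post-processes the gradient, the same step can in principle be dropped into any gradient-based attack (e.g., iterative or momentum variants), consistent with the abstract's claim of general applicability.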