Abstract
Given a state-of-the-art deep neural network classifier,
we show the existence of a universal (image-agnostic) and
very small perturbation vector that causes natural images
to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations,
and show that state-of-the-art deep neural networks are
highly vulnerable to such perturbations, even though the perturbations are quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular,
that they generalize very well across neural networks. The
surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional
decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions
in the input space that adversaries can possibly exploit to
break a classifier on most natural images.
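To make the central claim concrete, one way to formalize it, following the notation used in the body of the paper, is as the search for a single perturbation vector $v$ of bounded norm that changes the classifier's predicted label on most natural images; the symbols $v$, $\xi$, $\delta$, $\hat{k}$, and $\mu$ below are introduced here for illustration and are not defined in this abstract.

% Universal perturbation (sketch): a single vector v, small in l_p norm,
% that fools the classifier \hat{k} on most images x drawn from the
% natural-image distribution \mu, up to a small failure rate \delta.
\[
  \|v\|_p \le \xi ,
  \qquad
  \mathbb{P}_{x \sim \mu}\!\bigl( \hat{k}(x + v) \neq \hat{k}(x) \bigr) \ge 1 - \delta .
\]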