Abstract
The vulnerability of Deep Neural Networks (DNNs) to adversarial attacks has attracted significant attention in recent studies. It has been shown that for many state-of-the-art image classification DNNs there exist universal adversarial perturbations: image-agnostic perturbations whose mere addition to a natural image leads, with high probability, to its misclassification. In this work we
propose a new algorithm for constructing such universal
perturbations. Our approach is based on computing the so-called (p, q)-singular vectors of the Jacobian matrices of the hidden layers of a network. The resulting perturbations exhibit interesting visual patterns, and using only 64 images we were able to construct universal perturbations with a fooling rate above 60% on a dataset of 50,000 images. We also investigate the correlation between the maximal singular value of the Jacobian matrix and the fooling rate of the corresponding singular vector, and show that the constructed perturbations generalize across networks.