Abstract
The goal of this paper is to analyze the geometric properties of deep neural network image classifiers in the input space. We specifically study the topology of the classification regions created by deep networks, as well as their associated decision boundary. Through a systematic empirical study, we show that state-of-the-art deep nets learn connected classification regions, and that the decision boundary in the vicinity of datapoints is flat along most directions. We further draw an essential connection between two seemingly unrelated properties of deep networks: their sensitivity to additive perturbations of the inputs, and the curvature of their decision boundary. In fact, the directions along which the decision boundary is curved characterize the directions to which the classifier is most vulnerable. We finally leverage a fundamental asymmetry in the curvature of the decision boundary of deep nets, and propose a method to discriminate between original images and images carrying small adversarial perturbations. We show the effectiveness of this purely geometric approach for detecting small adversarial perturbations in images, and for recovering the labels of perturbed images.
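The notion of a decision boundary that is "flat along most directions" but curved along a few can be made concrete with standard differential geometry. The sketch below is a toy illustration, not the procedure used in this paper. It assumes the boundary between two classes is the zero level set of a smooth function F (for instance, the difference of two class scores), and that the gradient and Hessian of F are available at a boundary point. The principal curvatures are then the eigenvalues of the Hessian projected onto the tangent space and divided by the gradient norm. The function name `boundary_curvatures` and the quadratic toy F are hypothetical choices made for illustration only.

```python
import numpy as np


def boundary_curvatures(grad, hess):
    """Principal curvatures of the level set {z : F(z) = F(x)} at a point x.

    grad: gradient of F at x, shape (n,); hess: Hessian of F at x, shape (n, n).
    Returns (eigenvalues ascending, eigenvectors) of P H P / ||grad||, where P
    projects onto the tangent space. The normal direction is an eigenvector
    with eigenvalue 0; the other eigenpairs are the principal curvatures and
    principal directions (their sign depends on the orientation convention).
    """
    g_norm = np.linalg.norm(grad)
    normal = grad / g_norm
    # Projector onto the tangent space of the boundary (orthogonal to the normal).
    proj = np.eye(grad.shape[0]) - np.outer(normal, normal)
    shape_op = proj @ hess @ proj / g_norm
    # The matrix is symmetric, so eigh returns real eigenvalues in ascending order.
    return np.linalg.eigh(shape_op)


# Hypothetical toy decision function in n = 50 dimensions:
# F(x) = w.x + 0.5 * x^T A x, whose boundary F = 0 passes through x = 0.
n = 50
w = np.zeros(n)
w[0] = 1.0                       # boundary normal at the origin
A = np.zeros((n, n))
A[1, 1] = 3.0                    # boundary curved along e_1
A[2, 2] = -2.0                   # boundary curved (opposite sign) along e_2

grad_at_0, hess_at_0 = w, A      # exact derivatives of F at x = 0
curv, dirs = boundary_curvatures(grad_at_0, hess_at_0)

curved = np.abs(curv) > 1e-9
print("curved directions:", int(curved.sum()), "of", n - 1, "tangent directions")
print("curvatures:", curv[curved])
```

In this toy, only 2 of the 49 tangent directions are curved (along e_1 and e_2) and the rest are flat. For a real network, the Hessian would typically be accessed through Hessian-vector products rather than formed explicitly; that substitution is an assumption of this sketch, not a detail taken from the paper.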