Abstract
Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study [30] revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects, which we call "fooling images" (more generally, fooling examples). Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.
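As a minimal sketch of the gradient-ascent variant mentioned above (not the paper's exact setup), one can start from random noise and repeatedly adjust the image to raise a network's confidence in an arbitrary target class. The model (torchvision's AlexNet), step size, iteration count, and class index below are illustrative assumptions.

```python
import torch
import torchvision.models as models

# Pretrained ImageNet classifier used purely for illustration.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

target_class = 291                 # hypothetical ImageNet class index for illustration
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # start from random noise

optimizer = torch.optim.SGD([image], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    logits = model(image)
    # Gradient ascent on the target-class score (minimize its negative).
    loss = -logits[0, target_class]
    loss.backward()
    optimizer.step()
    image.data.clamp_(0, 1)        # keep pixel values in a valid range

confidence = torch.softmax(model(image), dim=1)[0, target_class].item()
print(f"Target-class confidence: {confidence:.4f}")
```

With enough iterations, such optimization typically yields an image that looks like structured noise to a human yet is assigned very high confidence by the network, which is the phenomenon the abstract refers to as a "fooling image".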