Abstract
The ability to estimate the perceptual error between images is an important problem in computer vision with many
applications. Although it has been studied extensively, however, no method currently exists that can robustly predict
visual differences like humans. Some previous approaches
used hand-coded models, but they fail to model the complexity of the human visual system. Others used machine learning to train models on human-labeled datasets, but creating
large, high-quality datasets is difficult because people are
unable to assign consistent error labels to distorted images.
In this paper, we present a new learning-based method that
is the first to predict perceptual image error like human observers. Since it is much easier for people to compare two
given images and identify the one more similar to a reference than to assign quality scores to each, we propose a
new, large-scale dataset labeled with the probability that
humans will prefer one image over another. We then train
a deep-learning model using a novel, pairwise-learning
framework to predict the preference of one distorted image over the other. Our key observation is that our trained
network can then be used separately with only one distorted
image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error
labels. The perceptual error estimated by our new metric,
PieAPP, is well-correlated with human opinion. Furthermore, it significantly outperforms existing algorithms, beating the state-of-the-art by almost 3× on our test set in terms
of binary error rate, while also generalizing to new kinds of
distortions, unlike previous learning-based methods