Abstract
The prediction of salient areas in images has traditionally been addressed with hand-crafted features based on neuroscience principles. This paper instead addresses the problem with a completely data-driven approach by training a convolutional neural network (convnet). The learning process is formulated as the minimization of a loss function that measures the Euclidean distance between the predicted saliency map and the provided ground truth. The recent publication of large datasets for saliency prediction has provided enough data to train end-to-end architectures that are both fast and accurate. Two designs are proposed: a shallow convnet trained from scratch, and a deeper one whose first three layers are adapted from another network trained for classification. To the authors' knowledge, these are the first end-to-end CNNs trained and tested for the purpose of saliency prediction.
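The Euclidean loss mentioned above can be illustrated with a minimal sketch. It assumes that saliency maps are stored as equally sized lists of rows of floats; the function name `euclidean_loss` and the toy values are hypothetical and not taken from the paper. The abstract does not say whether the distance is squared or normalized during training, so the plain, unnormalized L2 distance is shown here.

```python
import math


def euclidean_loss(predicted, ground_truth):
    """Euclidean (L2) distance between two saliency maps of equal shape.

    Assumption: each map is a list of rows of floats. This layout is
    illustrative only and is not the paper's data format.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("maps must have the same number of rows")
    total = 0.0
    for row_p, row_g in zip(predicted, ground_truth):
        if len(row_p) != len(row_g):
            raise ValueError("rows must have the same length")
        # Accumulate squared per-pixel differences.
        for p, g in zip(row_p, row_g):
            total += (p - g) ** 2
    return math.sqrt(total)


# Toy 2x2 maps (hypothetical values): only one pixel differs, by 1.0.
predicted = [[0.0, 0.5], [1.0, 0.0]]
ground_truth = [[0.0, 0.5], [0.0, 0.0]]
loss = euclidean_loss(predicted, ground_truth)
```

In a training setting this quantity would be computed over batches of predicted maps and minimized with respect to the network weights. The sketch only shows the per-image distance.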