Abstract
Human eye movements provide a rich source of information into the human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among the eyetracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOI) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that given unconstrained read-world stimuli, task instructions have significant influence on the human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data within models that allow to effectively predict the human visual search patterns based on inverse optimal control. The methodology achieves state of the art scanpath modeling results.