Abstract

Research in visual saliency has focused on two major types of models: fixation prediction and salient object detection. The relationship between the two, however, has been less explored. In this paper, we propose to employ the former model type to identify and segment salient objects in scenes. We build a novel neural network called Attentive Saliency Network (ASNet) that learns to detect salient objects from fixation maps. The fixation map, derived at the upper network layers, captures a high-level understanding of the scene. Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner. ASNet is based on a hierarchy of convolutional LSTMs (convLSTMs) that offers an efficient recurrent mechanism for sequential refinement of the segmentation map. Several loss functions are introduced to boost the performance of ASNet. Extensive experimental evaluation shows that ASNet is capable of generating accurate segmentation maps with the help of the computed fixation map. Our work offers deeper insight into the mechanisms of attention and narrows the gap between salient object detection and fixation prediction.
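
For reference, the recurrent update of a convLSTM, the building block named above, is commonly written as follows. This is a generic, peephole-free formulation given as an assumption; the abstract does not state which variant ASNet uses. The notation is illustrative and not taken from the paper: $X_t$ is the input feature map at step $t$, $H_t$ and $C_t$ are the hidden and cell states, $*$ denotes convolution, $\circ$ the element-wise product, and $\sigma$ the logistic sigmoid.

\[
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ g_t \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
\]

Because every gate is computed by convolution rather than by a fully connected product, the states $H_t$ and $C_t$ keep their spatial layout, which is what allows a segmentation map to be revised step by step at full feature-map resolution.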