Abstract Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency to express opinions online. In this paper, we address the problem of visual sentiment analysis using high-level abstraction in the recognition process. Existing methods based on convolutional neural networks learn sentiment representations from the holistic image appearance. However, different image regions can have a different influence on the intended expression. This paper presents a weakly supervised coupled convolutional network with two branches to leverage the localized information. The first branch detects a sentiment-specific soft map by training a fully convolutional network with the cross spatial pooling strategy, which requires only image-level labels, thereby significantly reducing the annotation burden. The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification. We integrate the sentiment detection and classification branches into a unified deep framework and optimize the network in an end-to-end manner. Extensive experiments on six benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods for visual sentiment analysis.
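The two-branch design described above can be illustrated with a minimal sketch in plain Python. The abstract does not give the exact operators, so everything here is an assumption made for illustration: cross spatial pooling is taken to be a spatial average followed by an average over the k response maps of each class, the sentiment map is taken to be a softmax-weighted sum of the class response maps, and coupling is taken to be an element-wise product followed by concatenation of pooled holistic and localized features. All function names are hypothetical.

```python
import math

def global_avg_pool(fmap):
    """Average one H x W map (list of rows) down to a scalar."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_spatial_pool(response_maps, k):
    """Assumed pooling: spatial average per map, then average over
    the k maps belonging to each class. Returns one score per class."""
    pooled = [global_avg_pool(m) for m in response_maps]
    num_classes = len(pooled) // k
    return [sum(pooled[c * k:(c + 1) * k]) / k for c in range(num_classes)]

def sentiment_map(response_maps, k):
    """Assumed detection output: class response maps combined into one
    H x W soft map, weighted by the softmax of the class scores."""
    weights = softmax(cross_spatial_pool(response_maps, k))
    h, w = len(response_maps[0]), len(response_maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for c, weight in enumerate(weights):
        for m in response_maps[c * k:(c + 1) * k]:
            for i in range(h):
                for j in range(w):
                    out[i][j] += weight * m[i][j] / k
    return out

def coupled_feature(features, smap):
    """Assumed coupling: pooled holistic features concatenated with
    pooled features re-weighted element-wise by the sentiment map."""
    holistic = [global_avg_pool(f) for f in features]
    localized = [
        global_avg_pool([[f[i][j] * smap[i][j] for j in range(len(f[0]))]
                         for i in range(len(f))])
        for f in features
    ]
    return holistic + localized

# Toy input: 2 classes, k = 1 response map per class, one 2 x 2 feature map.
responses = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
features = [[[2.0, 2.0], [2.0, 2.0]]]
smap = sentiment_map(responses, k=1)
vector = coupled_feature(features, smap)
```

In this sketch only image-level labels would be needed to train the detection branch, since the class scores come from pooling the response maps rather than from region annotations; the coupled vector would then feed a classifier in the second branch.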