Abstract
Understanding what a person is experiencing from her
frame of reference is essential in our everyday life. For this
reason, machines with this ability could be expected to
interact better with people. However, no current system
can understand people’s emotional states in detail.
Previous research on computer vision to
recognize emotions has mainly focused on analyzing facial
expressions, usually classifying them into the 6 basic
emotions [11]. However, context plays an important role in
emotion perception, and when context is incorporated,
we can infer more emotional states. In this paper we present
the “Emotions in Context Database” (EMOTIC), a dataset
of images containing people in context in non-controlled
environments. In these images, people are annotated with
26 emotional categories and with the continuous dimensions
of valence, arousal, and dominance [21]. With the
EMOTIC dataset, we trained a Convolutional Neural Network
model that jointly analyzes the person and the whole
scene to recognize rich information about emotional states.
In doing so, we show the importance of considering context
when recognizing people’s emotions in images, and provide
a benchmark for the task of emotion recognition in visual context.