Abstract
Rich semantic relations are important in a variety of vi-sual recognition problems. As a concrete example, group activity recognition involves the interactions and relativespatial relations of a set of people in a scene. State of the a recognition methods center on deep learning approachesfor training highly effective, complex classifiers for inter-preting images. However, bridging the relatively low-level concepts output by these methods to interpret higher-levelcompositional scenes remains a challenge. Graphical models are a standard tool for this task. In this paper, we propose a method to integrate graphical models and deep neural networks into a joint framework. Instead of using a traditional inference method, we use a sequential inferencemodeled by a recurrent neural network. Beyond this, theappropriate structure for inference can be learned by imposing gates on edges between nodes. Empirical results on group activity recognition demonstrate the potential of this model to handle highly structured learning tasks.