Abstract
Understanding a scene in depth involves not only locating and recognizing individual objects, but also inferring the relationships and interactions among them. However, because the distribution of real-world relationships is severely unbalanced, existing methods perform poorly
on the less frequent relationships. In this work, we find that
the statistical correlations between object pairs and their
relationships can effectively regularize the semantic space and
make predictions less ambiguous, and thus address the
unbalanced distribution issue. To achieve this, we incorporate these statistical correlations into deep neural networks to facilitate scene graph generation by developing a
Knowledge-Embedded Routing Network. More specifically,
we show that the statistical correlations between objects appearing in images and their relationships can be explicitly
represented by a structured knowledge graph, and a routing
mechanism is learned to propagate messages through the
graph to explore their interactions. Extensive experiments
on the large-scale Visual Genome dataset demonstrate the
superiority of the proposed method over current state-of-the-art competitors.
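
As a rough illustration of the statistical correlations referred to above, the following sketch estimates how often each relationship occurs for a given pair of object categories, using simple counts over annotated triples. It covers only this counting step, not the knowledge graph or the routing mechanism, whose details are not given in the abstract. The function name `relationship_statistics` and the toy triples are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def relationship_statistics(triples):
    """Estimate P(relationship | subject category, object category) by counting.

    `triples` is an iterable of (subject, relationship, object) strings.
    This is an illustrative assumption, not the paper's implementation.
    """
    counts = defaultdict(Counter)
    for subj, rel, obj in triples:
        counts[(subj, obj)][rel] += 1

    stats = {}
    for pair, rel_counts in counts.items():
        total = sum(rel_counts.values())
        # Normalize counts so the values for each object pair sum to 1.
        stats[pair] = {rel: c / total for rel, c in rel_counts.items()}
    return stats


# Toy annotations (made up for this example).
toy_triples = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "feeding", "horse"),
    ("man", "wearing", "hat"),
]

stats = relationship_statistics(toy_triples)
print(stats[("man", "horse")])
```

In this toy data, knowing that the pair is (man, horse) concentrates the probability mass on a few relationships, which is the sense in which such statistics can make prediction less ambiguous.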