Abstract. Compositional models represent patterns with hierarchies of
meaningful parts and subparts. Their ability to characterize high-order
relationships among body parts helps resolve low-level ambiguities in human pose estimation (HPE). However, prior compositional models make
unrealistic assumptions on subpart-part relationships, making them incapable to characterize complex compositional patterns. Moreover, state
spaces of their higher-level parts can be exponentially large, complicating both inference and learning. To address these issues, this paper introduces a novel framework, termed as Deeply Learned Compositional
Model (DLCM), for HPE. It exploits deep neural networks to learn the
compositionality of human bodies. This results in a novel network with
a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation. It not only compactly encodes orientations, scales and shapes
of parts, but also avoids their potentially large state spaces. With significantly lower complexities, our approach outperforms state-of-the-art
methods on three benchmark datasets