Abstract
We address the problem of learning a pose-aware, compact embedding that maps images with similar human poses to nearby points in the embedding space. The embedding function is built on a deep convolutional network and trained with triplet-based rank constraints on real image data. This architecture allows us to learn a robust representation that captures differences in human poses by effectively factoring out variations in clothing, background, and imaging conditions in the wild. For a variety of pose-related tasks, the proposed pose embedding provides a cost-efficient and natural alternative to explicit pose estimation, circumventing the challenges of localizing body joints. We demonstrate the efficacy of the embedding on pose-based image retrieval and action recognition problems.
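
To make the idea of a triplet-based rank constraint concrete, the following is a minimal sketch rather than the loss used in this work. It assumes a hinge formulation over squared Euclidean distances with a fixed margin; the abstract does not specify the exact form, and the function names and the `margin` parameter are hypothetical. The constraint asks that an anchor image lie closer to an image with a similar pose (the positive) than to one with a dissimilar pose (the negative).

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_hinge_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Hinge loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`; otherwise it grows
    linearly with the size of the violation.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    similar_pose = [0.1, 0.0]
    far_pose = [2.0, 0.0]     # satisfies the rank constraint
    close_pose = [0.5, 0.0]   # violates the rank constraint
    print(triplet_hinge_loss(anchor, similar_pose, far_pose))
    print(triplet_hinge_loss(anchor, similar_pose, close_pose))
```

In this sketch the vectors stand in for the outputs of the embedding network. During training, the gradient of such a loss would pull similar poses together and push dissimilar poses apart, regardless of clothing or background.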