Abstract
Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently from eachother because of the huge space of parameters. To jointlymodel all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchyrepresents objects at a different level of granularity. The hi-erarchical representation prevents performance loss, which is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modeling enables resolving ambiguities that exist in independent modeling of these tasks. We augment PASCAL3D+ [34] dataset with annotations for these tasks and show that our hierarchical model is effective in joint modeling of object detection, 3D pose estimation, and sub-category recognition.