HCR-Net: A Hybrid of Classification and Regression Network for Object Pose Estimation
Abstract
Object pose estimation from a single image is a fundamental and challenging problem in computer vision and robotics. Existing methods generally treat pose estimation as either a classification or a regression problem. However, regression methods usually suffer from imbalanced training data, while classification methods have difficulty discriminating between nearby poses. To address these issues, we propose HCR-Net, a hybrid CNN model that integrates a classification network and a regression network. Our model is motivated by the observation that regression methods achieve better accuracy on homogeneously distributed datasets, while classification methods are more effective for coarse quantization of poses even when the dataset is not well balanced. The two approaches therefore complement each other. We integrate them into a single neural network in a hybrid fashion and train it end-to-end with two novel loss functions. As a result, our method surpasses state-of-the-art methods even with imbalanced training data and much less data augmentation. Experimental results on the challenging Pascal3D+ dataset show that our method significantly outperforms the state of the art, improving the ACC_{π/6} and AVP metrics by up to 4% and 6%, respectively.
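The following minimal Python sketch makes the complementary roles concrete. It shows one common way to combine the two ideas: a classifier picks a coarse angle bin, and a regressor refines that choice with an offset inside the bin. This is an illustrative assumption, not HCR-Net's actual architecture or loss functions. The bin count, the layout with one residual per bin, and the helper names `encode`, `decode`, `angular_error`, and `acc_at` are hypothetical. `acc_at` is a simplified single-angle version of the ACC_{π/6} metric. On Pascal3D+, that metric is computed from the geodesic distance between full rotation matrices.

```python
import math
from typing import List, Tuple

TWO_PI = 2.0 * math.pi


def encode(angle: float, num_bins: int) -> Tuple[int, float]:
    """Split an angle into a coarse bin label (classification target)
    and a fine residual from the bin centre (regression target).

    Hypothetical target layout, for illustration only.
    """
    width = TWO_PI / num_bins
    a = angle % TWO_PI               # wrap into [0, 2*pi)
    k = int(a // width) % num_bins   # coarse class label
    center = (k + 0.5) * width       # centre of the chosen bin
    return k, a - center             # residual lies in [-width/2, width/2)


def decode(class_scores: List[float], residuals: List[float], num_bins: int) -> float:
    """Combine the classifier's most likely bin with the residual
    regressed for that bin (assumes one residual output per bin)."""
    width = TWO_PI / num_bins
    k = max(range(num_bins), key=lambda i: class_scores[i])
    return ((k + 0.5) * width + residuals[k]) % TWO_PI


def angular_error(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def acc_at(errors: List[float], threshold: float = math.pi / 6) -> float:
    """Fraction of predictions whose error is below the threshold
    (a single-angle simplification of ACC_{pi/6})."""
    return sum(e < threshold for e in errors) / len(errors)


if __name__ == "__main__":
    num_bins = 12                     # hypothetical: 12 bins of pi/6 each
    label, residual = encode(1.0, num_bins)
    scores = [0.0] * num_bins
    scores[label] = 5.0               # classifier is confident in the true bin
    residuals = [0.0] * num_bins
    residuals[label] = residual       # regressor recovers the in-bin offset
    estimate = decode(scores, residuals, num_bins)
    print(label, estimate, angular_error(estimate, 1.0))
```

In this scheme the classifier only needs to be right to within one bin, which is a coarse decision that tolerates imbalanced data. The regressor handles a small bounded residual, where fine accuracy matters.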