Abstract. We introduce a novel method for robust and accurate 3D
object pose estimation from a single color image under large occlusions.
Following recent approaches, we first predict the 2D projections of 3D
points related to the target object and then compute the 3D pose from
these correspondences using a geometric method. Unfortunately, as the
results of our experiments show, predicting these 2D projections using
a regular CNN or a Convolutional Pose Machine is highly sensitive to
partial occlusions, even when these methods are trained with partially
occluded examples. Our solution is to predict heatmaps from multiple
small patches independently and to accumulate the results to obtain
accurate and robust predictions. Training subsequently becomes challenging
because patches with similar appearances but different positions
on the object correspond to different heatmaps. However, we provide a
simple yet effective solution to deal with such ambiguities. We show that
our approach outperforms existing methods on two challenging datasets:
the Occluded LineMOD dataset and the YCB-Video dataset, both exhibiting
cluttered scenes with highly occluded objects.
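
The accumulation step described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that each patch's predicted heatmap has already been placed in full-image coordinates, that accumulation is a plain average, and that the 2D projection is read off as the peak of the result. The helper names (gaussian_heatmap, accumulate_heatmaps, peak_location) are hypothetical, and synthetic Gaussians stand in for the network outputs.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Synthetic stand-in for one patch's predicted heatmap (hypothetical)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def accumulate_heatmaps(patch_heatmaps):
    """Average the heatmaps predicted independently from each patch.

    Averaging is an assumption of this sketch; the text only says the
    per-patch results are accumulated.
    """
    stacked = np.stack(patch_heatmaps, axis=0)
    return stacked.mean(axis=0)

def peak_location(heatmap):
    """Return the (row, col) of the global maximum of a 2D heatmap."""
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return tuple(int(i) for i in idx)

shape = (48, 64)
# Five patches agree on the projection at (row=10, col=20).
votes = [gaussian_heatmap(shape, (10, 20)) for _ in range(5)]
# Two patches (e.g. covering an occluder) vote for unrelated locations.
votes += [gaussian_heatmap(shape, (40, 50)), gaussian_heatmap(shape, (30, 5))]

accumulated = accumulate_heatmaps(votes)
projection = peak_location(accumulated)
```

In this toy setting the two outlier votes lower the peak value but do not move it, which is the intuition behind accumulating many independent patch predictions under occlusion.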