Abstract. Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and
virtual reality. While direct regression from images to object poses has limited accuracy, matching rendered images of an object against the input
image can produce accurate results. In this work, we propose a novel deep
neural network for 6D pose matching named DeepIM. Given an initial
pose estimate, our network is able to iteratively refine the pose by
matching the rendered image against the observed image. The network
is trained iteratively to predict a relative pose transformation, using an untangled
representation of 3D location and 3D orientation. Experiments on two commonly used benchmarks for 6D pose
estimation demonstrate that DeepIM achieves large improvements over
state-of-the-art methods. We furthermore show that DeepIM is able to
match previously unseen objects.
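
The refinement loop described above can be illustrated with a minimal sketch. It is not the DeepIM implementation: the network is replaced by a hypothetical stand-in, `toy_predictor`, which reads the target pose directly instead of comparing a rendered image with an observed one. The names `compose` and `refine`, the additive translation update, and the restriction to rotations about the z-axis are all assumptions made for illustration. What the sketch does show is the general idea of a decoupled update, in which the relative rotation is applied about the object's own center and the translation is updated separately, repeated over several iterations.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def compose(pose, delta):
    """Apply a relative update (dR, dt) to a pose (R, t).

    The rotation is applied about the object's own center and the
    translation is updated independently of it (an illustrative
    assumption, not the paper's exact parametrization).
    """
    R, t = pose
    dR, dt = delta
    return matmul(dR, R), [t[i] + dt[i] for i in range(3)]


def toy_predictor(pose, target, gain=0.5):
    """Hypothetical stand-in for the network.

    It peeks at the target pose and returns a damped relative update.
    The angle extraction is valid only because the toy poses rotate
    about the z-axis.
    """
    R, t = pose
    R_target, t_target = target
    rel = matmul(R_target, transpose(R))
    angle = math.atan2(rel[1][0], rel[0][0])
    return rot_z(gain * angle), [gain * (t_target[i] - t[i]) for i in range(3)]


def refine(pose, target, iterations=4):
    """Iteratively refine a pose by repeatedly applying predicted updates."""
    for _ in range(iterations):
        pose = compose(pose, toy_predictor(pose, target))
    return pose


initial = (rot_z(0.0), [0.0, 0.0, 1.0])
target = (rot_z(0.4), [0.1, -0.2, 1.5])
refined = refine(initial, target, iterations=10)
```

With a gain of 0.5, each iteration halves the remaining rotation and translation error in this toy setting, so a coarse initial estimate approaches the target after a few iterations.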