Abstract. We present a novel, challenging, and realistic dataset for evaluating 6-DOF object tracking algorithms. Existing datasets show serious
limitations—notably, unrealistic synthetic data, or real data with large
fiducial markers—preventing the community from obtaining an accurate
picture of the state-of-the-art. Using a data acquisition pipeline based
on a commercial motion capture system for acquiring accurate ground
truth poses of real objects with respect to a Kinect V2 camera, we build
a dataset that contains a total of 297 calibrated sequences. The sequences are acquired in three scenarios, each evaluating a different aspect of tracker performance:
stability, robustness to occlusion, and accuracy during challenging interactions between a person and the object. We conduct an extensive study
of a deep 6-DOF tracking architecture and determine a set of optimal
parameters. We enhance the architecture and the training methodology
to train a 6-DOF tracker that can robustly generalize to objects never
seen during training, and demonstrate favorable performance compared
to previous approaches trained specifically on the objects to track.