Stereo Vision-based Semantic 3D Object and
Ego-motion Tracking for Autonomous Driving
Abstract. We propose a stereo vision-based approach for tracking the
camera ego-motion and 3D semantic objects in dynamic autonomous
driving scenarios. Instead of directly regressing the 3D bounding box using end-to-end approaches, we propose to use easy-to-label 2D detection and discrete viewpoint classification together with a lightweight
semantic inference method to obtain rough 3D object measurements.
Based on object-aware camera pose tracking, which is robust
in dynamic environments, combined with our novel dynamic object
bundle adjustment (BA) approach that fuses temporal sparse feature correspondences and the semantic 3D measurement model, we obtain 3D
object pose, velocity, and anchored dynamic point cloud estimation with
instance-level accuracy and temporal consistency. The performance of our proposed method is demonstrated in diverse scenarios. Both the ego-motion
estimation and object localization are compared with state-of-the-art solutions.