Abstract
In this paper, we present an unsupervised framework for discovering, detecting, tracking, and reconstructing dense objects from a video sequence. The system simultaneously localizes a moving camera and discovers a set of shape and appearance models for multiple objects, including the scene background. Each object model is represented by both a 2D and a 3D level-set. This representation is used to improve detection, 2D tracking, 3D registration, and, importantly, subsequent updates to the level-set itself. This single framework performs dense simultaneous localization and mapping as well as unsupervised object discovery. At each iteration, portions of the scene that fail to track, such as bulk outliers on moving rigid bodies, are used either to seed models for new objects or to update models of known objects. For the latter, once an object is successfully tracked in 2D with aid from a 2D level-set segmentation, the level-set is updated and then used to aid registration and evolution of a 3D level-set that captures shape information. For a known object, either learned by our system or introduced from a third-party library, our framework can detect similar appearances and geometries in the scene. The system is tested using single- and multiple-object data sets. Results demonstrate an improved method for discovering and reconstructing 2D and 3D object models, which aid tracking even under significant occlusion or rapid motion.