Abstract
In this paper, we propose a framework for spatially and temporally coherent semantic co-segmentation and reconstruction of complex dynamic scenes from multiple static or moving cameras. Semantic co-segmentation exploits the coherence of semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance. We demonstrate that semantic coherence improves segmentation and reconstruction of complex scenes. A joint formulation is proposed for semantically coherent, object-based co-segmentation and reconstruction of scenes, enforcing consistent semantic labelling between views and over time. Semantic tracklets are introduced to enforce temporal coherence in semantic labelling and reconstruction between widely spaced instances of dynamic objects. Tracklets of dynamic objects enable unsupervised learning of appearance and shape priors, which are exploited in joint segmentation and reconstruction. Evaluation on challenging indoor and outdoor sequences captured with hand-held moving cameras shows improved accuracy in segmentation, temporally coherent semantic labelling, and 3D reconstruction of dynamic scenes.