Abstract. We consider the problem of inferring a layered representation, its depth ordering and motion segmentation from video in which
objects may undergo 3D non-planar motion relative to the camera. We
generalize layered inference to this case and to the corresponding self-occlusion
phenomena. We accomplish this by introducing a flattened 3D object representation, which is a compact representation of an object that contains
all visible portions of the object seen in the video, including parts of an
object that are self-occluded (or occluded by other objects) in one frame but seen
in another. We formulate the inference of such flattened representations
and motion segmentation, and derive an optimization scheme. We also
introduce a new depth ordering scheme, which is independent of layered inference and addresses the case of self-occlusion. It requires little
computation given the flattened representations. Experiments on benchmark datasets show the advantage of our method over existing layered
methods, which do not model 3D motion or self-occlusion.