Image Collection Pop-up: 3D Reconstruction and Clustering
of Rigid and Non-Rigid Categories
Abstract
This paper introduces an approach to simultaneously
estimate 3D shape, camera pose, and a clustering of both objects and
deformation types, from partial 2D annotations in a
multi-instance collection of images. Furthermore, the approach
handles rigid and non-rigid categories alike. This advances existing work, which only addresses the problem for
a single object or, if multiple objects are considered,
assumes they are clustered a priori. To handle this broader
version of the problem, we model object deformation using a formulation based on multiple unions of subspaces,
able to span from small rigid motion to complex deformations. The parameters of this model are learned via Augmented Lagrange Multipliers, in a completely unsupervised
manner that requires no training data. Extensive validation is provided in a wide variety of synthetic
and real scenarios, including rigid and non-rigid categories
with small and large deformations. In all cases our approach outperforms the state of the art in terms of 3D reconstruction accuracy, while also providing clustering results
that allow segmenting the images into object instances and
their associated type of deformation (or action the object is
performing).