Abstract
Object detection has been a long-standing problem in computer vision, and state-of-the-art approaches rely on the use of sophisticated features and/or classifiers. However, these learning-based approaches depend heavily on the quality and quantity of labeled data, and do not generalize well to extreme poses or textureless objects. In this work, we explore the use of 3D shape models to detect objects in videos in an unsupervised manner. We call this problem Motion from Structure (MfS): given a set of point trajectories and a 3D model of the object of interest, find a subset of trajectories that correspond to the 3D model and estimate its alignment (i.e., compute the motion matrix). MfS is related to Structure from Motion (SfM) and motion segmentation problems: unlike SfM, the structure of the object is known but the correspondence between the trajectories and the object is unknown; unlike motion segmentation, the MfS problem incorporates 3D structure, providing robustness to tracking mismatches and outliers. Experiments show that our MfS algorithm outperforms alternative approaches on both synthetic data and real videos extracted from YouTube.
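To make the alignment step concrete, the following is a minimal sketch of the easier subproblem only: estimating a motion matrix once the correspondence between trajectories and model points is already known. It is not the MfS algorithm itself, which must also find that correspondence. The sketch assumes an affine camera, trajectories and model points centered at their centroids, and a plain least-squares fit; the function names `estimate_motion` and `residuals` and the matrix layout are illustrative choices, not taken from the paper.

```python
import numpy as np


def estimate_motion(W, S):
    """Least-squares motion matrix M (2F x 3) such that W ~= M @ S.

    Assumptions (illustrative, not from the paper):
      W: (2F, n) stacked image coordinates of n matched trajectories over
         F frames, with the per-frame centroid already subtracted.
      S: (3, n) model points, centered at the origin, in the same column
         order as W (i.e., correspondence is known).
    """
    # Solve S.T @ M.T = W.T in the least-squares sense.
    M_T, *_ = np.linalg.lstsq(S.T, W.T, rcond=None)
    return M_T.T


def residuals(W, S, M):
    """Per-trajectory reprojection error under the fitted motion matrix."""
    return np.linalg.norm(W - M @ S, axis=0)
```

Under these assumptions, trajectories that do not belong to the object would tend to show large values in `residuals`, which hints at why a known 3D structure can help reject tracking mismatches and outliers.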