Abstract
The main limitation of previous approaches to unsupervised sequential object-oriented representation learning is scalability: most previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a probabilistic generative model for SCALable sequential Object-oriented Representation. With the proposed spatially-parallel attention and proposal-rejection mechanisms, SCALOR can handle orders of magnitude more objects than the previous state-of-the-art model. In addition, we introduce a background model so that SCALOR can model complex backgrounds along with many foreground objects. We demonstrate that SCALOR can handle crowded scenes containing nearly a hundred objects while jointly modeling a complex background. Importantly, SCALOR is the first unsupervised object-representation model shown to work on natural scenes containing several tens of moving objects.