Abstract
Recent works have shown that, even with simple low level visual cues, complex behaviors can be extracted automatically from crowded scenes, e.g. those depicting public spaces recorded from video surveillance cameras. However, low level features as optical flow or fore- ground pixels are inherently noisy. In this paper we propose a novel unsupervised learning approach for the analysis of complex scenes which is specifically tailored to cope directly with features’ noise and uncer- tainty. We formalize the task of extracting activity patterns as a matrix factorization problem, considering as reconstruction function the robust Earth Mover’s Distance. A constraint of sparsity on the computed basis matrix is imposed, filtering out noise and leading to the identification of the most relevant elementary activities in a typical high level behavior. We further derive an alternate optimization approach to solve the pro- posed problem efficiently and we show that it is reduced to a sequence of linear programs. Finally, we propose to use short tra jectory snippets to account for ob ject motion information, in alternative to the noisy optical flow vectors used in previous works. Experimental results demonstrate that our method yields similar or superior performance to state-of-the arts approaches.