Abstract
Surveillance videos can capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To
avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to
learn anomalies through a deep multiple instance ranking
framework that leverages weakly labeled training videos,
i.e., the training labels (anomalous or normal) are at the video level instead of the clip level. In our approach, we consider
normal and anomalous videos as bags and video segments
as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts
high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness
constraints in the ranking loss function to better localize
anomalies during training.
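The abstract does not spell out the loss, so what follows is only a minimal Python sketch of how a multiple instance ranking loss with sparsity and temporal smoothness terms could look, not necessarily the exact formulation used in this work. It assumes that each video (bag) has already been split into segments (instances), that a model (not shown) assigns each segment an anomaly score in [0, 1], that the ranking term is a hinge loss between the highest segment score of an anomalous bag and the highest segment score of a normal bag, and that smoothness and sparsity are measured as the sum of squared differences between adjacent segment scores and the sum of segment scores, respectively. The function name `mil_ranking_loss` and its parameters are hypothetical.

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_scores: Sequence[float],
    normal_scores: Sequence[float],
    lambda_smooth: float,
    lambda_sparse: float,
    margin: float = 1.0,
) -> float:
    """Hypothetical MIL ranking loss for one anomalous bag and one normal bag.

    Each argument is the list of per-segment anomaly scores (assumed to lie
    in [0, 1]) that a model assigns to the segments of one video.
    """
    if len(anomalous_scores) == 0 or len(normal_scores) == 0:
        raise ValueError("each bag needs at least one segment score")

    # Ranking term: only video-level labels are used. The highest-scoring
    # segment of the anomalous video should outscore the highest-scoring
    # segment of the normal video by at least `margin`.
    ranking = max(0.0, margin - max(anomalous_scores) + max(normal_scores))

    # Temporal smoothness (assumed form): penalize large jumps between
    # the scores of adjacent segments in the anomalous video.
    smoothness = sum(
        (a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:])
    )

    # Sparsity (assumed form): anomalies usually occupy only a few
    # segments, so penalize the total score mass of the anomalous bag.
    sparsity = sum(anomalous_scores)

    return ranking + lambda_smooth * smoothness + lambda_sparse * sparsity


if __name__ == "__main__":
    # Illustrative scores for a four-segment anomalous video and a
    # four-segment normal video; the weights are arbitrary.
    anomalous = [0.1, 0.2, 0.9, 0.8]
    normal = [0.1, 0.3, 0.2, 0.1]
    print(mil_ranking_loss(anomalous, normal, lambda_smooth=0.1, lambda_sparse=0.1))
```

Taking the maximum over each bag is what lets training proceed without clip-level annotation: the loss only asks that some segment of the anomalous video score higher than every segment of the normal video, while the two extra terms push the anomalous scores toward a few temporally contiguous segments.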
We also introduce a new large-scale, first-of-its-kind
dataset of 128 hours of videos. It consists of 1900 long and
untrimmed real-world surveillance videos, with 13 realistic
anomalies such as fighting, road accident, burglary, robbery, etc., as well as normal activities. This dataset can be
used for two tasks: first, general anomaly detection, with all anomalies in one group and all normal activities in
another; second, recognition of each of the 13 anomalous activities. Our experimental results show that our MIL
method achieves a significant improvement in anomaly detection performance compared to
the state-of-the-art approaches. We provide the results of
several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these
baselines reveals that our dataset is very challenging and
leaves ample room for future work. The dataset is
available at: http://crcv.ucf.edu/projects/real-world/