Abstract
In this paper, we present a novel learning-based algorithm for temporal segmentation of a video into clips based on both camera and scene motion, in particular, on combinations of static vs. dynamic camera and static vs. dynamic scene. Given a video, we first perform shot boundary detection to segment the video into shots. We enforce temporal continuity by constructing a Markov Random Field (MRF) over the frames of each video shot, with edges between consecutive frames, and cast the segmentation problem as a frame-level discrete labeling problem. Using manually labeled data, we learn classifiers that exploit cues from optical flow to provide evidence for the different labels, and infer the best labeling over the frames. We demonstrate the effectiveness of the approach on user videos and full-length movies. Using sixty full-length movies spanning 50 years, we show that grouping frames purely based on motion cues can aid computational applications such as recovering depth from a video, and can also reveal interesting trends in movies, opening up novel applications in video analysis (e.g., time-stamping archive movies) and in film studies.
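The frame-level labeling over a chain MRF described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-frame classifier scores are given as unary costs over the four camera/scene motion labels, and uses a simple Potts smoothness term between consecutive frames, solved exactly by dynamic programming (Viterbi) since the graph is a chain.

```python
import numpy as np

# Hypothetical label set: (static/dynamic camera) x (static/dynamic scene).
LABELS = ["SS", "SD", "DS", "DD"]

def chain_mrf_labeling(unary, smoothness=1.0):
    """Exact MAP labeling of a chain MRF via dynamic programming.

    unary: (T, K) array of per-frame label costs (e.g. negative classifier
           log-likelihoods from optical-flow cues).
    smoothness: Potts penalty paid whenever consecutive frames differ.
    Returns a length-T list of label indices minimizing
        sum_t unary[t, x_t] + smoothness * sum_t [x_t != x_{t+1}].
    """
    T, K = unary.shape
    cost = unary[0].copy()              # best cost of a path ending in each label
    back = np.zeros((T, K), dtype=int)  # backpointers for recovering the path
    for t in range(1, T):
        # trans[i, j]: cost of being in label i at t-1 and moving to j at t
        trans = cost[:, None] + smoothness * (1 - np.eye(K))
        back[t] = np.argmin(trans, axis=0)
        cost = trans[back[t], np.arange(K)] + unary[t]
    # Backtrack from the cheapest final label.
    labels = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]
```

With `smoothness=0` each frame is labeled independently by its classifier score; increasing the penalty suppresses isolated single-frame label flips, which is the temporal-continuity effect the MRF formulation is after.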