Abstract
This paper introduces a novel variant of video summarization, namely building a summary that depends on the
particular aspect of a video the viewer focuses on. We refer to this aspect as the viewpoint. To infer what the desired viewpoint
may be, we assume that several other videos are available,
in particular as groups of videos, e.g., folders on a person’s
phone or laptop. The semantic similarity between videos
in a group vs. the dissimilarity between groups is used
to produce viewpoint-specific summaries. To account for
similarity while avoiding redundancy, the output summary
should be (A) diverse, (B) representative of videos in the
same group, and (C) discriminative against videos in
different groups. To satisfy requirements (A)-(C) simultaneously, we propose a novel video summarization
method from multiple groups of videos. Inspired by Fisher’s
discriminant criterion, it selects a summary by optimizing a
combination of three terms, (a) inner-summary, (b) inner-group, and (c) between-group variances, defined on the feature representation of the summary, which represent
(A)-(C) in a simple form. Moreover, we developed a novel dataset to investigate how well the generated summary reflects the underlying viewpoint. Quantitative and qualitative experiments
conducted on the dataset demonstrate the effectiveness of
the proposed method.
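
As a rough illustration of how the three variance terms could be combined, the following minimal sketch scores a candidate summary and selects segments greedily. It is not the formulation used in this paper: the function names `score` and `greedy_summary`, the weights `w`, the use of squared Euclidean distances to the summary mean, and the greedy search are all assumptions made for this example.

```python
import numpy as np


def score(summary, same_group, other_groups, w=(1.0, 1.0, 1.0)):
    """Hypothetical objective combining the three variance terms.

    summary, same_group, other_groups: 2-D arrays of feature vectors (rows).
    w: assumed trade-off weights, not taken from the paper.
    """
    mean = summary.mean(axis=0)
    # (a) inner-summary variance: spread of the summary itself (diversity).
    inner_summary = ((summary - mean) ** 2).sum(axis=1).mean()
    # (b) inner-group variance: distance from same-group features to the
    #     summary mean (small means representative).
    inner_group = ((same_group - mean) ** 2).sum(axis=1).mean()
    # (c) between-group variance: distance from other-group features to the
    #     summary mean (large means discriminative).
    between_group = ((other_groups - mean) ** 2).sum(axis=1).mean()
    return w[0] * inner_summary - w[1] * inner_group + w[2] * between_group


def greedy_summary(candidates, same_group, other_groups, k, w=(1.0, 1.0, 1.0)):
    """Pick k candidate indices one at a time, each maximizing the score."""
    chosen = []
    for _ in range(k):
        best_i, best_s = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            s = score(candidates[chosen + [i]], same_group, other_groups, w)
            if s > best_s:
                best_i, best_s = i, s
        chosen.append(best_i)
    return chosen
```

With equal weights, a candidate that lies close to the features of its own group and far from those of other groups is preferred, and adding a second segment is rewarded only if the gain in diversity or discrimination outweighs the loss in representativeness.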