Abstract
Large collections of videos are often grouped into clusters
by a topic keyword, such as “Eiffel Tower” or “Surfing”,
with many important visual concepts repeating across them.
Videos in such a topically close set influence one another,
and this mutual context can be exploited to summarize any
one of them using information from the others. We build on
this intuition to develop a novel approach that extracts a summary capturing both the important particularities arising in the given video and the generalities
identified across the set of videos. The topic-related videos
provide visual context to identify the important parts of the
video being summarized. We achieve this by developing a
collaborative sparse optimization method that can be efficiently solved by a half-quadratic minimization algorithm.
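The abstract does not specify the exact objective, so as a purely illustrative sketch, a generic row-sparse self-expression problem of the kind commonly paired with half-quadratic (iteratively reweighted least squares) solvers might be minimized as follows; the objective, variable names, and parameters here are assumptions, not the paper's formulation:

```python
import numpy as np

def half_quadratic_sparse(X, lam=0.1, n_iter=30, eps=1e-8):
    """Illustrative half-quadratic (IRLS) solver for an assumed
    row-sparse self-expression objective:
        min_Z 0.5 * ||X - X Z||_F^2 + lam * sum_i ||Z_i,:||_2
    Each half-quadratic step fixes auxiliary row weights and
    solves the resulting weighted ridge-regression system."""
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix of columns
    # Warm start from a plain ridge solution so row norms are nonzero.
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    for _ in range(n_iter):
        # Auxiliary-variable update: w_i = 1 / (2 * ||Z_i,:||_2),
        # clamped to avoid division by zero for vanishing rows.
        row_norms = np.linalg.norm(Z, axis=1)
        w = 1.0 / (2.0 * np.maximum(row_norms, eps))
        # Quadratic subproblem: (G + lam * diag(w)) Z = G
        Z = np.linalg.solve(G + lam * np.diag(w), G)
    return Z
```

Each iteration only requires solving a linear system, which is what makes half-quadratic minimization efficient relative to generic non-smooth solvers.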
Our work builds upon the idea of collaborative techniques
from information retrieval and natural language processing, which typically use the attributes of similar objects to predict the attributes of a given object. Experiments
on two challenging and diverse datasets demonstrate
the efficacy of our approach over state-of-the-art methods.