Abstract
Although video summarization has achieved great success in recent years, few approaches have realized the in-
fluence of video structure on the summarization results. As
we know, the video data follow a hierarchical structure, i.e.,
a video is composed of shots, and a shot is composed of
several frames. Generally, shots provide the activity-level
information for people to understand the video content.
While few existing summarization approaches pay attention
to the shot segmentation procedure. They generate shots by
some trivial strategies, such as fixed length segmentation,
which may destroy the underlying hierarchical structure of
video data and further reduce the quality of generated summaries. To address this problem, we propose a structureadaptive video summarization approach that integrates shot
segmentation and video summarization into a Hierarchical
Structure-Adaptive RNN, denoted as HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e.,
SumMe, TVsum, CoSum and VTW. The experimental results have demonstrated the effectiveness of HSA-RNN in the
video summarization task