Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification
Abstract
Cover song identification is an important problem in the field of Music
Information Retrieval. Most existing methods rely on hand-crafted features
and sequence alignment methods, and further breakthroughs are hard to
achieve. In this paper, Convolutional Neural Networks (CNNs) are used for
representation learning toward this task. We show that they can be
naturally adapted to deal with key transposition in cover songs.
Additionally, Temporal Pyramid Pooling is utilized to extract information
at different scales and to transform songs of different lengths into
fixed-dimensional representations. Furthermore, a training scheme is
designed to enhance the robustness of our model. Extensive experiments
demonstrate that, combined with these techniques, our approach is robust
against the musical variations found in cover songs and outperforms
state-of-the-art methods on several datasets with low time complexity.
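
To illustrate how Temporal Pyramid Pooling can map songs of different lengths to a fixed-dimensional representation, the following is a minimal NumPy sketch. It is not the model described in this paper: the function name `temporal_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the (channels, time) layout of the feature map are all assumptions made for illustration.

```python
import numpy as np


def temporal_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool a (channels, time) feature map into a fixed-length vector.

    Hypothetical helper: levels, max pooling, and input layout are
    illustrative assumptions, not details taken from the paper.
    """
    channels, num_frames = feature_map.shape
    if num_frames < 1:
        raise ValueError("feature_map needs at least one time frame")

    pooled = []
    for num_bins in levels:
        # Split the time axis into num_bins roughly equal segments.
        edges = np.linspace(0, num_frames, num_bins + 1)
        for i in range(num_bins):
            start = int(np.floor(edges[i]))
            end = int(np.ceil(edges[i + 1]))
            # Max over time inside the segment gives one value per channel.
            pooled.append(feature_map[:, start:end].max(axis=1))

    # Output length is channels * sum(levels), independent of num_frames.
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
short_song = rng.standard_normal((8, 50))   # 8 channels, 50 frames
long_song = rng.standard_normal((8, 173))   # 8 channels, 173 frames

short_vec = temporal_pyramid_pool(short_song)
long_vec = temporal_pyramid_pool(long_song)
print(short_vec.shape, long_vec.shape)
```

Both inputs yield vectors of the same length (channels times the sum of the levels), so a downstream layer or distance computation can compare songs regardless of their duration. The coarsest level summarizes the whole song, while finer levels retain some information about where in time a feature occurs.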