Abstract
Music creation involves not only composing the
different parts (e.g., melody, chords) of a musical
work but also arranging/selecting the instruments
to play the different parts. While the former has received increasing attention, the latter remains
little investigated. This paper presents, to the best
of our knowledge, the first deep learning models
for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a
piece of polyphonic musical audio as input and
predict its musical score as output. We investigate disentanglement techniques such as adversarial training to separate the latent factors related
to the musical content (pitch) of different parts of
the piece from those related to the instrumentation (timbre) of the parts per short-time segment.
By disentangling pitch and timbre, our models capture
how each piece was composed and arranged. Moreover, the models can realize “composition style transfer” by rearranging a musical
piece while largely preserving its pitch content. We
validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at
https://github.com/biboamy/instrument-disentangle