Abstract. In recent deep online and near-online multi-object tracking
approaches, it has been difficult to incorporate long-term appearance
models that efficiently score object tracks under severe occlusion and
multiple missing detections. In this paper, we propose a novel recurrent
network model, the Bilinear LSTM, to improve the learning of long-term
appearance models. Based on intuitions drawn from recursive least
squares, the Bilinear LSTM stores the building blocks of a linear
predictor in its memory, which is then coupled with the input in a
multiplicative manner, instead of the additive coupling
in conventional LSTM approaches. This coupling resembles an
online-learned classifier/regressor at each time step, which we have found to
improve the performance of LSTMs for appearance modeling. We also
propose novel data augmentation approaches to efficiently train
recurrent models that score object tracks on both appearance and motion.
We train such an LSTM and use it in a multiple hypothesis tracking framework.
In experiments, our novel LSTM model achieves state-of-the-art
performance in near-online multiple object tracking on
the MOT 2016 and MOT 2017 benchmarks.
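
To make the contrast between additive and multiplicative coupling concrete, the following is a minimal sketch and not the model's actual formulation, which the abstract does not spell out. It assumes that the memory vector can be reshaped into a matrix whose rows act as linear predictors applied to the input; the function names, the weights `W_h` and `W_x`, and the toy values are all hypothetical.

```python
import math

def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def additive_coupling(h_prev, x, W_h, W_x):
    """Conventional-style coupling: memory and input are mapped linearly, then summed."""
    a = matvec(W_h, h_prev)
    b = matvec(W_x, x)
    return [math.tanh(p + q) for p, q in zip(a, b)]

def multiplicative_coupling(h_prev, x, rows):
    """Bilinear-style coupling (assumed form): the memory is reshaped into a
    matrix whose rows behave like linear predictors applied to the input."""
    d = len(x)
    assert len(h_prev) == rows * d, "memory size must equal rows * input size"
    H = [h_prev[r * d:(r + 1) * d] for r in range(rows)]
    return [math.tanh(s) for s in matvec(H, x)]

# Toy values, chosen only for illustration.
x = [1.0, -2.0]
h_prev = [0.5, 0.25, -1.0, 0.0]
W_h = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 1.0]]
W_x = [[1.0, 0.0], [0.0, 1.0]]

add_out = additive_coupling(h_prev, x, W_h, W_x)
mul_out = multiplicative_coupling(h_prev, x, rows=2)
```

In the multiplicative variant the response to the input depends directly on what the memory holds, so a different track history yields a different effective predictor; in the additive variant the memory only shifts the pre-activation.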