Abstract. Regression trackers directly learn a mapping from regularly
dense samples of target objects to soft labels, which are usually generated by a Gaussian function, to estimate target positions. Due to the
potential for fast tracking and easy implementation, regression trackers have recently received increasing attention. However, state-of-the-art
deep regression trackers do not perform as well as discriminative correlation filter (DCF) trackers. We identify the main bottleneck of training
regression networks as extreme foreground-background data imbalance.
To balance training data, we propose a novel shrinkage loss that reduces
the importance of easy training data. Additionally, we apply residual
connections to fuse multiple convolutional layers as well as their output
response maps. Without bells and whistles, the proposed deep regression
tracking method performs favorably against state-of-the-art trackers, especially in comparison with DCF trackers, on five benchmark datasets:
OTB-2013, OTB-2015, Temple-128, UAV-123, and VOT-2016.
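
To make the foreground-background imbalance concrete, the following Python sketch builds a Gaussian soft-label map of the kind described above and measures how few of its entries carry a non-negligible label. It is an illustration only, not the authors' implementation: the function name, the 51x51 map size, the bandwidth sigma = 2, and the 0.1 threshold are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_soft_labels(height, width, center, sigma):
    """Return a (height, width) map of Gaussian soft labels peaking at `center`.

    Hypothetical helper for illustration; the parameterization is an assumption.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = center
    squared_dist = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-squared_dist / (2.0 * sigma ** 2))


# Assumed sizes: a 51x51 response map with the target at its center.
labels = gaussian_soft_labels(51, 51, center=(25, 25), sigma=2.0)

# Fraction of positions whose soft label exceeds an (assumed) 0.1 threshold.
foreground_fraction = float((labels > 0.1).mean())
print(f"foreground fraction: {foreground_fraction:.4f}")
```

Under these assumed settings only a few percent of the positions have labels above the threshold, so the remaining background samples, most of which are easy, dominate an unweighted regression loss.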