Abstract. We present a fast and accurate visual tracking algorithm
based on the multi-domain convolutional neural network (MDNet). The
proposed approach accelerates the feature extraction procedure and
learns more discriminative models for instance classification; it
enhances the representation quality of target and background by
maintaining a high-resolution feature map with a large receptive field
per activation. We also introduce a novel loss term to differentiate
foreground instances across multiple domains and to learn a more
discriminative embedding of target objects with similar semantics.
The proposed techniques are integrated into the pipeline of a
well-known CNN-based visual tracking algorithm, MDNet.
We achieve an approximately 25-fold speed-up over MDNet with almost
identical accuracy. Our algorithm is evaluated on multiple popular
tracking benchmark datasets, including OTB2015, UAV123, and
TempleColor, and consistently outperforms state-of-the-art real-time
tracking methods, even without dataset-specific parameter tuning.
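The claim that a feature map can keep a high resolution while each activation still sees a large receptive field can be checked with standard receptive-field arithmetic. The sketch below assumes, for illustration only, that the resolution is preserved by replacing a strided layer with a dilated convolution; the layer stacks `strided` and `dilated` are hypothetical and are not the paper's architecture.

```python
from typing import List, Tuple

# (kernel_size, stride, dilation) for one convolution or pooling layer
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive field in input pixels, output stride) for a layer stack.

    Uses the standard recurrence rf += (k - 1) * d * jump, jump *= s,
    where `jump` is the distance in input pixels between adjacent activations.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks for illustration, not the paper's network.
strided = [(3, 1, 1), (3, 2, 1), (3, 1, 1)]  # downsample in the middle layer
dilated = [(3, 1, 1), (3, 1, 1), (3, 1, 2)]  # keep stride 1, dilate the last layer

print(receptive_field(strided))  # (9, 2)
print(receptive_field(dilated))  # (9, 1)
```

Both stacks give each activation a 9-pixel receptive field, but the dilated stack keeps an output stride of 1, so its feature map is sampled twice as densely along each axis as the strided one.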
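The abstract does not spell out the form of the loss term that differentiates foreground instances across domains. One plausible form, and the assumption made in this minimal sketch, is a softmax cross-entropy over domains applied to the positive-class scores of foreground samples, added to the usual per-domain binary classification loss with a weight `alpha`. The function names, the score layout, and the default `alpha = 0.1` are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical layout: scores[i][d] = [negative_score, positive_score] produced
# for sample i by the domain-specific branch d. domains[i] is the domain (video)
# that sample i came from; labels[i] is 1 for foreground and 0 for background.
Scores = Sequence[Sequence[Sequence[float]]]


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def classification_loss(scores: Scores, domains: Sequence[int], labels: Sequence[int]) -> float:
    """Per-domain binary cross-entropy: each sample is scored only by its own branch."""
    total = 0.0
    for sample, d, y in zip(scores, domains, labels):
        total -= log_softmax(sample[d])[y]
    return total / len(scores)


def instance_embedding_loss(scores: Scores, domains: Sequence[int], labels: Sequence[int]) -> float:
    """Cross-entropy over domains on the positive scores of foreground samples."""
    total, count = 0.0, 0
    for sample, d, y in zip(scores, domains, labels):
        if y != 1:
            continue  # background samples do not contribute to this term
        positive = [pair[1] for pair in sample]  # one positive score per domain
        total -= log_softmax(positive)[d]  # the sample's own domain should score highest
        count += 1
    return total / count if count else 0.0


def total_loss(scores: Scores, domains: Sequence[int], labels: Sequence[int], alpha: float = 0.1) -> float:
    # alpha is a hypothetical weight balancing the two terms in this sketch
    return classification_loss(scores, domains, labels) + alpha * instance_embedding_loss(
        scores, domains, labels
    )
```

With this form, a foreground target from one video is pushed to score higher under its own domain branch than under the branches of other videos, which is one way to separate targets with similar semantics.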