Abstract. Object tracking remains a critical and challenging problem
with many applications in computer vision. To address this challenge,
more and more researchers are turning to deep learning to obtain powerful features for better tracking accuracy. In this paper, a novel triplet
loss is proposed to extract expressive deep features for object tracking
by adding it into the Siamese network framework in place of the pairwise loss used for
training. Without adding any inputs, our approach utilizes more
elements for training, and thus learns more powerful features, via combinations of the original samples. Furthermore, we provide a theoretical analysis
combining a comparison of gradients with back-propagation to demonstrate
the effectiveness of our method. In experiments, we apply the proposed
triplet loss to three real-time trackers based on Siamese networks. The
results on several popular tracking benchmarks show that our variants operate at almost the same frame rate as the baseline trackers while achieving
superior tracking performance, as well as accuracy comparable to recent state-of-the-art real-time trackers.
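To make the loss substitution concrete, the following is a minimal sketch of the standard hinge-based triplet loss over feature embeddings. It is illustrative only, not the paper's exact score-map formulation; the embeddings and the `margin` value are assumptions for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-based triplet loss on embedding vectors.

    Encourages the anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin`. The margin value
    here is an arbitrary choice for illustration.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)

# Example: a well-separated triplet incurs zero loss,
# while a negative that is too close incurs a positive loss.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n_far = np.array([3.0, 0.0])
n_near = np.array([0.2, 0.0])
print(triplet_loss(a, p, n_far))   # separated: loss is 0.0
print(triplet_loss(a, p, n_near))  # too close: loss > 0
```

Unlike a pairwise loss, which scores each exemplar-instance pair independently, this formulation ties a positive and a negative to the same anchor, so the same pool of samples yields more informative training combinations.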