Abstract
Observing that Semantic features learned in an image
classification task and Appearance features learned in a
similarity matching task complement each other, we build
a twofold Siamese network, named SA-Siam, for real-time
object tracking. SA-Siam is composed of a semantic branch
and an appearance branch. Each branch is a similarity-learning Siamese network. An important design choice in
SA-Siam is to separately train the two branches to keep
the heterogeneity of the two types of features. In addition, we propose a channel attention mechanism for the
semantic branch. Channel-wise weights are computed according to the channel activations around the target position. While the inherited architecture from SiamFC [3] allows our tracker to operate beyond real-time, the twofold
design and the attention mechanism significantly improve
the tracking performance. The proposed SA-Siam outperforms all other real-time trackers by a large margin on
the OTB-2013/50/100 benchmarks.
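
As a rough illustration of the channel attention idea described above, the following sketch computes one weight per channel from the activations in a small window around the target position and rescales the feature map accordingly. It is only a minimal sketch under stated assumptions: the function names `channel_weights` and `apply_channel_attention`, the `radius` parameter, the max-pooling step, and the fixed sigmoid (standing in for whatever learned mapping the tracker actually uses) are all hypothetical choices made for illustration and are not taken from the abstract.

```python
import numpy as np

def channel_weights(features, center, radius=1):
    """Compute one weight per channel from activations around the target.

    features: array of shape (C, H, W), a semantic feature map.
    center:   (row, col) of the target position on the feature map.
    radius:   half-size of the window around the target (assumption).
    Returns an array of shape (C,) with values in (0, 1).
    """
    c, h, w = features.shape
    row, col = center
    # Clip the window to the feature-map borders.
    r0, r1 = max(row - radius, 0), min(row + radius + 1, h)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, w)
    # Max-pool each channel over the window around the target.
    pooled = features[:, r0:r1, c0:c1].reshape(c, -1).max(axis=1)
    # A fixed sigmoid stands in for a learned mapping (assumption).
    return 1.0 / (1.0 + np.exp(-pooled))

def apply_channel_attention(features, weights):
    """Rescale every channel of a (C, H, W) feature map by its weight."""
    return features * weights[:, None, None]
```

With such weights, channels that respond strongly near the target are emphasized before similarity matching, while channels that stay inactive there are damped.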