Abstract
Tracking by siamese networks has achieved favorable
performance in recent years. However, most existing
siamese methods do not take full advantage of spatial-temporal target appearance modeling under different contextual situations. In fact, spatial-temporal information can provide diverse features to enhance the target representation, and context information is important for
online adaptation of target localization. To comprehensively leverage the spatial-temporal structure of historical target exemplars and benefit from the context information, we present a novel Graph Convolutional
Tracking (GCT) method for high-performance visual tracking. Specifically, the GCT jointly incorporates two types
of Graph Convolutional Networks (GCNs) into a siamese
framework for target appearance modeling. Here, we adopt
a spatial-temporal GCN to model the structured representation of historical target exemplars. Furthermore, a context
GCN is designed to utilize the context of the current frame
to learn adaptive features for target localization. Extensive
experiments on four challenging benchmarks show that our GCT
method performs favorably against state-of-the-art trackers
while running at around 50 frames per second.