Abstract. Local structures of target objects are essential for robust
tracking. However, existing methods based on deep neural networks mostly
describe the target appearance from a global view, which makes them highly
sensitive to non-rigid appearance changes and partial occlusion. In this
paper, we address this issue by proposing a local structure learning
method that jointly considers the local patterns of the target and their
structural relationships for more accurate tracking. To this end, a local
pattern detection module is designed to automatically identify
discriminative regions of the target object. The detection results are
then refined by a message passing module, which enforces the structural
context among local patterns to construct local structures.
We show that the message passing module can be formulated as the
inference process of a conditional random field (CRF) and implemented
with differentiable operations, allowing the entire model to be trained
end-to-end. By combining the local structures in different ways, our
tracker forms various types of structure patterns. Target tracking is
finally achieved by matching the structure patterns of the target
template against those of the candidates. Extensive evaluations on three
benchmark datasets demonstrate that the proposed tracking algorithm
performs favorably against state-of-the-art methods while running
efficiently at 45 fps.
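The abstract states that message passing can be written as CRF inference built from differentiable operations, but it does not give the CRF energy. The sketch below shows one common way to do this, mean-field inference, under assumptions that are not taken from the paper: each of K local patterns has a per-location logit map (the hypothetical `unary` array), and the pairwise spatial compatibilities between patterns are small convolution kernels (the hypothetical `pairwise` array). The helper names `conv2d_same`, `sigmoid` and `mean_field` are illustrative only.

```python
import numpy as np


def conv2d_same(x, w):
    """2-D cross-correlation of map x with kernel w, zero-padded to keep x's shape."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_field(unary, pairwise, n_iters=3):
    """Hypothetical mean-field refinement of K local-pattern heatmaps.

    unary:    (K, H, W) logits from a local pattern detector (assumed input).
    pairwise: (K, K, kh, kw) kernels; pairwise[k, j] says how pattern j at a
              nearby offset supports (positive) or suppresses (negative) pattern k.
    Returns q: (K, H, W) per-location probabilities that each pattern is present.
    """
    K = unary.shape[0]
    q = sigmoid(unary)  # initialise beliefs from the unaries alone
    for _ in range(n_iters):
        msg = np.zeros_like(unary, dtype=float)
        for k in range(K):
            for j in range(K):
                if j != k:  # messages come from the other patterns only
                    msg[k] += conv2d_same(q[j], pairwise[k, j])
        q = sigmoid(unary + msg)  # parallel mean-field update
    return q


# Usage with random detector logits (illustrative values, not real data).
rng = np.random.default_rng(0)
unary = rng.normal(size=(3, 8, 8))
pairwise = np.zeros((3, 3, 3, 3))
pairwise[1, 0, 1, 1] = 2.0  # hypothetical: pattern 0 at the same location supports pattern 1
q = mean_field(unary, pairwise, n_iters=3)
print(q.shape)  # (3, 8, 8)
```

Every update uses only convolutions, additions and a sigmoid, so the same loop written in an autodiff framework is differentiable end to end, which is the property that allows such a module to be trained jointly with the detector.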