Abstract
In this paper, we analyze the spatial information of deep
features and propose two complementary regressions for
robust visual tracking. First, we propose a kernelized ridge
regression model wherein the kernel value is defined as the
weighted sum of similarity scores of all pairs of patches between two samples. We show that this model can be formulated as a neural network and thus can be efficiently solved.
Second, we propose a fully convolutional neural network
with spatially regularized kernels, through which the filter
kernel corresponding to each output channel is forced to focus on a specific region of the target. Distance transform
pooling is further exploited to determine the effectiveness
of each output channel of the convolution layer. The outputs from the kernelized ridge regression model and the fully
convolutional neural network are combined to obtain the
final response. Experimental results on two benchmark
datasets validate the effectiveness of the proposed method.
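
As a rough illustration of the first model, the sketch below computes a kernel value as a weighted sum of similarity scores over all pairs of patches between two samples, then fits a ridge regression on that kernel. Several details are assumptions made only for illustration and are not taken from the abstract: the dot product as the patch similarity, the pair-weight matrix `weights`, the regularization strength `lam`, and all function names. The closed-form solve is also a stand-in; the abstract states that the model is formulated and solved as a neural network.

```python
import numpy as np


def patch_pair_kernel(x, y, weights):
    """Weighted sum of similarities over all patch pairs of two samples.

    x, y    : (P, D) arrays, one D-dimensional feature per patch.
    weights : (P, P) array, weights[i, j] scales the pair (x[i], y[j]).
    The dot product is an assumed similarity score.
    """
    sims = x @ y.T  # (P, P) similarity for every patch pair
    return float(np.sum(weights * sims))


def fit_kernel_ridge(samples, targets, weights, lam=1e-2):
    """Solve (K + lam * I) alpha = targets in closed form (assumed solver)."""
    n = len(samples)
    gram = np.array([
        [patch_pair_kernel(samples[i], samples[j], weights) for j in range(n)]
        for i in range(n)
    ])
    return np.linalg.solve(gram + lam * np.eye(n), targets)


def predict(sample, samples, alpha, weights):
    """Regression response for a new sample given fitted coefficients."""
    k = np.array([patch_pair_kernel(sample, s, weights) for s in samples])
    return float(k @ alpha)
```

With a symmetric positive semidefinite `weights` matrix (for example the identity, which compares only spatially aligned patches), the resulting Gram matrix is positive semidefinite and the regularized system is well posed for any `lam > 0`.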