Abstract
The appearance model is one of the most important components of online visual tracking. An effective appearance model needs to strike the right balance between being adaptive, to account for appearance change, and being conservative, to re-acquire the object after tracking is lost (e.g., due to occlusion). Most conventional appearance models focus on only one of these two aspects, and hence fail to achieve this balance. In this paper, we approach the problem with a max-margin learning framework that combines a descriptive component and a discriminative component. The two components serve different purposes and have different lifespans: one maintains a robust object model, while the other distinguishes the object from the current background. By exploiting their complementary roles, the components improve each other and collaboratively contribute to a shared score function. In addition, for real-time operation, we propose a series of optimization and sample-management strategies. Experiments on 30 challenging videos demonstrate the effectiveness and robustness of the proposed tracker, which generally outperforms existing state-of-the-art methods.