Abstract
Visual tracking is the task of estimating the trajectory of an object in a video given its initial location. This is usually done by combining, at each step, an appearance and a motion model. In this work, we learn from a small set of training trajectory annotations how the objects in the scene typically move. We learn the relative weight between the appearance and the motion model, which we call the visual deceptiveness. At test time, we transfer the deceptiveness and the displacement from the closest trajectory annotation to infer the next location of the object. Further, we condition this transfer on an event model. On a set of 161 manually annotated test trajectories, our experiments show that learning from just 10 trajectory annotations halves the center location error and improves the success rate by about 10%.
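To make the weighted combination concrete, the following minimal sketch shows one way the deceptiveness could trade off an appearance-based prediction against a motion prior transferred from the nearest annotated trajectory point. It is an illustration under stated assumptions, not the paper's implementation: the names (`appearance_prediction`, `annotations`, the tuple layout) and the convention that deceptiveness near 1 favors the motion model are hypothetical.

```python
import numpy as np

def transfer_from_annotations(location, annotations):
    """Return the deceptiveness and displacement stored at the annotated
    trajectory point closest to the current location.
    `annotations` is a list of (position, deceptiveness, displacement)
    tuples gathered from the few training trajectories (assumed layout)."""
    positions = np.array([a[0] for a in annotations])
    nearest = np.argmin(np.linalg.norm(positions - location, axis=1))
    _, deceptiveness, displacement = annotations[nearest]
    return deceptiveness, displacement

def next_location(location, appearance_prediction, annotations):
    """Blend the appearance-based prediction with the motion prior
    transferred from the nearest annotation.  A deceptiveness near 1
    means the appearance model is considered unreliable, so the motion
    prior dominates (illustrative convention)."""
    deceptiveness, displacement = transfer_from_annotations(location, annotations)
    motion_prediction = location + displacement  # motion-model estimate
    return (1.0 - deceptiveness) * appearance_prediction + deceptiveness * motion_prediction

# Illustrative usage with made-up numbers.
annotations = [
    (np.array([10.0, 12.0]), 0.8, np.array([2.0, 0.5])),
    (np.array([40.0, 35.0]), 0.1, np.array([0.0, 1.0])),
]
loc = np.array([11.0, 13.0])
appearance_pred = np.array([20.0, 20.0])  # e.g. output of an appearance tracker
print(next_location(loc, appearance_pred, annotations))
```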