Abstract
We use weakly supervised structured learning to track and disambiguate the identity of multiple indistinguishable, translucent and deformable objects that can overlap for many frames. For this challenging problem, we propose a novel model which handles occlusions, complex motions and non-rigid deformations by jointly optimizing the flflows of multiple latent intensities across frames. These flflows are latent variables for which the user cannot directly provide labels. Instead, we leverage a structured learning formulation that uses weak user annotations to fifind the best hyperparameters of this model. The approach is evaluated on a challenging dataset for the tracking of multiple Drosophila larvae which we make publicly available. Our method tracks multiple larvae in spite of their poor distinguishability and minimizes the number of identity switches during prolonged mutual occlusion