Abstract
Learning-based temporal action localization methods require vast amounts of training data. However, such large-scale video datasets, which are expected to capture the dynamics of every action category, are not only very expensive to acquire but are also not practical simply because
there exists an uncountable number of action classes. This
poses a critical restriction on current methods when
training samples are scarce (e.g., when the target action classes are not present in the currently available public datasets). To address this challenge, we conceptualize
a new example-based action detection problem where only
a few examples are provided, and the goal is to find the
occurrences of these examples in an untrimmed video sequence. Towards this objective, we introduce a novel one-shot action localization method that alleviates the need for
large amounts of training samples. Our solution adopts the
one-shot learning technique of Matching Network and utilizes correlations to mine and localize actions of previously
unseen classes. We evaluate our one-shot action localization method on the THUMOS14 and ActivityNet datasets,
whose configurations we modified to fit our one-shot
problem setup.
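
To make the example-based setting concrete, the following is a minimal sketch of correlating a single example against an untrimmed video. It is not the method of this paper: plain cosine similarity stands in for the learned Matching Network correlation, and the function names, the per-segment feature layout, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import numpy as np


def cosine_correlation(example_feat, video_feats):
    """Cosine similarity between one example feature (D,) and each
    per-segment feature of an untrimmed video (T, D). Returns shape (T,).
    Assumption: features were already extracted by some video encoder."""
    eps = 1e-8  # guards against division by zero for all-zero features
    ex = example_feat / (np.linalg.norm(example_feat) + eps)
    vid = video_feats / (np.linalg.norm(video_feats, axis=1, keepdims=True) + eps)
    return vid @ ex


def localize(scores, threshold=0.5):
    """Group consecutive segments whose score is >= threshold into
    (start, end) index intervals, with end exclusive.
    The threshold value is a hypothetical choice, not a tuned one."""
    intervals = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a new candidate action instance begins
        elif s < threshold and start is not None:
            intervals.append((start, i))  # the instance ended at segment i
            start = None
    if start is not None:
        intervals.append((start, len(scores)))  # instance runs to the end
    return intervals


# Toy data: a 3-D example feature and six video segments.
example = np.array([1.0, 0.0, 0.0])
video = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
detections = localize(cosine_correlation(example, video))
```

In this toy input only segments 2 and 3 resemble the example, so they are grouped into one detected interval. A learned correlation would replace the fixed cosine similarity with an embedding trained to generalize to unseen classes.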