Abstract
Given two action sequences, we are interested in
spotting/co-segmenting all pairs of sub-sequences that represent the same action. We propose a fully unsupervised
solution to this problem. No a-priori model of the actions is assumed to be available. The number of common
sub-sequences may be unknown. The sub-sequences can
be located anywhere in the original sequences, may differ in duration, and the corresponding actions may be performed by a different person, in a different style. We treat
this type of temporal action co-segmentation as a stochastic optimization problem that is solved by employing Particle Swarm Optimization (PSO). The objective function that
is minimized by PSO capitalizes on Dynamic Time Warping (DTW) to compare two action sub-sequences. Due to
the generic problem formulation and solution, the proposed
method can be applied to motion capture (i.e., 3D skeletal)
data or to conventional RGB videos acquired in the wild.
We present extensive quantitative experiments on standard
data sets as well as on data sets we introduce in this paper.
The obtained results demonstrate that the proposed method
achieves a remarkable increase in co-segmentation quality
compared to all tested state-of-the-art methods.
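
To make the role of DTW in the objective concrete, the following minimal sketch scores one candidate pair of sub-sequences. It is an illustration under stated assumptions, not the objective function used in this work: the candidate encoding (start_a, len_a, start_b, len_b), the Euclidean frame distance, the normalization by total length, and the names dtw_distance and cosegmentation_cost are all hypothetical. A stochastic optimizer such as PSO would search over such candidates for low-cost pairs.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two sequences of feature vectors.

    Assumption: frames are compared with the Euclidean distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def cosegmentation_cost(seq_a, seq_b, candidate):
    """Score a candidate (start_a, len_a, start_b, len_b); lower is better.

    Assumption: the DTW cost is divided by the total sub-sequence length so
    that candidates of different durations are roughly comparable.
    """
    start_a, len_a, start_b, len_b = candidate
    sub_a = seq_a[start_a:start_a + len_a]
    sub_b = seq_b[start_b:start_b + len_b]
    if not sub_a or not sub_b:
        return float("inf")  # empty sub-sequences are invalid candidates
    return dtw_distance(sub_a, sub_b) / (len(sub_a) + len(sub_b))


# Toy 1D "feature" sequences; real input would be skeletal or video features.
seq_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
seq_b = [(9.0,), (1.0,), (1.0,), (2.0,), (9.0,)]

good = cosegmentation_cost(seq_a, seq_b, (1, 2, 1, 3))  # same pattern, different duration
bad = cosegmentation_cost(seq_a, seq_b, (0, 2, 0, 2))   # unrelated frames
```

In this toy case the first candidate aligns perfectly despite the difference in duration, so its cost is zero, while the second candidate receives a strictly larger cost.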