Abstract. Adversarial perturbations are noise-like patterns that can
subtly change the data while causing an otherwise accurate classifier to fail. In
this paper, we propose to use such perturbations to improve the robustness of video representations. To this end, given a well-trained deep
model for per-frame video recognition, we first generate adversarial noise
adapted to this model. Using the original data features from the full video
sequence and their perturbed counterparts as two separate bags, we formulate a binary classification problem that learns a set of discriminative
hyperplanes (forming a subspace) that separate the two bags from each
other. This subspace is then used as a descriptor for the video, dubbed
discriminative subspace pooling. As the perturbed features belong to data
classes that are likely to be confused with the original features, the discriminative subspace will characterize parts of the feature space that are
more representative of the original data, and thus may provide robust
video representations. To learn such descriptors, we formulate a subspace
learning objective on the Stiefel manifold and use Riemannian optimization methods to solve it efficiently. We provide experiments on
several video datasets and demonstrate state-of-the-art results.
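The pipeline described above (perturb per-frame features, then learn an orthonormal set of hyperplanes separating the original bag from the perturbed bag) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the abstract. The deep per-frame model is replaced by a hypothetical linear softmax classifier with weights `A`. The adversarial noise is generated with FGSM (a swapped-in technique, not necessarily the paper's). The subspace objective is a simplified stand-in: a mean logistic loss in which each of the `p` orthonormal columns of `W` acts as its own hyperplane. Optimization uses plain Riemannian gradient descent on the Stiefel manifold, projecting the Euclidean gradient onto the tangent space and retracting with a QR decomposition. All function names (`fgsm_perturb`, `retract_qr`, `discriminative_subspace`) are hypothetical.

```python
import numpy as np


def fgsm_perturb(X, A, labels, eps):
    """FGSM-style noise for frame features X (n x d) against a hypothetical
    linear softmax classifier with weights A (C x d). Assumption: stands in
    for the adversarial noise of the real deep per-frame model."""
    logits = X @ A.T                                  # (n, C)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(X)), labels] -= 1.0           # d(cross-entropy)/d(logits)
    grad_x = probs @ A                                # chain rule through X @ A.T
    return X + eps * np.sign(grad_x)                  # step that increases the loss


def retract_qr(M):
    """QR retraction onto the Stiefel manifold {W : W^T W = I}."""
    Q, R = np.linalg.qr(M)                            # reduced QR: Q is d x p
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs                                  # fix column signs (diag(R) > 0)


def discriminative_subspace(X_orig, X_pert, p, steps=200, lr=0.5, seed=0):
    """Learn W (d x p, orthonormal columns) and biases b (p,) so that every
    column w_k scores original features (+1) above perturbed ones (-1).
    Assumption: simplified logistic objective, not the paper's exact one."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_orig, X_pert])                   # (n, d)
    y = np.concatenate([np.ones(len(X_orig)), -np.ones(len(X_pert))])
    n, d = X.shape
    W = retract_qr(rng.standard_normal((d, p)))       # random point on the manifold
    b = np.zeros(p)
    for _ in range(steps):
        margins = y[:, None] * (X @ W + b)            # (n, p)
        # derivative of mean log(1 + exp(-m)) w.r.t. m is -1 / ((1 + exp(m)) n p)
        coef = -y[:, None] / (1.0 + np.exp(margins)) / (n * p)
        G = X.T @ coef                                # Euclidean gradient w.r.t. W
        gb = coef.sum(axis=0)                         # gradient w.r.t. b
        # Riemannian gradient: project G onto the tangent space at W
        WtG = W.T @ G
        rgrad = G - W @ ((WtG + WtG.T) / 2.0)
        W = retract_qr(W - lr * rgrad)                # step, then retract
        b -= lr * gb
    return W, b


# Usage on synthetic features (hypothetical sizes: 30 frames, 16-dim features).
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 16))
A = rng.standard_normal((5, 16))                      # hypothetical classifier weights
labels = (X @ A.T).argmax(axis=1)                     # classifier's own predictions
X_adv = fgsm_perturb(X, A, labels, eps=0.1)
W, b = discriminative_subspace(X, X_adv, p=3)
descriptor = W.ravel()                                # video descriptor of length d * p
print(descriptor.shape)                               # (48,)
print(np.allclose(W.T @ W, np.eye(3)))                # True
```

Flattening `W` is only one way to turn the learned subspace into a fixed-length descriptor; a sign-invariant alternative would be the projection matrix `W @ W.T`. The QR retraction keeps every iterate exactly on the manifold, so no separate orthonormalization step is needed after training.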