Abstract. Self-supervised learning of convolutional neural networks can
harness large amounts of cheap unlabeled data to train powerful feature
representations. As a surrogate task, we jointly address the ordering of visual
data in the spatial and temporal domains. The permutations of training
samples, which are at the core of self-supervision by ordering, have so
far been sampled randomly from a fixed, preselected set. Based on deep
reinforcement learning, we propose a sampling policy that adapts to the
current state of the network being trained. New permutations are thus sampled
according to their expected utility for updating the convolutional feature
representation. Experimental evaluation on unsupervised and transfer
learning tasks demonstrates competitive performance on standard benchmarks
for image and video classification and nearest neighbor retrieval.
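The ordering surrogate task mentioned above can be illustrated with a minimal sketch: training samples (image patches or video frames) are shuffled by a permutation drawn from a fixed, preselected set, and the pseudo-label the network must predict is the index of the applied permutation. The helper names below are hypothetical, and this shows only the baseline of uniform random sampling that the proposed policy replaces, not the authors' reinforcement-learning sampler.

```python
import itertools
import numpy as np

def fixed_permutation_set(n_items=4, n_perms=8, seed=0):
    """A fixed, preselected set of distinct permutations of n_items
    elements, from which prior work samples uniformly at random."""
    rng = np.random.default_rng(seed)
    all_perms = list(itertools.permutations(range(n_items)))
    idx = rng.choice(len(all_perms), size=n_perms, replace=False)
    return [all_perms[i] for i in idx]

def make_training_example(tiles, perm):
    """Reorder spatial tiles (or temporal frames) by a permutation.
    The classification target is the index of that permutation."""
    return [tiles[i] for i in perm]

perms = fixed_permutation_set()
tiles = ["t0", "t1", "t2", "t3"]   # e.g. image patches or video frames
label = 3                          # pseudo-label: index into the set
shuffled = make_training_example(tiles, perms[label])
```

The proposed method keeps this task but, instead of picking `label` uniformly, scores candidate permutations by their expected utility for the current network state.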