Abstract
We introduce the concept of the dynamic image, a novel compact representation of videos that is useful for video analysis, especially when convolutional neural networks (CNNs) are used. The dynamic image is based on the rank pooling concept and is obtained through the parameters of a ranking machine that encodes the temporal evolution of the frames of the video. Dynamic images are obtained by directly applying rank pooling on the raw image pixels of a video, producing a single RGB image per video. This idea is simple but powerful, as it enables the use of existing CNN models directly on video data with fine-tuning. We present an efficient and effective approximate rank pooling operator that is orders of magnitude faster than rank pooling. Our new approximate rank pooling CNN layer allows us to generalize dynamic images to dynamic feature maps, and we demonstrate the power of our new representations on standard benchmarks in action recognition, achieving state-of-the-art performance.
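
To make the idea of collapsing a video into a single image concrete, the following is a minimal sketch of one possible approximate rank pooling operator. It assumes that each frame t = 1..T is weighted by a fixed coefficient alpha_t = 2t - T - 1, which is what a single gradient step on a pairwise ranking objective yields when started from zero and applied to raw frames. The function names `approximate_dynamic_image` and `to_uint8` are hypothetical, and the operator used in our experiments may differ in its weighting and normalization.

```python
import numpy as np


def approximate_dynamic_image(frames):
    """Collapse a video of shape (T, H, W, C) into one (H, W, C) array.

    Assumption: frame t (1-indexed) gets weight alpha_t = 2t - T - 1, so
    early frames count negatively and late frames positively. This is an
    illustrative weighting, not necessarily the one used in the paper.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1, dtype=np.float64)
    alpha = 2.0 * t - num_frames - 1.0
    # Weighted sum over the time axis: sum_t alpha_t * frame_t.
    return np.tensordot(alpha, frames, axes=(0, 0))


def to_uint8(image):
    """Min-max rescale to [0, 255] so the result can be stored as RGB."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round(255.0 * (image - lo) / (hi - lo)).astype(np.uint8)
```

Because the weights sum to zero, a perfectly static video maps to an all-zero image under this sketch, so only temporal change contributes to the result. The rescaled output has the same shape as an ordinary RGB frame and could be passed to an image CNN for fine-tuning.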