Abstract
Appearance and motion are two key components for depicting and characterizing video content. Currently, two-stream models achieve state-of-the-art performance on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine the optical flow features from the appearance features, with no reliance on optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate the optical flow features, which, together with the appearance features, consistently improve video classification performance. Moreover, MoNet can help cut almost half of the computational and data-storage burden of two-stream video classification. Our code is available at: https://github.com/YongyiTang92/MoNet-Features
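
To make the idea of hallucinating flow features from appearance features concrete, the following is a minimal sketch and not the MoNet architecture itself. It assumes a plain tanh recurrent layer as a stand-in for the temporal model, random untrained weights, and toy feature sizes. All names (`hallucinate_flow`, `regression_loss`, `fuse_for_classification`, `vec_mat`, `random_matrix`) are hypothetical and do not come from the released code. The sketch shows three steps: predicting per-frame flow features from appearance features, measuring a regression loss against real flow features (which would be needed only during training), and fusing appearance with hallucinated features into one video descriptor.

```python
import math
import random


def vec_mat(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def hallucinate_flow(appearance, w_in, w_rec, w_out):
    """Map per-frame appearance features to hallucinated flow features.

    Assumption: a simple tanh recurrent layer stands in for the temporal model.
    """
    hidden = [0.0] * len(w_rec)
    outputs = []
    for frame_feat in appearance:
        # The hidden state carries temporal context from earlier frames.
        pre = [a + b for a, b in zip(vec_mat(frame_feat, w_in), vec_mat(hidden, w_rec))]
        hidden = [math.tanh(x) for x in pre]
        outputs.append(vec_mat(hidden, w_out))
    return outputs


def regression_loss(hallucinated, real_flow):
    """Mean squared error against real flow features (training time only)."""
    diffs = [
        (h - r) ** 2
        for h_row, r_row in zip(hallucinated, real_flow)
        for h, r in zip(h_row, r_row)
    ]
    return sum(diffs) / len(diffs)


def fuse_for_classification(appearance, hallucinated):
    """Concatenate both feature types per frame, then average over time."""
    frames = [a + h for a, h in zip(appearance, hallucinated)]
    return [sum(col) / len(frames) for col in zip(*frames)]


rng = random.Random(0)
T, D_A, D_H, D_F = 8, 16, 32, 12  # toy sizes, chosen only for illustration

w_in = random_matrix(D_A, D_H, rng)
w_rec = random_matrix(D_H, D_H, rng)
w_out = random_matrix(D_H, D_F, rng)

appearance = [[rng.gauss(0.0, 1.0) for _ in range(D_A)] for _ in range(T)]
real_flow = [[rng.gauss(0.0, 1.0) for _ in range(D_F)] for _ in range(T)]

flow_hat = hallucinate_flow(appearance, w_in, w_rec, w_out)
loss = regression_loss(flow_hat, real_flow)
video_descriptor = fuse_for_classification(appearance, flow_hat)
```

In this sketch, `real_flow` is used only to compute the loss. At inference, only `appearance` is needed to build `video_descriptor`, which is where the savings from skipping optical flow computation and storage would come from.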