Abstract

The FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art with regard to the quality of the flow has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it work really well. The large improvements in quality and speed are caused by three major contributions: first, we focus on the training data and show that the schedule of presenting data during training is very important. Second, we develop a stacked architecture that includes warping of the second image with intermediate optical flow. Third, we elaborate on small displacements by introducing a subnetwork specializing on small motions. FlowNet 2.0 is only marginally slower than the original FlowNet but decreases the estimation error by more than 50%. It performs on par with state-of-the-art methods, while running at interactive frame rates. Moreover, we present faster variants that allow optical flow computation at up to 140 fps with accuracy matching the original FlowNet.
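To make the warping step mentioned in the second contribution concrete, the sketch below shows backward warping of the second image by an intermediate flow field using bilinear sampling. This is only an illustrative NumPy version under assumed array conventions (H x W x 2 flow giving per-pixel (dx, dy) displacements), not the paper's differentiable warping layer used inside the stacked network.

import numpy as np

def warp_backward(img2, flow):
    """Warp img2 toward the first image with an estimated flow field.

    img2: H x W x C array; flow: H x W x 2 array of (dx, dy) displacements,
    so that img1[y, x] is compared against img2[y + dy, x + dx].
    Bilinear sampling with border clamping (assumed conventions).
    """
    h, w = flow.shape[:2]
    gy, gx = np.mgrid[0:h, 0:w].astype(np.float64)
    # Sampling coordinates in the second image, clamped to the image border.
    sx = np.clip(gx + flow[..., 0], 0, w - 1)
    sy = np.clip(gy + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    # Bilinear interpolation weights, broadcast over the channel axis.
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    return ((1 - wy) * ((1 - wx) * img2[y0, x0] + wx * img2[y0, x1])
            + wy * ((1 - wx) * img2[y1, x0] + wx * img2[y1, x1]))

In the stacked architecture, the next network in the stack receives the first image together with such a warped second image, so it only has to estimate the residual flow.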