Abstract Deep CNNs have achieved superior performance in many computer vision and image understanding tasks. However, it is still difficult to apply deep CNNs effectively to video object segmentation (VOS), since treating video frames as separate, static images loses the information carried by motion. To tackle this problem, we propose a Motion-guided Cascaded Refinement Network for VOS. Assuming that object motion normally differs from background motion, for each video frame we first apply an active contour model to the optical flow to coarsely segment the objects of interest. Then, the proposed Cascaded Refinement Network (CRN) takes the coarse segmentation as guidance to generate an accurate, full-resolution segmentation. In this way, the motion information and the deep CNN complement each other to accurately segment objects from video frames. Furthermore, in CRN we introduce a Single-channel Residual Attention Module that incorporates the coarse segmentation map as attention, making our network effective and efficient in both training and testing. We perform experiments on popular benchmarks, and the results show that our method achieves state-of-the-art performance at a much faster speed.
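To illustrate how a single-channel map can act as attention over multi-channel features, the sketch below assumes the common residual-attention form, in which each feature value is scaled by (1 + attention) so that regions with zero attention pass through unchanged. This form, the function name `residual_attention`, and the toy values are assumptions made for illustration; the abstract does not specify the exact formulation of the module.

```python
from typing import List

Map2D = List[List[float]]


def residual_attention(features: List[Map2D], attention: Map2D) -> List[Map2D]:
    """Scale every feature channel by (1 + attention).

    Assumed form, not confirmed by the abstract: the single-channel
    attention map is shared (broadcast) across all feature channels,
    and the "+ 1" keeps the original features where attention is zero.
    """
    return [
        [
            [value * (1.0 + attention[r][c]) for c, value in enumerate(row)]
            for r, row in enumerate(channel)
        ]
        for channel in features
    ]


# Hypothetical toy input: two 2x2 feature channels and one 2x2 coarse mask
# with values in [0, 1], standing in for the motion-based coarse segmentation.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
coarse_mask = [[0.0, 1.0], [0.5, 0.0]]

attended = residual_attention(features, coarse_mask)
```

Under this assumption, positions where the coarse mask is zero keep their original feature values, while positions inside the coarse object region are amplified. Because the map has a single channel, applying it costs only one multiplication per feature value.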