Abstract

This paper addresses weakly supervised object detection with only image-level supervision at the training stage. Previous approaches train detection models on entire images all at once, which makes the models prone to getting trapped in sub-optimal solutions because of the false positive examples they introduce. Unlike them, we propose a zigzag learning strategy that simultaneously discovers reliable object instances and prevents the model from overfitting the initial seeds. Towards this goal, we first develop a criterion named mean Energy Accumulation Scores (mEAS) to automatically measure and rank the localization difficulty of an image containing the target object. We then train the detector progressively, feeding it examples of increasing difficulty. In this way, training on easy examples prepares the model for learning from more difficult ones, so it gains a stronger detection ability more efficiently. Furthermore, we introduce a novel masking regularization strategy over the high-level convolutional feature maps to avoid overfitting the initial samples. These two modules form a zigzag learning process: progressive learning endeavors to discover reliable object instances, while masking regularization increases the difficulty of finding object instances properly. We achieve 47.6% mAP on PASCAL VOC 2007, surpassing the state of the art by a large margin.