Abstract
Object detection is a fundamental problem in image understanding. One popular solution is the R-CNN framework [15] and its faster variants [14, 27]. They decompose the object detection problem into two cascaded, easier tasks: 1) generating object proposals from images, and 2) classifying proposals into object categories. Although these two tasks are relatively easier, neither is solved perfectly, and there is still room for improvement.

In this paper, we push the "divide and conquer" solution further by dividing each task into two sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn). It tackles each task with a carefully designed network cascade. We show that the cascade structure helps in both tasks. In proposal generation, it produces more compact and better-localized object proposals. In object classification, it reduces false positives (mainly between ambiguous categories) by capturing both inter- and intra-category variances. CRAFT achieves consistent and considerable improvements over the state of the art on object detection benchmarks such as PASCAL VOC 07/12 and ILSVRC.