Abstract
Simultaneously detecting an object and determining its pose has become a popular research topic in recent years. Due to the large variance in object appearance across images, it is critical to capture the discriminative object parts that can provide key information about the object pose. Recent part-based models have obtained state-of-the-art results for this task. However, such models either require manually defined object parts with heavy supervision or rely on a complicated algorithm to find discriminative object parts. In this study, we have designed a novel deep architecture, called Auto-masking Neural Network (ANN), for object detection and viewpoint estimation. ANN can automatically learn to select the most discriminative object parts across different viewpoints from training images. We also propose a method of accurate continuous viewpoint estimation based on the output of ANN. Experimental results on related datasets show that ANN outperforms previous methods.