Abstract. This paper addresses the problem of viewpoint estimation
of an object in a given image. It presents five key insights and a CNN
that is based on them. The network’s major properties are as follows.
(i) The architecture jointly solves detection, classification, and viewpoint
estimation. (ii) New types of data are added and trained on. (iii) A novel
loss function, which takes into account both the geometry of the problem
and the new types of data, is propose. Our network allows a substantial
boost in performance: from 36.1% gained by SOTA algorithms to 45.9%