SphereNet: Learning Spherical Representations for
Detection and Classification in Omnidirectional Images
Abstract. Omnidirectional cameras offer great benefits over classical cameras
wherever a wide field of view is essential, such as in virtual reality applications
or in autonomous robots. Unfortunately, standard convolutional neural networks
are not well suited for this scenario as the natural projection surface is a sphere
which cannot be unwrapped to a plane without introducing significant distortions,
particularly in the polar regions. In this work, we present SphereNet, a novel
deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts
the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural
network models to the omnidirectional case. We demonstrate the effectiveness of
our method on the tasks of image classification and object detection, exploiting
two newly created semi-synthetic and real-world omnidirectional datasets.