Abstract
Deep Convolutional Neural Networks (DCNNs) achieve invariance to domain transformations (deformations) by using multiple ‘max-pooling’ (MP) layers. In this work we show that alternative methods of modeling deformations can improve the accuracy and efficiency of DCNNs. First, we introduce epitomic convolution as an alternative to the common convolution-MP cascade of DCNNs, which comes at the same computational cost but has favorable learning properties. Second, we introduce a Multiple Instance Learning algorithm to accommodate global translation and scaling in image classification, yielding an efficient algorithm that trains and tests a DCNN in a consistent manner. Third, we develop a DCNN sliding window detector that explicitly, but efficiently, searches over the object’s position, scale, and aspect ratio. We provide competitive image classification and localization results on the ImageNet dataset and object detection results on Pascal VOC2007.