Abstract
A family of very deep networks, referred to as residual networks or ResNets [14], achieved record-beating performance in visual tasks such as image recognition, object detection, and semantic segmentation. The ability to train very deep networks naturally pushed researchers to use enormous resources to achieve the best performance.
Consequently, in many applications very deep residual networks were employed for just a marginal improvement in performance. In this paper, we propose ε-ResNet, which automatically discards redundant layers, i.e., layers whose responses are all smaller than a threshold ε, without any loss in performance. The ε-ResNet architecture can be realized by adding a few rectified linear units to the original ResNet. Unlike other hyperparameter optimization techniques, our method requires neither additional variables nor numerous trials. The layer selection is
achieved using a single training process, and the evaluation is performed on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. In some instances, we achieve about an 80% reduction in the number of parameters.
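To make the gating idea above concrete, the following is a minimal PyTorch sketch, not the paper's exact formulation: the class name EpsilonGatedResidualBlock and the default threshold are hypothetical, and the discard test is written as an explicit comparison for readability, whereas the paper realizes it with a few additional rectified linear units.

```python
import torch
import torch.nn as nn

class EpsilonGatedResidualBlock(nn.Module):
    """Sketch of epsilon-gating: the residual branch is zeroed out when
    all of its responses fall below a threshold eps, so the block reduces
    to a strict identity mapping and can later be discarded."""

    def __init__(self, channels: int, eps: float = 0.1):
        # eps value is illustrative, not taken from the paper.
        super().__init__()
        self.eps = eps
        # A standard two-convolution residual branch (illustrative only).
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.branch(x)
        # Gate is 0 if every response magnitude in the branch output is
        # below eps (block acts as a pure identity), and 1 otherwise.
        gate = (r.abs().amax(dim=(1, 2, 3), keepdim=True) >= self.eps).float()
        return torch.relu(x + gate * r)
```

In this sketch a gated-off block contributes nothing to the output and receives no gradient through its branch, which is consistent with treating it as redundant and removable after training.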