Abstract. We propose a novel discrete Fourier transform (DFT)-based pooling layer for convolutional neural networks. DFT magnitude pooling replaces the traditional max/average pooling layer between the convolutional and fully-connected layers to retain both translation invariance and shape-preserving (shape-difference-aware) properties, based on the shift theorem of the Fourier transform. Because it handles image misalignment while keeping important structural information at the pooling stage, DFT magnitude pooling improves classification accuracy significantly. In addition, we propose the DFT+ method
for ensemble networks, which uses the outputs of the middle convolutional layers. The
proposed methods are extensively evaluated on various classification tasks
using the ImageNet, CUB 2010-2011, MIT Indoors, Caltech 101, FMD
and DTD datasets. AlexNet, VGG-VD 16, Inception-v3, and ResNet are used as the base networks, upon which the DFT and DFT+ methods are implemented. Experimental results show that the proposed methods improve classification performance on all networks and datasets.
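To make the shift-theorem argument concrete, the following NumPy snippet is a minimal sketch, not the authors' implementation, of a DFT magnitude pooling step: the magnitude of the 2D DFT of a feature map is unchanged by circular translations, so a descriptor built from it is translation invariant. The function name dft_magnitude_pool, the crop size keep, and the low-frequency cropping strategy are illustrative assumptions.

# Minimal sketch of DFT magnitude pooling (illustrative only; the crop size
# `keep` and the low-frequency crop are assumptions, not the paper's exact design).
import numpy as np

def dft_magnitude_pool(feature_map, keep=4):
    # 2D DFT of the feature map; by the shift theorem, the magnitudes are
    # invariant to circular shifts of the input.
    magnitude = np.abs(np.fft.fftshift(np.fft.fft2(feature_map)))
    # Keep a keep-by-keep block of low-frequency magnitudes as the pooled output.
    h, w = magnitude.shape
    ch, cw, half = h // 2, w // 2, keep // 2
    return magnitude[ch - half:ch + half, cw - half:cw + half]

# Translation-invariance check: a circularly shifted feature map pools to the same output.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
x_shifted = np.roll(x, shift=(2, 3), axis=(0, 1))
print(np.allclose(dft_magnitude_pool(x), dft_magnitude_pool(x_shifted)))  # prints True

In a network, such a pooled magnitude map would replace the max/average-pooled output fed to the fully-connected layers; the check above only demonstrates the invariance property the abstract attributes to the shift theorem.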