Abstract
Photo aesthetics assessment is challenging. Deep convolutional neural network (ConvNet) methods have recently shown promising results for aesthetics assessment. The performance of these deep ConvNet methods, however, is often compromised by the constraint that the neural network only takes a fixed-size input. To accommodate this requirement, input images need to be transformed via cropping, scaling, or padding, which often damages image composition, reduces image resolution, or causes image distortion, thus compromising the aesthetics of the original images. In this paper, we present a composition-preserving deep ConvNet method that directly learns aesthetics features from the original input images without any image transformations. Specifically, our method adds an adaptive spatial pooling layer upon the regular convolution and pooling layers to directly handle input images with original sizes and aspect ratios. To allow for multi-scale feature extraction, we develop the Multi-Net Adaptive Spatial Pooling ConvNet architecture, which consists of multiple sub-networks with different adaptive spatial pooling sizes, and leverage a scene-based aggregation layer to effectively combine the predictions from multiple sub-networks. Our experiments on the large-scale aesthetics assessment benchmark (AVA [29]) demonstrate that our method significantly improves the state-of-the-art results in photo aesthetics assessment.
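To illustrate the core idea of accepting inputs of arbitrary size and aspect ratio, here is a minimal sketch using PyTorch's nn.AdaptiveAvgPool2d. This is not the authors' implementation: the convolutional stack, channel counts, and the 4x4 pooled output size are illustrative assumptions; the point is only that adaptive pooling yields a fixed-size feature vector from variable-size inputs, so no cropping, scaling, or padding is needed.

```python
import torch
import torch.nn as nn

class AdaptivePoolNet(nn.Module):
    """Sketch of a ConvNet that handles variable-size inputs via adaptive pooling."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Always pools the final feature map to 128 x 4 x 4 (assumed size),
        # regardless of the input image's resolution or aspect ratio.
        self.adaptive_pool = nn.AdaptiveAvgPool2d((4, 4))
        self.classifier = nn.Linear(128 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.adaptive_pool(x)
        return self.classifier(torch.flatten(x, 1))

# Images of different sizes and aspect ratios pass through the same network.
net = AdaptivePoolNet()
for h, w in [(224, 224), (300, 500)]:
    out = net(torch.randn(1, 3, h, w))
    print(out.shape)  # torch.Size([1, 2]) in both cases
```

A multi-net variant in the spirit of the abstract would instantiate several such sub-networks with different adaptive pooling sizes (e.g. 2x2, 4x4, 7x7) and combine their predictions with an aggregation layer.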