Abstract
Deep networks have shown impressive performance on many computer vision tasks. Recently, deep convolutional neural networks (CNNs) have been used to learn discriminative texture representations. One of the most successful approaches is the Bilinear CNN model, which explicitly captures the second-order statistics within deep features. However, these networks cut off the first-order information flow in the deep network, making gradient back-propagation difficult. We propose an effective fusion architecture, FASON, that combines the second-order and first-order information flows. Our method allows gradients to back-propagate freely through both flows and can be trained effectively. We then build a multi-level deep architecture to exploit the first- and second-order information within different convolutional layers. Experiments show that our method achieves improvements over state-of-the-art methods on several benchmark datasets.
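To make the fused descriptor concrete, the following is a minimal NumPy sketch of combining a first-order statistic (global average pooling) with a second-order statistic (bilinear pooling) of a convolutional feature map. The function name, shapes, and post-processing choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fused_descriptor(feat):
    # feat: convolutional feature map of shape (C, H, W).
    # All names and design choices here are illustrative assumptions.
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)

    # First-order flow: global average pooling -> C-dim vector.
    first = X.mean(axis=1)

    # Second-order flow: bilinear pooling X X^T / (H*W) -> C*C-dim vector.
    second = (X @ X.T / (H * W)).reshape(-1)

    # Fuse the two flows and apply a common post-processing:
    # signed square root followed by L2 normalization.
    fused = np.concatenate([first, second])
    fused = np.sign(fused) * np.sqrt(np.abs(fused))
    return fused / (np.linalg.norm(fused) + 1e-12)

desc = fused_descriptor(np.random.randn(8, 7, 7).astype(np.float32))
print(desc.shape)  # (72,) = 8 first-order + 64 second-order entries
```

Because both statistics are differentiable functions of the same feature map, gradients can flow through either branch, which is the property the fusion architecture relies on.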