Abstract. Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep-learning-based visual recognition. However, the resulting second-order networks yield a final representation that is orders of
magnitude larger than that of standard, first-order ones, making them
memory-intensive and cumbersome to deploy. Here, we introduce a general, parametric compression strategy that can produce more compact
representations than existing compression techniques, yet outperform
both compressed and uncompressed second-order models. Our approach
is motivated by a statistical analysis of the network’s activations, relying
on operations that lead to a Gaussian-distributed final representation,
as inherently used by first-order deep networks. As evidenced by our
experiments, this lets us outperform the state-of-the-art first-order and
second-order models on several benchmark recognition datasets.