Abstract
Feature extraction, coding, and pooling are important components of many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplicative second-order analogues of average and max-pooling that, together with appropriate non-linearities, lead to state-of-the-art performance on free-form region recognition, without any type of feature coding. Instead of coding, we found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. We show that second-order pooling over free-form regions produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.
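As a rough illustration of what "multiplicative second-order analogues of average and max-pooling" can mean, the sketch below pools the outer products of the local descriptors in a region, either by averaging them or by taking an element-wise maximum. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function names (`outer`, `second_order_avg_pool`, `second_order_max_pool`) and the toy descriptor values are hypothetical, and the non-linearities mentioned above are deliberately left out because the abstract does not specify them.

```python
def outer(x):
    """Outer product x x^T of one descriptor, returned as a list of rows."""
    return [[a * b for b in x] for a in x]


def second_order_avg_pool(descriptors):
    """Average of the outer products of all descriptors in a region."""
    n = len(descriptors)
    d = len(descriptors[0])
    sums = [[0.0] * d for _ in range(d)]
    for x in descriptors:
        for i, row in enumerate(outer(x)):
            for j, value in enumerate(row):
                sums[i][j] += value
    return [[s / n for s in row] for row in sums]


def second_order_max_pool(descriptors):
    """Element-wise maximum over the outer products of all descriptors."""
    products = [outer(x) for x in descriptors]
    d = len(descriptors[0])
    return [[max(p[i][j] for p in products) for j in range(d)]
            for i in range(d)]


# Toy region with three 2-D local descriptors (made-up values, assumption).
region = [[1.0, 2.0], [3.0, 0.0], [0.0, -1.0]]
avg = second_order_avg_pool(region)
mx = second_order_max_pool(region)
```

Both pooled matrices are symmetric, so a region descriptor built this way would only need to keep the upper triangle. First-order average or max pooling, by contrast, would operate on the descriptor entries themselves rather than on their pairwise products.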