Abstract
This paper introduces a new framework for image classifi- cation using local visual descriptors. The pipeline first performs a non- linear feature transformation on descriptors, then aggregates the results together to form image-level representations, and finally applies a clas- sification model. For all the three steps we suggest novel solutions which make our approach appealing in theory, more scalable in computation, and transparent in classification. Our experiments demonstrate that the proposed classification method achieves state-of-the-art accuracy on the well-known PASCAL benchmarks.