Abstract
In recent years, several feature encoding schemes for the bags-of-visual-words model have been proposed. While most of these schemes produce impressive results, they all share an important limitation: their high computational complexity makes it challenging to use them for largescale problems. In this work, we propose an approximate locality-constrained encoding scheme that offers signifificantly better computational effificiency (∼ 40×) than its exact counterpart, with comparable classifification accuracy. Using the perturbation analysis of least-squares problems, we present a formal approximation error analysis of our approach, which helps distill the intuition behind the robustness of our method. We present a thorough set of empirical analyses on multiple standard data-sets, to assess the capability of our encoding scheme for its representational as well as discriminative accuracy