Abstract
We show that the herding procedure of Welling (2009b) takes exactly the form of a standard convex optimization algorithm, namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. Our experiments shed more light on the learning bias of herding: they indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm more often approaches the maximum entropy distribution.
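To make the stated equivalence concrete, the following is a minimal sketch (not the paper's code) of herding written as a conditional gradient (Frank–Wolfe) step on the quadratic moment discrepancy J(g) = ½‖µ − g‖². It assumes a finite domain whose feature vectors are the rows of an illustrative matrix `Phi` and whose moment vector is `mu`; the function name and variables are hypothetical. With step size 1/(t+1), the iterate g_t remains the running mean of the selected features, which is how the conditional gradient iterates match the herding sample sequence.

```python
import numpy as np

def herding_as_cond_grad(Phi, mu, n_iter=100):
    """Herding viewed as conditional gradient on J(g) = 0.5*||mu - g||^2.

    Phi : (n, d) array whose rows are feature vectors phi(x_i) of a
          finite domain (an illustrative stand-in for a general RKHS map).
    mu  : (d,) target moment vector E[phi(x)].
    """
    g = Phi[0].copy()          # start from an arbitrary atom
    samples = [0]
    for t in range(1, n_iter):
        # Gradient of J at g is (g - mu); the linear minimization oracle
        # thus selects the atom maximizing <mu - g, phi(x)>, which is
        # exactly the herding argmax over the domain.
        i = int(np.argmax(Phi @ (mu - g)))
        samples.append(i)
        rho = 1.0 / (t + 1)    # step size 1/(t+1): g stays the running
        g = (1 - rho) * g + rho * Phi[i]   # mean of selected features
    return samples, g

# Toy usage: approximate the mean feature of a weighted discrete measure.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))
p = rng.dirichlet(np.ones(50))
mu = p @ Phi                   # true moment vector
_, g = herding_as_cond_grad(Phi, mu, n_iter=500)
print(np.linalg.norm(mu - g))  # moment discrepancy shrinks with t
```

Since the moment discrepancy ‖µ − g_t‖ directly bounds the error when approximating integrals of RKHS functions by the empirical average over the herded samples, standard conditional gradient convergence rates transfer to herding, and variants with other step-size rules give the faster alternatives studied in the paper.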