Abstract
The statistical leverage scores of a data matrix are the squared row-norms of any matrix whose columns are obtained by orthogonalizing the columns of the data matrix; and, the coherence is the largest leverage score. These quantities play an important role in several machine learning algorithms because they capture the key structural nonuniformity of the data matrix that must be dealt with in developing eficient randomized algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n d, and returns, as output, relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs in O(nd log n) time, as opposed to the time required by the na¨ıve
algorithm that involves computing an orthogonal basis for the range of A. This resolves an open question from (Drineas et al., 2006) and (Mohri & Talwalkar, 2011); and our result leads to immediate improvements in coreset-based -regression, the estimation of the coherence of a matrix, and several related low-rank matrix problems. Interestingly, to achieve our result we judiciously apply random projections on both sides of A. thAppearing in Proceedings of the 29 International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).