Abstract
This paper considers the problem of canonical correlation analysis and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. We consider the setting of finding the top-k canonical/eigen subspace, and solve these problems through a general framework that requires only black-box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent, we obtain a running time bounded in terms of z, the total number of nonzero entries, the condition number, and ρ, the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and in the number of components k, up to a log(k) factor, which is essential for handling the large-scale matrices that appear in practice. To the best of our knowledge, this is the first such algorithm with global linear convergence. We hope that our results prompt further research on improving the practical running time of these important data analysis procedures on large-scale data sets.
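
To make the idea of black-box access to an approximate linear system solver concrete, the following is a minimal sketch and not the algorithm of this paper. It runs a power iteration for the top generalized eigenvector of a pair (A, B), where each step solves B x = A w only approximately. Several simplifications are assumptions made for illustration: only the top-1 case is shown rather than top-k, B is assumed positive definite, the top generalized eigenvalue is assumed to dominate in magnitude, and a hand-written conjugate gradient routine stands in for accelerated gradient descent. The names `approx_solve` and `top_generalized_eigvec` are hypothetical.

```python
import numpy as np


def approx_solve(B, b, x0, n_steps=50, tol=1e-10):
    """Approximately solve B x = b by conjugate gradient, warm-started at x0.

    Assumes B is symmetric positive definite. This plays the role of the
    black-box solver; any other approximate solver could be swapped in.
    """
    x = x0.copy()
    r = b - B @ x              # residual of the current iterate
    p = r.copy()               # search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(n_steps):
        if np.sqrt(rs) <= tol * b_norm:
            break              # residual is already small enough
        Bp = B @ p
        alpha = rs / (p @ Bp)
        x = x + alpha * p
        r = r - alpha * Bp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def top_generalized_eigvec(A, B, solver=approx_solve, n_iters=100, seed=0):
    """Inexact power iteration for A w = lam * B w (top-1 case only)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(A.shape[0])
    w /= np.sqrt(w @ B @ w)            # normalize so that w^T B w = 1
    for _ in range(n_iters):
        lam = w @ A @ w                # Rayleigh quotient, since w^T B w = 1
        w = solver(B, A @ w, lam * w)  # approximate B^{-1} A w, warm-started
        w /= np.sqrt(w @ B @ w)
    return w @ A @ w, w


# Small synthetic pair whose generalized eigenvalues are known by construction.
rng = np.random.default_rng(1)
n = 6
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)            # symmetric positive definite
L = np.linalg.cholesky(B)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.array([5.0, 2.0, 1.0, 0.5, 0.25, 0.1])
A = L @ Q @ np.diag(d) @ Q.T @ L.T     # generalized eigenvalues of (A, B) are d

lam, w = top_generalized_eigvec(A, B)
```

Each iteration touches A and B only through matrix-vector products, so on sparse inputs the per-iteration cost scales with the number of nonzero entries. The solver is passed in as an argument to reflect that the framework is agnostic to how the linear systems are solved.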