Abstract
The Nyström method has long been popular for scaling up kernel methods. Its theoretical guarantees and empirical performance rely critically on the quality of the selected landmarks. We study landmark selection for Nyström using Determinantal Point Processes (DPPs), discrete probability models that allow tractable generation of diverse samples. We prove that landmarks selected via DPPs yield guaranteed bounds on the approximation error, and we then analyze the implications for kernel ridge regression. Prior work has held reservations about DPP sampling because of its cubic complexity. We show instead that, under certain conditions, Markov chain DPP sampling requires only time linear in the size of the data. We present empirical results that support our theoretical analysis and demonstrate the superior performance of DPP-based landmark selection over existing approaches.