Abstract
In this paper, we address the column-based low-rank matrix approximation problem using a novel parallel approach. Our approach is based on the divide-andcombine idea. We first perform column selection on submatrices of an original data matrix in parallel, and the combine the selected columns into the final output. Our approach enjoys a theoretical relative-error upper bound In addition, our column-based low-rank approximation partitions data in a deterministic way and makes no assumptions about matrix coherence. Compared with other traditional methods, our approach is scalable on largescale matrices. Finally, experiments on both simulated and real world data show that our approach is both efficient and effective.