Abstract
In this paper, we revisit the problem of constructing a near-optimal rank k approximation of a matrix under the streaming data model where the columns of M are revealed sequentially. We present SLA (Streaming Low-rank Approximation), an algorithm that is asymptotically accurate, when -th largest singular value of M . This means that its average mean-square error converges to 0 as m and n grow large with high probability, where M? ( denote the output of SLA and the optimal rank k approximation of M , respectively). Our algorithm makes one pass on the data if the columns of M are revealed in a random order, and two passes if the columns of M arrive in an arbitrary order. To reduce its memory footprint and complexity, SLA uses random sparsification, and samples each entry of M with a small probability . In turn, SLA is memory optimal as its required memory space scales as k(m+n), the dimension of its output. Furthermore, SLA is computationally efficient as it runs in time (a constant number of operations is made for each observed entry of M ), which can be as small as for an appropriate choice of