Single-Pass PCA of Large High-Dimensional Data

2019-11-01
Abstract: Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed that computes the PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or data generated in a streaming fashion. Experiments with synthetic and real data validate the algorithm's accuracy: its error is orders of magnitude smaller than that of an existing single-pass algorithm. For a high-dimensional data set stored as a 150 GB file, the algorithm computes the first 50 principal components in just 24 minutes on a typical 24-core computer, using less than 1 GB of memory.
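To make the single-pass idea concrete, here is a minimal sketch of one generic randomized scheme that reads each block of rows exactly once. It is an illustration under stated assumptions and not necessarily the algorithm proposed in the paper. The function name `single_pass_pca` and its parameters (`row_blocks`, `n_features`, `k`, `oversample`, `seed`) are hypothetical. The data are assumed to be centered already, and the sketch matrix is assumed to have full column rank.

```python
import numpy as np


def single_pass_pca(row_blocks, n_features, k, oversample=10, seed=0):
    """Approximate the top-k singular triplets of A, reading each row block once.

    Hypothetical helper for illustration only. Assumes the rows are already
    centered and that the sketch G = A @ omega has full column rank.
    """
    rng = np.random.default_rng(seed)
    l = k + oversample                           # sketch width
    omega = rng.standard_normal((n_features, l))  # random test matrix

    g_blocks = []                                # row blocks of G = A @ omega
    H = np.zeros((n_features, l))                # accumulates A.T @ G
    for block in row_blocks:                     # the only pass over the data
        block = np.asarray(block, dtype=float)
        g = block @ omega
        g_blocks.append(g)
        H += block.T @ g

    G = np.vstack(g_blocks)                      # m x l, small compared to A
    Q, R = np.linalg.qr(G)                       # G = Q R with orthonormal Q
    # Q.T @ A = R^{-T} @ G.T @ A = R^{-T} @ H.T, so A is not needed again.
    B = np.linalg.solve(R.T, H.T)                # l x n
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]
```

The rows of the returned `Vt` approximate the leading principal directions. Note that this sketch keeps the m-by-l matrix `G` in memory, so it only suits cases where the number of rows times the sketch width fits in RAM.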
