资源论文K-means clustering using random matrix sparsification

K-means clustering using random matrix sparsification

2020-03-16 | |  73 |   48 |   0

Abstract

K-means clustering algorithm using Lloyd’s heuristic is one of the most commonly used tools in data mining and machine learning that shows promising performance. However, it suffers from a high computational cost resulting from pairwise Euclidean distance computations between data points and cluster centers in each iteration of Lloyd’s heuristic. Main contributing factor of this computational bottle neck is a matrix-vector multiplication step, where the matrix contains all the data points and the vector is a cluster center In this paper we show that we can randomly sparsify the original data matrix resulting in a spars data matrix which can significantly speed up the above mentioned matrix vector multiplication step without significantly affecting cluster quality. I particular, we show that optimal k-means clustering solution of the sparse data matrix, obtained by applying random matrix sparsification, results in an approximately optimal k-means clustering objective of the original data matrix. Our empirical studies on three real world datasets corroborate our theoretical findings and demonstrate that our proposed sparsification method can indeed achieve satisfactory clustering performance.

上一篇:Neural Dynamic Programming for Musical Self Similarity

下一篇:Blind Justice: Fairness with Encrypted Sensitive Attributes

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...