Distributed and Provably Good Seedings for k-Means in Constant Rounds

资源分类

2020-03-10 |

62 |

48 |

Abstract

The k-means++ algorithm is the state of the art algorithm to solve k-Means clustering problems as the computed clusterings are O(log k) competitive in expectation. However, its seeding step requires k inherently sequential passes through the full data set making it hard to scale to massive data sets. The standard remedy is to use the k-means algorithm which reduces the number of sequential rounds and is thus suitable for a di tributed setting. In this paper, we provide a nove analysis of the k-meansk algorithm that bounds the expected solution quality for any number of rounds and oversampling factors greater than k, the two parameters one needs to choose in practice. In particular, we show that k-meansk provides provably good clusterings even for a small, constant number of iterations. This theoretical finding explains the common observation that k-meansk performs extremely well in practice even if the number of rounds is low. We further provide a hard instance that shows that an additive error term as encountered in our analysis is inevitable if less than k - 1 rounds are employed.

上一篇：On the Iteration Complexity of Support Recovery via Hard Thresholding Pursuit

下一篇：Compressed Sensing using Generative Models

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com