Distributed Clustering via LSH Based Data Partitioning

资源分类

2020-03-20 |

67 |

42 |

Abstract

Given the importance of clustering in the analysis of large scale data, distributed algorithms for for mulations such as k-means, k-median, etc. have been extensively studied. A successful approach here has been the “reduce and merge” paradigm, in which each machine reduces its input size to 图片.png (k), e and this data reduction continues (possibly iteratively) until all the data fits on one machine at which point the problem is solved locally. This approach has the intrinsic bottleneck that each machine must solve a problem of size k, and needs to communicate at least (k) points to the other machines. We propose a novel data partitioning idea to overcome this bottleneck, and in effect, have different machines focus on “finding different clusters”. Under the assumption that we know the optimum value of the objective up to a poly(n) factor (arbitrary polynomial), we establish worst-case approximation guarantees for our method. We see that our algorithm results in lower communication as well as a near-optimal number of ‘rounds’ of computation (in the popular MapReduce framework).

上一篇：Decoupled Parallel Backpropagation with Convergence Guarantee

下一篇：Learning Hidden Markov Models from Pairwise Co-occurrences with Application to Topic Modeling

用户评价

全部评价

还没有评论，说两句吧！

热门资源

The Variational S...

Unlike traditional images which do not offer in...
Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com