
Distributed k-Clustering for Data with Heavy Noise

2020-02-14

Abstract 

In this paper, we consider the k-center/median/means clustering with outliers problems (or the (k, z)-center/median/means problems) in the distributed setting. Most previous distributed algorithms have communication costs that depend linearly on z, the number of outliers. Recently, Guha et al. [10] overcame this dependence by considering bi-criteria approximation algorithms that output solutions with 2z outliers. When z is large, the extra z outliers discarded by these algorithms might be too many, considering that the data gathering process might be costly. In this paper, we improve the number of outliers to the best possible (1 + ε)z, while maintaining the O(1)-approximation ratio and the independence of the communication cost from z. The problems we consider include the (k, z)-center problem and the (k, z)-median/means problems in Euclidean metrics. An implementation of our algorithm for (k, z)-center shows that it outperforms many previous algorithms, both in terms of communication cost and quality of the output solution.
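
To make the bi-criteria objective concrete, the following is a minimal, centralized sketch of how a (k, z)-center solution could be scored when up to (1 + ε)z points may be discarded. It is not the paper's distributed algorithm; the function name `kz_center_cost`, the toy points, and the chosen values of z and ε are assumptions made purely for illustration.

```python
import math

def kz_center_cost(points, centers, num_outliers):
    """Largest distance from a kept point to its nearest center,
    after discarding the num_outliers points farthest from the centers."""
    # Distance from each point to its closest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[:max(len(dists) - num_outliers, 0)]
    return kept[-1] if kept else 0.0

# Hypothetical toy data: two tight groups plus one far-away noise point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]  # k = 2

z, eps = 1, 0.5
budget = math.floor((1 + eps) * z)  # outliers allowed under the (1 + eps)z relaxation

cost_no_discard = kz_center_cost(points, centers, 0)         # 90.0
cost_with_budget = kz_center_cost(points, centers, budget)   # 1.0
```

On this toy input, discarding the single noise point reduces the k-center cost from 90.0 to 1.0, which is why the number of discarded points matters as much as the approximation ratio.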
