资源论文Query K-means Clustering and the Double Dixie Cup Problem

Query K-means Clustering and the Double Dixie Cup Problem

2020-02-17 | |  43 |   42 |   0

Abstract 

We consider the problem of approximate K-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Our solution shows that, under some mild assumptions on the smallest cluster size, one can obtain an (1 + image.png)-approximation for the optimal potential with probability 3 at least 1 -image.png, where image.png > 0 and image.png image.png (0, 1), using an expected number of image.png noiseless same-cluster queries and comparison-based clustering of complexity image.png; here, n denotes the number of points and d the dimension of space. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces 6 the number of queries by a factor of roughly image.png, at the cost of possibly missing very small clusters. We extend this settings to the case where some queries to the oracle produce erroneous information, and where certain points, termed outliers, do not belong to any clusters. Our proof techniques differ from previous methods used for K-means clustering analysis, as they rely on estimating the sizes of the clusters and the number of points needed for accurate centroid estimation and subsequent nontrivial generalizations of the double Dixie cup problem. We illustrate the performance of the proposed algorithm both on synthetic and real datasets, including MNIST and CIFAR 10.

上一篇:Occam’s razor is insufficient to infer the preferences of irrational agents

下一篇:Efficient Convex Completion of Coupled Tensors using Coupled Nuclear Norms

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...