资源论文Affinity Clustering: Hierarchical Clustering at Scale

Affinity Clustering: Hierarchical Clustering at Scale

2020-02-10 | |  47 |   47 |   0

Abstract 

Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical structure is at the same time a fundamental and challenging problem for several applications. The amount of data to analyze is increasing at an astonishing rate each day. Hence there is a need for new solutions to efficiently compute effective hierarchical clusterings on such huge data. The main focus of this paper is on minimum spanning tree (MST) based clusterings. In particular, we propose affinity, a novel hierarchical clustering based on image.png MST algorithm. We prove certain theoretical guarantees for affinity (as well as some other classic algorithms) and show that in practice it is superior to several other state-of-the-art clustering algorithms. Furthermore, we present two MapReduce implementations for affinity. The first one works for the case where the input graph is dense and takes constant rounds. It is based on a Massively Parallel MST algorithm for dense graphs that improves upon the state-of-the-art algorithm of Lattanzi et al. [34]. Our second algorithm has no assumption on the density of the input graph and finds the affinity clustering in O(log n) rounds using Distributed Hash Tables (DHTs). We show experimentally that our algorithms are scalable for huge data sets, e.g., for graphs with trillions of edges.

上一篇:A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control

下一篇:Inverse Filtering for Hidden Markov Models

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...