资源论文Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

2020-02-13 | |  125 |   50 |   0

Abstract

 Identifying the top-K frequent items in a collection or data stream is one of the most common and important operations in large data processing systems. As a result, several solutions have been proposed to solve this problem approximately. We observe that the existing algorithms, although theoretically sound, are suboptimal from the performance perspective because of their limitations in exploiting parallelism in modern distributed compute settings. In particular, for identifying top-K frequent items, Count-Min Sketch (CMS) has an excellent update time, but lacks the important property of reducibility which is needed for exploiting available massive data parallelism. On the other end, the popular Frequent algorithm (FA) leads to reducible summaries but its update costs are significant. In this paper, we present Topkapi, a fast and parallel algorithm for finding top-K frequent items, which gives the best of both worlds, i.e., it is reducible and has fast update time similar to CMS. Topkapi possesses strong theoretical guarantees and leads to significant performance gains due to increased parallelism, relative to past work.

上一篇:Convergence of Cubic Regularization for Nonconvex Optimization under K Property

下一篇:Learning from Group Comparisons: Exploiting Higher Order Interactions

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...