资源论文Optimal Densification for Fast and Accurate Minwise Hashing

Optimal Densification for Fast and Accurate Minwise Hashing

2020-03-10 | |  58 |   44 |   0

Abstract

Minwise hashing is a fundamental and one of the most successful hashing algorithm in the literature. Recent advances based on the idea of densification (Shrivastava & Li, 2014a;c) have shown that it is possible to compute k minwise hashes, of a vector with d nonzeros, in mere (d + k) computations, a significant improvement over the classical O(dk). These advances have led to an algorithmic improvement in the query complexity of traditional indexing algorithms based on minwise hashing. Unfortunately, the variance of the current densification techniques is unnecessarily high, which leads to significantly poor accuracy compared to vanilla minwise hashing, especially when the data is sparse. In this paper, we provide a novel densification scheme which relies on carefully tailored 2-universal hashes. We show that the proposed scheme is variance-optimal, and without losing the runtime efficiency, it is significantly more a curate than existing densification techniques. As a result, we obtain a significantly efficient hash ing scheme which has the same variance and collision probability as minwise hashing. Experimental evaluations on real sparse and highdimensional datasets validate our claims. We believe that given the significant advantages, our method will replace minwise hashing implementations in practice.

上一篇:Adaptive Consensus ADMM for Distributed Optimization

下一篇:On orthogonality and learning recurrent networks with long term dependencies

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...