资源论文One Permutation Hashing

One Permutation Hashing

2020-01-13 | |  127 |   104 |   0

Abstract

Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

上一篇:Fast Resampling Weighted v-Statistics

下一篇:Density Propagation and Improved Bounds on the Partition Function

用户评价
全部评价

热门资源

  • Regularizing RNNs...

    Recently, caption generation with an encoder-de...

  • Deep Cross-media ...

    Cross-media retrieval is a research hotspot in ...

  • Supervised Descen...

    Many computer vision problems (e.

  • Learning Expressi...

    Facial expression is temporally dynamic event w...

  • Attributed Graph ...

    Graph clustering is a fundamental task which di...