Near Neighbor: Who is the Fairest of Them All?

2020-02-19

Abstract

In this work we study a fair variant of the near neighbor problem. Namely, given a set of n points P and a parameter r, the goal is to preprocess the points such that, given a query point q, every point in the r-neighborhood of the query, i.e., $B(q, r)$, has the same probability of being reported as the near neighbor. We show that LSH-based algorithms can be made fair without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the r-neighborhood of a query q with almost uniform probability. The query time is proportional to $O(\mathrm{dns}(q, r) \cdot Q(n, c))$, and its space is $O(S(n, c))$, where $Q(n, c)$ and $S(n, c)$ are the query time and space of an LSH algorithm for c-approximate near neighbor, and $\mathrm{dns}(q, r)$ is a function of the local density around q. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. Finally, we run experiments to show the performance of our approach on real data.
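The abstract's general primitive is sampling uniformly from the union of a sub-collection of sets. In an LSH setting these sets might be the buckets a query falls into, possibly restricted to points within distance r of q. That mapping is an assumption made for illustration, not a description of the authors' exact procedure. Below is a minimal sketch of one standard way to obtain a uniform sample from such a union: degree-based rejection sampling. The function name `sample_union_uniform` and its parameters are hypothetical. Picking a set with probability proportional to its size and then a uniform element inside it proposes an element x with probability deg(x)/Σ|S_i|, where deg(x) is the number of sets containing x. Accepting with probability 1/deg(x) therefore makes every element of the union equally likely. A naive "pick a bucket, then a point" approach would over-report points that appear in many buckets.

```python
from __future__ import annotations

import random
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def sample_union_uniform(sets: Sequence[set[T]], rng: random.Random) -> T:
    """Return an element drawn uniformly at random from the union of `sets`.

    Hypothetical helper illustrating degree-based rejection sampling;
    not taken from the paper.
    """
    sizes = [len(s) for s in sets]
    total = sum(sizes)
    if total == 0:
        raise ValueError("the union of the given sets is empty")
    lists = [list(s) for s in sets]  # indexable copies for uniform choice
    while True:
        # Step 1: choose set i with probability |S_i| / sum_j |S_j|.
        i = rng.choices(range(len(sets)), weights=sizes, k=1)[0]
        # Step 2: choose a uniform element of S_i. Together with step 1,
        # every (set, element) pair is equally likely, so x is proposed
        # with probability deg(x) / total.
        x = rng.choice(lists[i])
        # Step 3: deg(x) = number of sets containing x (O(len(sets)) checks).
        deg = sum(1 for s in sets if x in s)
        # Accepting with probability 1/deg(x) cancels the bias toward
        # elements that occur in many sets.
        if rng.random() < 1.0 / deg:
            return x


if __name__ == "__main__":
    rng = random.Random(0)
    buckets = [{1, 2, 3}, {2, 3}, {3}]  # toy "buckets"; 3 occurs in all of them
    counts = Counter(sample_union_uniform(buckets, rng) for _ in range(30_000))
    print(counts)  # each of 1, 2, 3 should appear roughly 10,000 times
```

Each trial succeeds with probability |∪S_i| / Σ|S_i|, so the expected number of trials grows with how much the sets overlap. This loosely mirrors how the abstract's query-time bound depends on a local-density term.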