
Near Neighbor Who Is The Fairest Of Them All

Sariel Har-Peled, Sepideh Mahabadi. Neural Information Processing Systems 2019

[Paper]    
Tags: Independent, LSH, NEURIPS

In this work we study a “fair” variant of the near neighbor problem. Namely, given a set of \(n\) points \(P\) and a parameter \(r\), the goal is to preprocess the points such that, given a query point \(q\), every point in the \(r\)-neighborhood of the query, i.e., in \(B(q,r)\), has the same probability of being reported as the near neighbor.
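Stated symbolically (a direct restatement of the requirement above, using the abstract's notation \(P\) and \(B(q,r)\)):

\[
  \Pr\bigl[\,p \text{ is reported for query } q\,\bigr] \;=\; \frac{1}{\,|B(q,r) \cap P|\,}
  \qquad \text{for every } p \in B(q,r) \cap P .
\]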

We show that LSH-based algorithms can be made fair without a significant loss in efficiency. Specifically, we show an algorithm that reports a point \(p\) in the \(r\)-neighborhood of a query \(q\) with almost uniform probability. The time to report such a point is \(O(\mathrm{dns}(q,r)\, Q(n,c))\), and the space used is \(O(S(n,c))\), where \(Q(n,c)\) and \(S(n,c)\) are the query time and space of an LSH algorithm for the \(c\)-approximate near neighbor problem, and \(\mathrm{dns}(q,r)\) is a function of the local density around \(q\).
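The abstract does not include code, but the guarantee above can be illustrated with a small self-contained sketch. The snippet below combines a toy Euclidean LSH index with a rejection-sampling query: pick one of \(q\)'s buckets proportionally to its size, pick a random point in it, discard points outside \(B(q,r)\), and accept a surviving point \(p\) with probability inversely proportional to the number of \(q\)-buckets containing \(p\). All names here (`FairLSH`, `fair_query`, `n_tables`, `k`, `width`) are illustrative and not from the paper; this is a minimal sketch of the sampling idea under those assumptions, not the authors' implementation, which additionally addresses the efficiency issues analyzed in the paper.

```python
import math
import random
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FairLSH:
    """Toy Euclidean LSH index with a 'fair' near-neighbor query.

    Hypothetical parameters: n_tables hash tables, k concatenated
    p-stable (Gaussian) projections per table, bucket width `width`.
    """

    def __init__(self, dim, n_tables=10, k=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(n_tables):
            # k random projections with random offsets define one hash table.
            projs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                      rng.uniform(0.0, width)) for _ in range(k)]
            self.tables.append((projs, defaultdict(list)))

    def _key(self, projs, p):
        return tuple(
            int(math.floor((sum(a * x for a, x in zip(vec, p)) + b) / self.width))
            for vec, b in projs)

    def insert(self, p):
        p = tuple(p)
        for projs, buckets in self.tables:
            buckets[self._key(projs, p)].append(p)

    def fair_query(self, q, r, max_tries=10000):
        """Report a point of B(q, r) with (almost) uniform probability."""
        # Buckets that q hashes to, one per table.
        cand = [bk.get(self._key(projs, q), []) for projs, bk in self.tables]
        sizes = [len(b) for b in cand]
        if sum(sizes) == 0:
            return None
        for _ in range(max_tries):
            # 1. Pick a bucket proportionally to its size, then a point in it.
            i = random.choices(range(len(cand)), weights=sizes)[0]
            p = random.choice(cand[i])
            # 2. Reject bucket members that are not actually within distance r.
            if dist(p, q) > r:
                continue
            # 3. Accept with probability 1/deg(p), where deg(p) is the number
            #    of q-buckets containing p; this cancels the bias toward
            #    points that collide with q in many tables.
            deg = sum(1 for b in cand if p in b)
            if random.random() < 1.0 / deg:
                return p
        return None
```

Calling `fair_query(q, r)` repeatedly and counting how often each near point is reported should give roughly equal frequencies, up to the small probability that the LSH misses a near point entirely (the “almost uniform” caveat above); the expected number of rejection rounds grows with the local density around \(q\), which is the role of the \(\mathrm{dns}(q,r)\) factor in the query time.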

Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection, and it can be used in a few other applications. Finally, we run experiments to demonstrate the performance of our approach on real data.
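As a rough illustration of that more general primitive, the sketch below samples an element uniformly from the union of an arbitrary collection of Python sets using the same rejection scheme; `sample_from_union` and `max_tries` are hypothetical names chosen for this example, not from the paper.

```python
import random


def sample_from_union(sets, max_tries=10000):
    """Return an element of the union of `sets`, each element equally likely."""
    sizes = [len(s) for s in sets]
    if sum(sizes) == 0:
        return None
    for _ in range(max_tries):
        # Pick a set proportionally to its size, then a uniform element of it.
        i = random.choices(range(len(sets)), weights=sizes)[0]
        x = random.choice(list(sets[i]))
        # An element lying in deg(x) sets is proposed deg(x) times as often,
        # so accepting with probability 1/deg(x) makes the output uniform
        # over the union.
        deg = sum(1 for s in sets if x in s)
        if random.random() < 1.0 / deg:
            return x
    return None
```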

Similar Work