Sampling A Near Neighbor In High Dimensions -- Who Is The Fairest Of Them All
Aumüller Martin, Har-peled Sariel, Mahabadi Sepideh, Pagh Rasmus, Silvestri Francesco. Arxiv 2021
[Paper]
ARXIV
Independent
LSH
Similarity search is a fundamental algorithmic primitive, widely used in many
computer science disciplines. Given a set of points and a radius parameter
, the -near neighbor (-NN) problem asks for a data structure that,
given any query point , returns a point within distance at most from
. In this paper, we study the -NN problem in the light of individual
fairness and providing equal opportunities: all points that are within distance
from the query should have the same probability to be returned. In the
low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS
2014). Locality sensitive hashing (LSH), the theoretically strongest approach
to similarity search in high dimensions, does not provide such a fairness
guarantee. In this work, we show that LSH based algorithms can be made fair,
without a significant loss in efficiency. We propose several efficient data
structures for the exact and approximate variants of the fair NN problem. Our
approach works more generally for sampling uniformly from a sub-collection of
sets of a given collection and can be used in a few other applications. We also
develop a data structure for fair similarity search under inner product that
requires nearly-linear space and exploits locality sensitive filters. The paper
concludes with an experimental evaluation that highlights the inherent
unfairness of NN data structures and shows the performance of our algorithms on
real-world datasets.
Similar Work