Fair Near Neighbor Search Independent Range Sampling In High Dimensions
Aumüller Martin, Pagh Rasmus, Silvestri Francesco. Arxiv 2019
[Paper]
ARXIV
Independent
LSH
Similarity search is a fundamental algorithmic primitive, widely used in many
computer science disciplines. There are several variants of the similarity
search problem, and one of the most relevant is the -near neighbor (-NN)
problem: given a radius and a set of points , construct a data
structure that, for any given query point , returns a point within
distance at most from . In this paper, we study the -NN problem in
the light of fairness. We consider fairness in the sense of equal opportunity:
all points that are within distance from the query should have the same
probability to be returned. In the low-dimensional case, this problem was first
studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the
theoretically strongest approach to similarity search in high dimensions, does
not provide such a fairness guarantee. To address this, we propose efficient
data structures for -NN where all points in that are near have the
same probability to be selected and returned by the query. Specifically, we
first propose a black-box approach that, given any LSH scheme, constructs a
data structure for uniformly sampling points in the neighborhood of a query.
Then, we develop a data structure for fair similarity search under inner
product that requires nearly-linear space and exploits locality sensitive
filters. The paper concludes with an experimental evaluation that highlights
(un)fairness in a recommendation setting on real-world datasets and discusses
the inherent unfairness introduced by solving other variants of the problem.
Similar Work