On The Problem Of p_1^-1 In Locality-sensitive Hashing
Ahle Thomas Dybdahl. Arxiv 2020
[Paper]
ARXIV
Independent
LSH
A Locality-Sensitive Hash (LSH) function is called
-sensitive, if two data-points with a distance less than
collide with probability at least while data points with a distance
greater than collide with probability at most . These functions form
the basis of the successful Indyk-Motwani algorithm (STOC 1998) for nearest
neighbour problems. In particular one may build a -approximate nearest
neighbour data structure with query time where
. That is, sub-linear time, as long
as is not too small. This is significant since most high dimensional
nearest neighbour problems suffer from the curse of dimensionality, and can’t
be solved exact, faster than a brute force linear-time scan of the database.
Unfortunately, the best LSH functions tend to have very low collision
probabilities, and . Including the best functions for Cosine and
Jaccard Similarity. This means that the query time of LSH is often
not sub-linear after all, even for approximate nearest neighbours!
In this paper, we improve the general Indyk-Motwani algorithm to reduce the
query time of LSH to (and the space usage
correspondingly.) Since ,
our algorithm always obtains sublinear query time, for any collision
probabilities at least . For and small enough, our improvement
over all previous methods can be up to a factor in both query time
and space.
The improvement comes from a simple change to the Indyk-Motwani algorithm,
which can easily be implemented in existing software packages.
Similar Work