On The Problem Of p_1^-1 In Locality-sensitive Hashing | Awesome Learning to Hash Add your paper to Learning2Hash

On The Problem Of p_1^-1 In Locality-sensitive Hashing

Ahle Thomas Dybdahl. Arxiv 2020

[Paper]    
ARXIV Independent LSH

A Locality-Sensitive Hash (LSH) function is called (r,cr,p1,p2)-sensitive, if two data-points with a distance less than r collide with probability at least p1 while data points with a distance greater than cr collide with probability at most p2. These functions form the basis of the successful Indyk-Motwani algorithm (STOC 1998) for nearest neighbour problems. In particular one may build a c-approximate nearest neighbour data structure with query time O~(nρ/p1) where ρ=log1/p1log1/p2(0,1). That is, sub-linear time, as long as p1 is not too small. This is significant since most high dimensional nearest neighbour problems suffer from the curse of dimensionality, and can’t be solved exact, faster than a brute force linear-time scan of the database. Unfortunately, the best LSH functions tend to have very low collision probabilities, p1 and p2. Including the best functions for Cosine and Jaccard Similarity. This means that the nρ/p1 query time of LSH is often not sub-linear after all, even for approximate nearest neighbours! In this paper, we improve the general Indyk-Motwani algorithm to reduce the query time of LSH to O~(nρ/p11ρ) (and the space usage correspondingly.) Since nρp1ρ1<np1>n1, our algorithm always obtains sublinear query time, for any collision probabilities at least 1/n. For p1 and p2 small enough, our improvement over all previous methods can be up to a factor n in both query time and space. The improvement comes from a simple change to the Indyk-Motwani algorithm, which can easily be implemented in existing software packages.

Similar Work