Entropy Based Nearest Neighbor Search In High Dimensions
Panigrahy Rina. Arxiv 2005
[Paper]
ARXIV
Independent
In this paper we study the problem of finding the approximate nearest
neighbor of a query point in the high dimensional space, focusing on the
Euclidean space. The earlier approaches use locality-preserving hash functions
(that tend to map nearby points to the same value) to construct several hash
tables to ensure that the query point hashes to the same bucket as its nearest
neighbor in at least one table. Our approach is different – we use one (or a
few) hash table and hash several randomly chosen points in the neighborhood of
the query point showing that at least one of them will hash to the bucket
containing its nearest neighbor. We show that the number of randomly chosen
points in the neighborhood of the query point required depends on the
entropy of the hash value of a random point at the same distance
from at its nearest neighbor, given and the locality preserving hash
function chosen randomly from the hash family. Precisely, we show that if
the entropy and is a bound on the probability that two
far-off points will hash to the same bucket, then we can find the approximate
nearest neighbor in time and near linear space where
. Alternatively we can build a data structure of size
to answer queries in time. By applying
this analysis to the locality preserving hash functions in and adjusting the
parameters we show that the nearest neighbor can be computed in time
and near linear space where as
becomes large.
Similar Work