Data-dependent LSH For The Earth Movers Distance
Jayaram Rajesh, Waingarten Erik, Zhang Tian. Arxiv 2024
[Paper]
ARXIV
Independent
LSH
We give new data-dependent locality sensitive hashing schemes (LSH) for the
Earth Mover’s Distance (), and as a result, improve the best
approximation for nearest neighbor search under by a quadratic
factor. Here, the metric consists of sets
of vectors in , and for any two sets of vectors the
distance is the minimum cost of a perfect matching between
, where the cost of matching two vectors is their distance.
Previously, Andoni, Indyk, and Krauthgamer gave a (data-independent)
locality-sensitive hashing scheme for
when with approximation . By being data-dependent,
we improve the approximation to .
Our main technical contribution is to show that for any distribution
supported on the metric , there exists a
data-dependent LSH for dense regions of which achieves approximation
, and that the data-independent LSH actually achieves a
-approximation outside of those dense regions. Finally, we
show how to “glue” together these two hashing schemes without any additional
loss in the approximation.
Beyond nearest neighbor search, our data-dependent LSH also gives optimal
(distributional) sketches for the Earth Mover’s Distance. By known sketching
lower bounds, this implies that our LSH is optimal (up to factors) among those that collide close points with constant
probability.
Similar Work