Dynamic Enumeration Of Similarity Joins
Agarwal Pankaj K., Hu Xiao, Sintos Stavros, Yang Jun. Arxiv 2021
[Paper]
ARXIV
Independent
LSH
This paper considers enumerating answers to similarity-join queries under
dynamic updates: Given two sets of points in , a metric
, and a distance threshold , report all pairs of points
with . Our goal is to store into
a dynamic data structure that, whenever asked, can enumerate all result pairs
with worst-case delay guarantee, i.e., the time between enumerating two
consecutive pairs is bounded. Furthermore, the data structure can be
efficiently updated when a point is inserted into or deleted from or .
We propose several efficient data structures for answering similarity-join
queries in low dimension. For exact enumeration of similarity join, we present
near-linear-size data structures for metrics with
update time and delay. We show that such a data structure is
not feasible for the metric for . For approximate enumeration
of similarity join, where the distance threshold is a soft constraint, we
obtain a unified linear-size data structure for metric, with
delay and update time. In high dimensions, we present an
efficient data structure with worst-case delay-guarantee using locality
sensitive hashing (LSH).
Similar Work