Dynamic Enumeration Of Similarity Joins | Awesome Learning to Hash Add your paper to Learning2Hash

Dynamic Enumeration Of Similarity Joins

Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang . Arxiv 2021 – 2 citations

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Hashing Methods Locality-Sensitive-Hashing

This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of (n) points (A,B) in (\mathbb{R}^d), a metric (\phi(\cdot)), and a distance threshold (r > 0), report all pairs of points ((a, b) \in A \times B) with (\phi(a,b) \le r). Our goal is to store (A,B) into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from (A) or (B). We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for (\ell_1, \ell_\infty) metrics with (log^{O(1)} n) update time and delay. We show that such a data structure is not feasible for the (ℓ₂) metric for (d \ge 4). For approximate enumeration of similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for (\ell_p) metric, with (log^{O(1)} n) delay and update time. In high dimensions, we present an efficient data structure with worst-case delay-guarantee using locality sensitive hashing (LSH).

Similar Work