Dynamic Enumeration Of Similarity Joins

Agarwal Pankaj K., Hu Xiao, Sintos Stavros, Yang Jun. arXiv 2021

This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of $n$ points $A, B$ in $\mathbb{R}^{d}$, a metric $\phi(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $\phi(a, b) \leq r$. Our goal is to store $A, B$ in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of similarity joins, we present near-linear-size data structures for the $\ell_{1}$ and $\ell_{\infty}$ metrics with $\log^{O(1)} n$ update time and delay. We show that such a data structure is not feasible for the $\ell_{2}$ metric for $d \geq 4$. For approximate enumeration of similarity joins, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the $\ell_{p}$ metric, with $\log^{O(1)} n$ delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality-sensitive hashing (LSH).
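To make the problem statement concrete, here is a minimal Python sketch of a dynamic similarity join under the $\ell_{\infty}$ metric. It is an illustrative baseline only, not the data structure proposed in the paper: it buckets points into a uniform grid of side $r$ and offers no worst-case delay or update-time guarantee. The class name `GridSimilarityJoin` and its methods (`insert`, `delete`, `enumerate`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import product
import math


class GridSimilarityJoin:
    """Illustrative baseline for dynamic similarity join under l_inf.

    Assumption: this is NOT the paper's structure. It only shows the
    interface (insert, delete, enumerate) and gives no delay guarantee.
    """

    def __init__(self, r, d):
        self.r = r  # distance threshold
        self.d = d  # dimension
        # One grid per input set: cell id -> set of points in that cell.
        self.cells = {"A": defaultdict(set), "B": defaultdict(set)}

    def _cell(self, p):
        # Grid cell of side r that contains point p.
        return tuple(math.floor(x / self.r) for x in p)

    def insert(self, side, p):
        self.cells[side][self._cell(p)].add(tuple(p))

    def delete(self, side, p):
        c = self._cell(p)
        bucket = self.cells[side].get(c)
        if bucket is not None:
            bucket.discard(tuple(p))
            if not bucket:
                del self.cells[side][c]

    def enumerate(self):
        # If l_inf(a, b) <= r, the cells of a and b differ by at most 1
        # in every coordinate, so only the 3^d neighboring cells matter.
        offsets = list(product((-1, 0, 1), repeat=self.d))
        for cell, a_points in self.cells["A"].items():
            for off in offsets:
                neighbor = tuple(c + o for c, o in zip(cell, off))
                b_points = self.cells["B"].get(neighbor)
                if not b_points:
                    continue
                for a in a_points:
                    for b in b_points:
                        if max(abs(x - y) for x, y in zip(a, b)) <= self.r:
                            yield a, b
```

A caller would insert points with `insert("A", p)` or `insert("B", q)`, remove them with `delete`, and iterate over `enumerate()` whenever the current join result is needed. In this sketch the time between two consecutive reported pairs can be large when many neighboring cells hold only far-apart points, which is the kind of unbounded delay the paper's data structures are designed to avoid.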
