Improving LSH Via Tensorized Random Projection
Verma Bhisham Dev, Pratap Rameshwar. Arxiv 2024
[Paper]
ARXIV
Independent
LSH
Locality sensitive hashing (LSH) is a fundamental algorithmic toolkit used by
data scientists for approximate nearest neighbour search problems that have
been used extensively in many large scale data processing applications such as
near duplicate detection, nearest neighbour search, clustering, etc. In this
work, we aim to propose faster and space efficient locality sensitive hash
functions for Euclidean distance and cosine similarity for tensor data.
Typically, the naive approach for obtaining LSH for tensor data involves first
reshaping the tensor into vectors, followed by applying existing LSH methods
for vector data and . However, this approach becomes impractical
for higher order tensors because the size of the reshaped vector becomes
exponential in the order of the tensor. Consequently, the size of LSH
parameters increases exponentially. To address this problem, we suggest two
methods for LSH for Euclidean distance and cosine similarity, namely
, , and , , respectively, building on
and tensor train decompositions techniques. Our approaches are space
efficient and can be efficiently applied to low rank or tensors. We
provide a rigorous theoretical analysis of our proposal on their correctness
and efficacy.
Similar Work