Local Density Estimation In High Dimensions
Wu Xian, Charikar Moses, Natchu Vishnu. Arxiv 2018
[Paper]
ARXIV
Independent
LSH
An important question that arises in the study of high dimensional vector
representations learned from data is: given a set of vectors and
a query , estimate the number of points within a specified distance
threshold of . We develop two estimators, LSH Count and Multi-Probe Count
that use locality sensitive hashing to preprocess the data to accurately and
efficiently estimate the answers to such questions via importance sampling. A
key innovation is the ability to maintain a small number of hash tables via
preprocessing data structures and algorithms that sample from multiple buckets
in each hash table. We give bounds on the space requirements and sample
complexity of our schemes, and demonstrate their effectiveness in experiments
on a standard word embedding dataset.
Similar Work