Multi-resolution Hashing For Fast Pairwise Summations
Charikar Moses, Siminelakis Paris. Arxiv 2018
ARXIV
FOCS
Independent
A basic computational primitive in the analysis of massive datasets is
summing simple functions over a large number of objects. Modern applications
pose an additional challenge in that such functions often depend on a parameter
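vector $y$ (query) that is unknown a priori. Given a set of points $X \subset \mathbb{R}^{d}$ and a pairwise function $w : \mathbb{R}^{d} \times \mathbb{R}^{d} \to [0,1]$, we study the problem of designing a data-structure
that enables sublinear-time approximation of the summation $Z_{w}(y) = \frac{1}{|X|} \sum_{x \in X} w(x,y)$
for any query $y \in \mathbb{R}^{d}$. By combining ideas from Harmonic Analysis (partitions of unity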
and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis
FOCS’17], we provide a general framework for designing such data structures
through hashing that reaches far beyond what previous techniques allowed.
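A key design principle is a collection of $T \geq 1$ hashing schemes with
collision probabilities $p_{1}, \ldots, p_{T}$ such that $\sup_{t \in [T]} \{p_{t}(x,y)\} = \Theta(\sqrt{w(x,y)})$ for the pairwise function $w$. This leads to a data-structure
that approximates the summation $Z_{w}(y)$ using a sub-linear number of samples from each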
hash family. Using this new framework along with Distance Sensitive Hashing
[Aumuller, Christiani, Pagh, Silvestri PODS’18], we show that such a collection
can be constructed and evaluated efficiently for any log-convex function
$w(x,y) = e^{\phi(\langle x,y \rangle)}$ of the inner product on the unit sphere
$x, y \in \mathcal{S}^{d-1}$.
Our method leads to data structures with sub-linear query time that
significantly improve upon random sampling and can be used for Kernel Density
or Partition Function Estimation. We provide extensions of our result from the
sphere to $\mathbb{R}^{d}$ and from scalar functions to vector functions.
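
As a rough illustration of the quantities discussed in the abstract, the Python sketch below compares three ways of evaluating a pairwise summation on the unit sphere: the exact average, the uniform random sampling baseline, and a single-resolution hashing-based estimator. It is not the multi-resolution scheme of the paper. It assumes the summation is the average of w(x, y) over the point set with w(x, y) = exp(phi(<x, y>)), and it swaps in SimHash (random hyperplanes) as the hash family. The function names (unit_rows, exact_sum, random_sampling, simhash_estimate), the example phi(t) = t - 1, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def unit_rows(a):
    """Normalize each row to unit Euclidean norm (points on the sphere)."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def exact_sum(X, y, phi):
    """Exact average of w(x, y) = exp(phi(<x, y>)) over all rows x of X."""
    return float(np.mean(np.exp(phi(X @ y))))


def random_sampling(X, y, phi, m, rng):
    """Baseline: average w(x, y) over m points drawn uniformly with replacement."""
    idx = rng.integers(0, len(X), size=m)
    return float(np.mean(np.exp(phi(X[idx] @ y))))


def simhash_estimate(X, y, phi, k, rng):
    """One sample of a hashing-based estimator using k random hyperplanes.

    Assumption: SimHash stands in for the hash family. For unit vectors its
    collision probability is p(x, y) = (1 - angle(x, y) / pi) ** k.
    For clarity the hash is drawn and applied at query time; an actual data
    structure would build the hash tables during preprocessing.
    """
    G = rng.standard_normal((k, X.shape[1]))   # k random hyperplanes
    codes = (X @ G.T) > 0                      # n x k sign pattern per point
    qcode = (G @ y) > 0                        # sign pattern of the query
    bucket = np.flatnonzero(np.all(codes == qcode, axis=1))
    if bucket.size == 0:
        return 0.0                             # empty bucket contributes zero
    i = rng.choice(bucket)                     # uniform point from the bucket
    ip = np.clip(X[i] @ y, -1.0, 1.0)
    p = (1.0 - np.arccos(ip) / np.pi) ** k     # collision probability of (x_i, y)
    # Importance weighting by 1 / p makes the estimate unbiased for the average.
    return float(np.exp(phi(ip)) * bucket.size / (len(X) * p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = unit_rows(rng.standard_normal((2000, 8)))
    y = unit_rows(rng.standard_normal((1, 8)))[0]
    phi = lambda t: t - 1.0                    # convex phi, so w is log-convex and in (0, 1]
    print("exact          :", exact_sum(X, y, phi))
    print("random sampling:", random_sampling(X, y, phi, 200, rng))
    hbe = np.mean([simhash_estimate(X, y, phi, 3, rng) for _ in range(200)])
    print("hashing-based  :", hbe)
```

Averaging many independent calls to simhash_estimate reduces the variance of the estimate. With a single hash family the collision probability is fixed by the family, whereas the abstract describes combining several families so that the largest collision probability tracks the square root of w(x, y).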
Similar Work