Optimal Hashing-based Time-space Trade-offs For Approximate Near Neighbors

Andoni Alexandr, Laarhoven Thijs, Razenshteyn Ilya, Waingarten Erik. Arxiv 2016

[See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$ -Approximate Near Neighbor Search problem. For the $d$ -dimensional Euclidean space and $n$ -point datasets, we develop a data structure with space $n^{1 + ρ_{u} + o (1)} + O (d n)$ and query time $n^{ρ_{q} + o (1)} + d n^{o (1)}$ for every $ρ_{u}, ρ_{q} \geq 0$ such that: $c^{2} \sqrt{ρ_{q}} + (c^{2} - 1) \sqrt{ρ_{u}} = \sqrt{2 c^{2} - 1} .$ This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$ , improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $ρ_{q} = 0$ , improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

Awesome Learning to Hash

Optimal Hashing-based Time-space Trade-offs For Approximate Near Neighbors

Andoni Alexandr, Laarhoven Thijs, Razenshteyn Ilya, Waingarten Erik. Arxiv 2016

Similar Work