Let Them Have CAKES A Cutting-edge Algorithm For Scalable Efficient And Exact Search On Big Data
Prior Morgan E., Howard Thomas J. Iii, Mclaughlin Oliver, Ferguson Terrence, Ishaq Najib, Daniels Noah M.. Arxiv 2023
[Paper]
ARXIV
The ongoing Big Data explosion has created a demand for efficient and
scalable algorithms for similarity search.
Most recent work has focused on \textit{approximate} -NN search, and while
this may be sufficient for some applications, \textit{exact} -NN search
would be ideal for many applications.
We present CAKES, a set of three novel, exact algorithms for -NN search.
CAKES’s algorithms are generic over \textit{any} distance function, and they
\textit{do not} scale with the cardinality or embedding dimension of the
dataset, but rather with its metric entropy and fractal dimension.
We test these claims on datasets from the ANN-Benchmarks suite under
commonly-used distance functions, as well as on a genomic dataset with
Levenshtein distance and a radio-frequency dataset with Dynamic Time Warping
distance.
We demonstrate that CAKES exhibits near-constant scaling with cardinality on
data conforming to the manifold hypothesis, and has perfect recall on data in
\textit{metric} spaces.
We also demonstrate that CAKES exhibits significantly higher recall than
state-of-the-art -NN search algorithms when the distance function is not a
metric.
Additionally, we show that indexing and tuning time for CAKES is an order of
magnitude, or more, faster than state-of-the-art approaches.
We conclude that CAKES is a highly efficient and scalable algorithm for exact
-NN search on Big Data.
We provide a Rust implementation of CAKES.
Similar Work