Approximate Similarity Search Under Edit Distance Using Locality-sensitive Hashing
Mccauley Samuel. Arxiv 2019
[Paper]
ARXIV
Independent
Edit distance similarity search, also called approximate pattern matching, is
a fundamental problem with widespread database applications. The goal of the
problem is to preprocess strings of length , to quickly answer queries
of the form: if there is a database string within edit distance of ,
return a database string within edit distance of . Previous approaches
to this problem either rely on very large (superconstant) approximation ratios
, or very small search radii . Outside of a narrow parameter range, these
solutions are not competitive with trivially searching through all strings.
In this work give a simple and easy-to-implement hash function that can
quickly answer queries for a wide range of parameters. Specifically, our
strategy can answer queries in time . The best known
practical results require to achieve any correctness guarantee;
meanwhile, the best known theoretical results are very involved and difficult
to implement, and require query time at least . Our results significantly
broaden the range of parameters for which we can achieve nontrivial bounds,
while retaining the practicality of a locality-sensitive hash function.
We also show how to apply our ideas to the closely-related Approximate
Nearest Neighbor problem for edit distance, obtaining similar time bounds.
Similar Work