[Paper]
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess (n) strings of length (d), to quickly answer queries (q) of the form: if there is a database string within edit distance (r) of (q), return a database string within edit distance (cr) of (q). Previous approaches to this problem either rely on very large (superconstant) approximation ratios (c), or very small search radii (r). Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all (n) strings. In this work give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time (\tilde{O}(d3^rn^{1/c})). The best known practical results require (c \gg r) to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time at least (24^r). Our results significantly broaden the range of parameters for which we can achieve nontrivial bounds, while retaining the practicality of a locality-sensitive hash function. We also show how to apply our ideas to the closely-related Approximate Nearest Neighbor problem for edit distance, obtaining similar time bounds.