Improved Fast Similarity Search In Dictionaries

Karch Daniel, Luxen Dennis, Sanders Peter. arXiv 2010

We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $W$, a maximum distance $d$ fixed at preprocessing time, and a query word $q$, we would like to retrieve all words from $W$ that can be transformed into $q$ with at most $d$ edit operations. We present data structures that support fault-tolerant queries by generating an index. On top of that, we present a generalization of the method that significantly reduces memory consumption and preprocessing time, while query running times are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.
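
To make the problem statement concrete, here is a minimal Python sketch of index-based approximate dictionary matching. It assumes a deletion-neighborhood index, which is one common way to build such an index; the abstract does not say which data structures the paper uses, so this should not be read as the authors' method. The function names (`deletion_neighborhood`, `build_index`, `query`) and the sample words are hypothetical, and edit operations are taken to be single-character insertions, deletions, and substitutions.

```python
from collections import defaultdict
from itertools import combinations


def deletion_neighborhood(word, d):
    """Return all strings obtained by deleting at most d characters from word."""
    result = set()
    for k in range(min(d, len(word)) + 1):
        for positions in combinations(range(len(word)), k):
            skip = set(positions)
            result.add("".join(c for i, c in enumerate(word) if i not in skip))
    return result


def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute) via row-wise DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(words, d):
    """Preprocessing: map every deletion variant to the words that produce it.

    d is fixed here, matching the problem statement's 'fixed at preprocessing time'.
    """
    index = defaultdict(set)
    for w in words:
        for key in deletion_neighborhood(w, d):
            index[key].add(w)
    return index


def query(index, q, d):
    """Return all indexed words within edit distance d of q, sorted."""
    candidates = set()
    for key in deletion_neighborhood(q, d):
        candidates |= index.get(key, set())
    # Sharing a deletion variant only bounds the distance by 2d,
    # so each candidate is verified with the exact distance.
    return sorted(w for w in candidates if edit_distance(w, q) <= d)


# Hypothetical sample data, for illustration only.
words = ["hello", "help", "hell", "world", "word"]
index = build_index(words, d=1)
matches = query(index, "helo", d=1)
```

In this sketch the index grows quickly with $d$ and with word length, since every word contributes all of its deletion variants. That is the kind of memory and preprocessing cost the generalization mentioned in the abstract is meant to reduce.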

Similar Work

Word2bits - Quantized Word Vectors
Hash2vec Feature Hashing For Word Embeddings
Multi Hash Embeddings In Spacy
On Tight Bounds For Binary Frameproof Codes
