
Awesome Papers on Learning to Hash

🏷 Browse Papers by Tag

AAAI ACCV ACL AISTATS Arxiv BMVC CIKM CIVR CNN CVIU CVPR CVPRW Case Study Cross Modal Cross-Modal Dataset Deep Hashing Deep Learning ECCV ECIR FOCS GAN Graph Has Code ICCV ICIP ICLR ICME ICML ICMR ICPR IJCAI IJCV Image Retrieval JCSS KDD LSH LSTM MM Minhash NAACL NIPS NeurIPS Online Pattern Recognition Letters Quantisation RNN SCG SDM SIGIR SIGMOD Self-Supervised Semi-Supervised Skewed Data Spectral Streaming Data Supervised Survey Survey Paper TIP TKDE TNNLS TOIS TOM TOMM TPAMI Text Retrieval Unsupervised VLDB Video Retrieval WWW Weakly Supervised

The Field of Learning to Hash

This website exists to help researchers learn about, share, and more easily discover recent work in the field of learning to hash. Awesome Learning to Hash is a living literature review that lets readers explore models in the field through a taxonomy based on model properties. Anyone can add a new model to the website simply by filling out a form (see Contributing below). Select “All Papers” in the right-hand menu to get started.

Background

Nearest neighbour search is the problem of finding the data-points most similar to a query in a large database. It is a fundamental operation in computer science that has found wide applicability in many fields, from bioinformatics through natural language processing (NLP) to computer vision. The list of interesting and successful application areas is truly endless!

Learning2Hash provides a set of curated, community-driven links to a host of approximate nearest neighbour search (ANN) models that permit sublinear-time retrieval of nearest neighbours. Hashing models work by generating similar binary hashcodes for semantically similar data-points. These similarity-preserving hashcodes can then be used to index the data-points (images, documents, etc.) into the buckets of hashtables so that, given an effective hash function, similar data-points end up in the same bucket. This whole process is illustrated in the diagram below:

Locality Sensitive Hashing (LSH)

Diagram taken from the PhD thesis of Sean Moran.
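To make the indexing step concrete, here is a minimal sketch of one classic data-independent scheme, random hyperplane LSH for cosine similarity. The function names, dimensions, and parameters are illustrative, not taken from any particular library:

```python
import numpy as np
from collections import defaultdict

def generate_hashcode(x, hyperplanes):
    """One bit per hyperplane: which side of each random hyperplane x falls on."""
    return tuple((hyperplanes @ x) > 0)

rng = np.random.default_rng(seed=42)
n_bits, dim = 8, 128
hyperplanes = rng.standard_normal((n_bits, dim))  # random projection directions

# Index a toy database of feature vectors into hashtable buckets:
# similar vectors tend to receive the same code, hence the same bucket.
database = rng.standard_normal((10_000, dim))
table = defaultdict(list)
for idx, x in enumerate(database):
    table[generate_hashcode(x, hyperplanes)].append(idx)
```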

Given a query, such as the image of the tiger in the example above, we can search for similar data-points by generating a hashcode for the query and comparing it only to the data-points that collide with it in the same hashtable bucket (or buckets, if we maintain multiple independent hashtables). The number of data-points in the colliding bucket(s) is typically far smaller than the total number of data-points in the dataset, yielding query times substantially faster than a brute-force scan. The trade-off is that these models may not return the true nearest neighbour(s) every time; that is, we forgo a degree of accuracy, which is usually acceptable in practice.
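Continuing the sketch above, querying hashes the query with the same hyperplanes and ranks only the colliding candidates rather than scanning the entire database (again, a hedged illustration, not a production implementation):

```python
# Hash the query and fetch only the candidates in its colliding bucket.
query = rng.standard_normal(dim)
candidates = table[generate_hashcode(query, hyperplanes)]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank candidates by similarity to the query; these are *approximate*
# nearest neighbours, since the true nearest neighbour may have landed
# in a different bucket. Multiple independent tables improve recall.
ranked = sorted(candidates, key=lambda i: -cosine(query, database[i]))
top_k = ranked[:10]
```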

Much more introductory material on hashing and learning-to-hash can be found in Resources.

This literature review focuses in particular on recently proposed approximate nearest neighbour search models that improve retrieval effectiveness by learning task-specific hashcodes sensitive to the distribution of the data. All of the papers listed on this website propose new methods for learning hash functions that generate more effective similarity-preserving hashcodes.
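Many learned (data-dependent) methods follow a common template: learn a projection from the data, then binarise. Below is a minimal sketch of that idea using principal directions as hash hyperplanes, in the spirit of early unsupervised baselines such as PCA-based hashing; it is an illustrative template, not any specific paper's algorithm:

```python
import numpy as np

def learn_pca_hash(X, n_bits):
    """Learn data-dependent hash hyperplanes: the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are the directions of largest variance in the data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return Vt[:n_bits], mean

def hash_with(W, mean, x):
    """Binarise the learned projections with a sign threshold."""
    return tuple((W @ (x - mean)) > 0)

# Usage: codes now adapt to the data distribution instead of being random.
rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 64))
W, mu = learn_pca_hash(X, n_bits=16)
code = hash_with(W, mu, X[0])
```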

Contributing

Learning to hash is a rapidly evolving field. The purpose of this website is to augment static literature reviews with a dynamic resource that any interested researcher can update with newly published work in the field, including links to relevant code and datasets. To add a new paper, simply create a markdown file and open a pull request on GitHub by following these instructions for contributing.

Copyright © Sean Moran 2024. All opinions are my own.