Awesome Papers on Learning to Hash

🌐 Check Out Our Sister Site on Large Language Models

Explore our related resource, focusing on Large Language Models, at LLM Bible.

🏷 Browse Papers by Tag

Explore the latest research by browsing papers categorized by tags. Select a tag below to dive deeper into specific topics within the field of learning to hash:

AAAI, ARXIV, CNN, COLT, Case Study, Cross Modal, Dataset, Deep Learning, FOCS, GAN, Graph, Has Code, ICIP, ICML, Image Retrieval, Independent, LSH, NEURIPS, Quantisation, SIGIR, Self Supervised, Streaming Data, Supervised, Survey Paper, TMLR, Text Retrieval, Theory, Unsupervised, Video Retrieval, Weakly Supervised

Understanding Learning to Hash

This website is a resource for researchers looking to explore, share, and discover recent advancements in the field of learning to hash. It serves as a living literature review, allowing readers to navigate models organized by a taxonomy based on key properties. Anyone can contribute to this growing resource by submitting new papers via a simple form. For details, see the Contributing section.

To start, visit the “All Papers” section from the right-hand menu and browse the full list of contributions.

Background: What is Learning to Hash?

Learning to hash is a family of techniques for speeding up nearest neighbour search: the task of finding the data points in a large dataset that are most similar to a given query. This operation is fundamental to many fields, from bioinformatics to natural language processing (NLP) and computer vision.
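As a point of reference, here is the naive way to answer a nearest neighbour query: compare the query with every point in the dataset. This is a minimal Python sketch, assuming points are plain numeric tuples and that similarity is measured by Euclidean distance; the function names are our own. Its cost grows linearly with the size of the dataset, which is exactly the scan that hashing methods try to avoid.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_brute_force(query, dataset, k=1):
    """Return the k (distance, index) pairs closest to query.

    Every point is compared with the query, so the work per query
    grows linearly with the number of points in the dataset.
    """
    return heapq.nsmallest(k, ((euclidean(query, p), i) for i, p in enumerate(dataset)))

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = knn_brute_force((1.0, 1.0), points, k=2)
print([i for _, i in nearest])  # → [1, 3]
```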

Some notable applications include:

Scalable Source Code Search: Using MinHash to enable code-to-code recommendations across large-scale source code repositories.
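Efficient Transformers: Locality Sensitive Hashing (LSH) attention groups similar queries and keys into the same buckets and computes attention only within each bucket, reducing the quadratic cost of full self-attention on long sequences.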
Social Media Event Tracking: A system for detecting and tracking interesting events on social media in real time using LSH.
Earthquake Detection: Comparing time series of seismic activity to detect earthquakes using LSH.
Fraud Detection at Uber: Uber uses LSH to flag suspicious trips by comparing their spatial data.
Audio Fingerprinting: Matching a snippet of audio against a large database (think Shazam!) using compact hash-based fingerprints.
Genomic Research: Biologists use LSH to assemble genomes and find genes with similar expression profiles.
Image Retrieval: Google applies LSH along with PageRank to index massive collections of images.
Malware Detection: Similarity-preserving hashes help antivirus software quickly match code snippets against known malware.
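To give a flavour of the hashing behind items such as code-to-code search, here is a minimal MinHash sketch in Python. It is a generic illustration, not the implementation used by any of the systems listed above: tokenising code on whitespace, hashing tokens with CRC32 and approximating random permutations with affine hash functions modulo a prime are all assumptions made for brevity. The fraction of signature positions on which two sets agree approximates their Jaccard similarity.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit CRC value

def minhash_signature(tokens, num_perm=128, seed=0):
    """MinHash signature of a set of string tokens.

    Each of the num_perm hash functions is simulated as
    h(x) = (a * x + b) mod PRIME applied to the token's CRC32 value.
    Signatures are only comparable if they share num_perm and seed.
    """
    base = {zlib.crc32(t.encode("utf-8")) for t in tokens}
    if not base:
        raise ValueError("MinHash needs at least one token")
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]

def estimate_jaccard(sig_a, sig_b):
    """The fraction of slots where the minima agree estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical example: tokenise two code snippets on whitespace.
snippet_a = "def add ( x , y ) : return x + y".split()
snippet_b = "def add ( a , b ) : return a + b".split()
exact = len(set(snippet_a) & set(snippet_b)) / len(set(snippet_a) | set(snippet_b))
estimate = estimate_jaccard(minhash_signature(snippet_a), minhash_signature(snippet_b))
print(f"exact={exact:.2f} estimate={estimate:.2f}")  # the estimate should land near 8/12
```

Because every signature has the same fixed length, large collections of snippets can be compared or bucketed without keeping their full token sets side by side.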

How Learning to Hash Works

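Learning to hash maps data points to compact binary hash codes so that similar points receive similar codes, that is, codes that differ in only a few bits (a small Hamming distance). Unlike data-independent schemes such as LSH, the hash functions are learned from the data. The codes are then used to index data into hash tables, so that a query only needs to be compared with the items stored under the same or nearby codes.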
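The idea of binary codes stored in hash tables can be sketched in a few lines of Python. Here each feature is thresholded at its median over the training data: a deliberately simple, hypothetical stand-in for a learned hash function, not a method from the papers on this site, which optimise their hash functions far more carefully. Points that share a code land in the same bucket, and the Hamming distance measures how far apart two codes are.

```python
from collections import defaultdict
from statistics import median

def fit_median_thresholds(data):
    """'Learn' one threshold per feature: its median over the training data."""
    return [median(column) for column in zip(*data)]

def encode(x, thresholds):
    """Binary code: bit i is 1 when feature i exceeds its learned threshold."""
    return tuple(int(v > t) for v, t in zip(x, thresholds))

def hamming(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def build_index(data, thresholds):
    """Hash table from each binary code to the ids of the points stored under it."""
    table = defaultdict(list)
    for i, x in enumerate(data):
        table[encode(x, thresholds)].append(i)
    return table

# Toy data: two clusters of 3-dimensional points.
data = [(0.1, 0.2, 0.9), (0.2, 0.1, 0.8), (0.9, 0.8, 0.1), (0.8, 0.9, 0.2)]
thresholds = fit_median_thresholds(data)
index = build_index(data, thresholds)
query_code = encode((0.15, 0.15, 0.85), thresholds)
print(index.get(query_code, []))  # → [0, 1]: only the query's own bucket is searched
```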

For example, in the image below, the system generates a hash code for an image of a tiger and compares it only with the data points in the same hash table bucket. This dramatically reduces the number of comparisons, making search much faster than a brute-force scan of the whole dataset. The price is a small loss in accuracy: a true nearest neighbour that happens to land in a different bucket is missed. In practice, the speed benefits are substantial.

Figure: Locality Sensitive Hashing (LSH). Image from the PhD thesis of Sean Moran.
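The bucket-based search described above can be illustrated with random-hyperplane LSH (Charikar's SimHash scheme for cosine similarity), a classic data-independent hash. The following Python sketch is a minimal, single-table version under our own assumptions: `LSHIndex`, `num_bits` and the other names are hypothetical, and practical systems usually build several tables so that a neighbour missed in one bucket can still be found in another. A query is hashed, and only the points in its bucket are re-ranked by exact cosine similarity.

```python
import math
import random
from collections import defaultdict

def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes through the origin (not learned)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_code(x, planes):
    """One bit per hyperplane: 1 if x lies on its positive side, else 0."""
    return tuple(int(sum(w * v for w, v in zip(plane, x)) >= 0.0) for plane in planes)

def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class LSHIndex:
    """Single-table random-hyperplane LSH index (hypothetical helper class)."""

    def __init__(self, dim, num_bits=8, seed=0):
        self.planes = random_hyperplanes(dim, num_bits, seed)
        self.buckets = defaultdict(list)  # binary code -> ids of stored points
        self.points = []

    def add(self, x):
        self.buckets[lsh_code(x, self.planes)].append(len(self.points))
        self.points.append(x)

    def query(self, q):
        """Re-rank only the points in q's bucket; return (best id or None, #compared)."""
        candidates = self.buckets.get(lsh_code(q, self.planes), [])
        if not candidates:
            return None, 0
        best = max(candidates, key=lambda i: cosine(q, self.points[i]))
        return best, len(candidates)

data_rng = random.Random(42)
index = LSHIndex(dim=16, num_bits=6, seed=1)
for _ in range(1000):
    index.add([data_rng.gauss(0.0, 1.0) for _ in range(16)])
best, compared = index.query(index.points[7])
print(best, compared)  # compared counts bucket members only, not all 1000 points
```

Using more bits makes buckets smaller and queries cheaper, but raises the chance that a true neighbour falls into a different bucket: this is the accuracy trade-off mentioned above.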

For more detailed introductory material, visit our Resources page.

Contribute to the Growing Research

The field of learning to hash is evolving rapidly, and this website aims to stay current by inviting contributions from researchers. If you come across new work in this area, you can add it by creating a Markdown file and submitting a pull request through our GitHub page. For full instructions, visit the Contributing section.

Awesome Learning to Hash