Binary Coding In Stream

Ghashami Mina, Abdullah Amirali. Arxiv 2015

ARXIV Unsupervised

Big data is becoming ever more ubiquitous, ranging over massive video repositories, document corpora, image sets and Internet routing history. Proximity search and clustering are two algorithmic primitives fundamental to data analysis, but both suffer from the “curse of dimensionality” on these gigantic datasets. A popular approach to this problem is to convert object representations into short binary codewords while approximately preserving near-neighbor structure. However, there has been limited research on constructing codewords in the “streaming” or “online” settings often applicable to data of this scale, where one may make only a single pass over data too massive to fit in local memory. In this paper, we apply recent advances in matrix sketching techniques to construct binary codewords in both the streaming and online settings. In our experiments, the approach is competitive with or outperforms several of the most popularly used algorithms, and we prove theoretical guarantees on performance in the streaming setting under mild assumptions on the data and randomness of the training set.
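
To make the idea in the abstract concrete, here is a minimal sketch of how a one-pass matrix sketch could feed a binary coder. It rests on assumptions that the abstract does not confirm: it uses Frequent Directions as the sketching technique, takes the top right singular vectors of the sketch as projection directions, and thresholds projections at zero without mean-centering. The function names `frequent_directions` and `binary_codes` and the parameters `ell` and `n_bits` are hypothetical, chosen for illustration only, and this should not be read as the authors' algorithm.

```python
import numpy as np


def frequent_directions(rows, ell):
    """One pass over `rows`; returns an (ell x d) sketch B with B^T B close to A^T A.

    Assumes 2 <= ell <= d, where d is the dimension of each row.
    """
    sketch = None
    next_free = 0  # index of the next all-zero row in the sketch
    for row in rows:
        row = np.asarray(row, dtype=float)
        if sketch is None:
            sketch = np.zeros((ell, row.shape[0]))
        if next_free == ell:
            # Sketch is full: shrink all squared singular values by the
            # squared singular value at index ell // 2, which zeroes out
            # the rows from ell // 2 onward.
            _, s, vt = np.linalg.svd(sketch, full_matrices=False)
            delta = s[ell // 2] ** 2
            shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            sketch = shrunk[:, None] * vt
            next_free = ell // 2
        sketch[next_free] = row
        next_free += 1
    return sketch


def binary_codes(points, sketch, n_bits):
    """Project points onto the sketch's top n_bits right singular vectors
    and keep one bit per direction (1 if the projection is positive)."""
    _, _, vt = np.linalg.svd(sketch, full_matrices=False)
    projection = vt[:n_bits].T  # shape (d, n_bits)
    return (np.asarray(points, dtype=float) @ projection > 0).astype(np.uint8)
```

In use, one would stream the training rows through `frequent_directions` once, keeping only the `ell x d` sketch in memory, and then call `binary_codes` on any batch of points. A practical system would likely also track a running mean for centering and balance the bits, which this sketch omits.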
