Search across all paper titles, abstracts, authors by using the search field. Please consider contributing by updating the information of existing papers or adding new work.
Year  Title  Authors  Venue  Abstract 

2020  Targeted Attack for Deep Hashing based Retrieval  Jiawang Bai, Bin Chen, Yiming Li, Dongxian Wu, Weiwei Guo, Shutao Xia, Enhui Yang  Arxiv  The deep hashing based retrieval method is widely adopted in largescale image and video retrieval. However, there is little investigation on its security. In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval. Specifically, we first formulate the targeted attack as a pointtoset optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label. Then we design a novel componentvoting scheme to obtain an anchor code as the representative of the set of hash codes of objects with the target label, whose optimality guarantee is also theoretically derived. To balance the performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under the ℓ∞ restriction on the perturbation. Extensive experiments verify that DHTA is effective in attacking both deep hashing based image retrieval and video retrieval. 
2020  ExchNet: A Unified Hashing Network for LargeScale FineGrained Image Retrieval  Quan Cui, QingYuan Jiang, XiuShen Wei, WuJun Li, and Osamu Yoshie  ECCV  Retrieving content relevant images from a largescale fine grained dataset could suffer from intolerably slow query speed and highly redundant storage cost, due to highdimensional realvalued embeddings which aim to distinguish subtle visual differences of finegrained objects. In this paper, we study the novel finegrained hashing topic to generate compact binary codes for finegrained images, leveraging the search and storage efficiency of hash learning to alleviate the aforementioned problems. Specifically, we propose a unified endtoend trainable network, termed as ExchNet. Based on attention mechanisms and proposed attention constraints, it can firstly obtain both local and global features to represent object parts and whole finegrained objects, respectively. Furthermore, to ensure the discriminative ability and semantic meaning’s consistency of these partlevel features across images, we design a local feature alignment approach by performing a feature exchanging operation. Later, an alternative learning algorithm is employed to optimize the whole ExchNet and then generate the final binary hash codes. Validated by extensive experiments, our proposal consistently outperforms stateoftheart generic hashing methods on five finegrained datasets, which shows our effectiveness. Moreover, compared with other approximate nearest neighbor methods, ExchNet achieves the best speedup and storage reduction, revealing its efficiency and practicality. 
2020  Contentaware Neural Hashing for Coldstart Recommendation  Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma  SIGIR  Contentaware recommendation approaches are essential for providing meaningful recommendations for new (i.e., coldstart) items in a recommender system. We present a contentaware neural hashingbased collaborative filtering approach (NeuHashCF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating useritem relevance. NeuHashCF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired from semantic hashing, the item hashing component generates a hash code directly from an item’s content information (i.e., it generates coldstart and seen item hash codes in the same manner). This contrasts existing stateoftheart models, which treat the two item cases separately. The user hash codes are generated directly based on user id, through learning a user embedding matrix. We show experimentally that NeuHashCF significantly outperforms stateoftheart baselines by up to 12% NDCG and 13% MRR in coldstart recommendation settings, and up to 4% in both NDCG and MRR in standard settings where all items are present while training. Our approach uses 24x shorter hash codes, while obtaining the same or better performance compared to the state of the art, thus consequently also enabling a notable storage reduction. 
2020  Creating Something from Nothing: Unsupervised Knowledge Distillation for CrossModal Hashing  Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian  CVPR  In recent years, crossmodal hashing (CMH) has attracted increasing attentions, mainly because its potential ability of mapping contents from different modalities, especially in vision and language, into the same space, so that it becomes efficient in crossmodal data retrieval. There are two main frameworks for CMH, differing from each other in whether semantic supervision is required. Compared to the unsupervised methods, the supervised methods often enjoy more accurate results, but require much heavier labors in data annotation. In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method. Specifically, we make use of teacherstudent optimization for propagating knowledge. Experiments are performed on two popular CMH benchmarks, i.e., the MIRFlickr and NUSWIDE datasets. Our approach outperforms all existing unsupervised methods by a large margin 
2020  Model Optimization Boosting Framework for Linear Model Hash Learning  Xingbo Liu, Xiushan Nie, Quan Zhou, Liqiang Nie, Yilong Yin  TIP  Efficient hashing techniques have attracted extensive research interests in both storage and retrieval of high dimensional data, such as images and videos. In existing hashing methods, a linear model is commonly utilized owing to its efficiency. To obtain better accuracy, linearbased hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider the inherent characteristics and neighborhood information of samples. Differing from existing hashing methods, in this study, we propose a selfimprovement framework called Model Boost (MoBoost) to improve model parameter optimization for linearbased hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, for a linearbased hashing method, we first repeatedly execute the hashing method to obtain several hash codes to training samples. Then, utilizing two novel fusion strategies, these codes are fused into a single set. We also propose two new criteria to evaluate the goodness of hash bits during the fusion process. Based on the fused set of hash codes, we learn new parameters for the linear hash function that can significantly improve the accuracy. In general, the proposed MoBoost can be adopted by existing linearbased hashing methods, achieving more precise and stable performance compared to the original methods, and adopting the proposed MoBoost will incur negligible time and space costs. To evaluate the proposed MoBoost, we performed extensive experiments on four benchmark datasets, and the results demonstrate superior performance. 
2020  AutoEncoding TwinBottleneck Hashing  Yuming Shen, Jie Qin, Jiaxin Chen, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao  CVPR  Conventional unsupervised hashing methods usually take advantage of similarity graphs, which are either precomputed in the highdimensional space or obtained from random anchor points. On the one hand, existing methods uncouple the procedures of hash function learning and graph construction. On the other hand, graphs empirically built upon original data could introduce biased prior knowledge of data relevance, leading to suboptimal retrieval performance. In this paper, we tackle the above problems by proposing an efficient and adaptive codedriven graph, which is updated by decoding in the context of an autoencoder. Specifically, we introduce into our framework twin bottlenecks (i.e., latent variables) that exchange crucial information collaboratively. One bottleneck (i.e., binary codes) conveys the highlevel intrinsic data structure captured by the codedriven graph to the other (i.e., continuous variables for lowlevel detail information), which in turn propagates the updated network feedback for the encoder to learn more discriminative binary codes. The autoencoding learning objective literally rewards the codedriven graph to learn an optimal encoder. Moreover, the proposed model can be simply optimized by gradient descent without violating the binary constraints. Experiments on benchmarked datasets clearly show the superiority of our framework over the stateoftheart hashing methods. 
2020  Central Similarity Hashing for Efficient Image and Video Retrieval  Li Yuan, Tao Wang, Xiaopeng Zhang, Zequn Jie, Francis EH Tay, Jiashi Feng  CVPR  Existing datadependent hashing methods usually learn hash functions from the pairwise or triplet data relationships, which only capture the data similarity locally, and often suffer low learning efficiency and low collision rate. In this work, we propose a new global similarity metric, termed as central similarity, with which the hash codes for similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy. We principally formulate the computation of the proposed central similarity metric by introducing a new concept, i.e. hash center that refers to a set of data points scattered in the Hamming space with sufficient mutual distance between each other. We then provide an efficient method to construct well separated hash centers by leveraging the Hadamard matrix and Bernoulli distributions. Finally, we propose the Central Similarity Hashing (CSH) that optimizes the central similarity between data points w.r.t. their hash centers instead of optimizing the local similarity. The CSH is generic and applicable to both image and video hashing. Extensive experiments on largescale image and video retrieval demonstrate CSH can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs, and achieve noticeable boost in retrieval performance, i.e. 3%20% in mAP over the previous stateoftheart. The codes are in: https://github.com/yuanli2333/ HadamardMatrixforhashing 
2019  Hashing with Mutual Information  F. Cakir, K. He, S. Bargal, S. Sclaroff  TPAMI  Binary vector embeddings enable fast nearest neighbor retrieval in large databases of highdimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an informationtheoretic quantity: mutual information. We show that optimizing mutual information can reduce ambiguity in the induced neighborhood structure in the learned Hamming space, which is essential in obtaining high retrieval performance. To this end, we optimize mutual information in deep neural networks with minibatch stochastic gradient descent, with a formulation that maximally and efficiently utilizes available supervision. Experiments on four image retrieval benchmarks, including ImageNet, confirm the effectiveness of our method in learning highquality binary embeddings for nearest neighbor retrieval. 
2019  Deep Semantic Text Hashing with Weak Supervision  Suthee Chaidaroon, Travis Ebesu, Yi Fang  SIGIR  With an ever increasing amount of data available on the web, fast similarity search has become the critical component for largescale information retrieval systems. One solution is semantic hashing which designs binary codes to accelerate similarity search. Recently, deep learning has been successfully applied to the semantic hashing problem and produces highquality compact binary codes compared to traditional methods. However, most stateoftheart semantic hashing approaches require large amounts of handlabeled training data which are often expensive and time consuming to collect. The cost of getting labeled data is the key bottleneck in deploying these hashing methods. Motivated by the recent success in machine learning that makes use of weak supervision, we employ unsupervised ranking methods such as BM25 to extract weak signals from training data. We further introduce two deep generative semantic hashing models to leverage weak signals for text hashing. The experimental results on four public datasets show that our models can generate highquality binary codes without using handlabeled training data and significantly outperform the competitive unsupervised semantic hashing baselines. 
2019  Deep Supervised Hashing With Anchor Graph  Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, Wai Keung Wong  ICCV  Recently, a series of deep supervised hashing methods were proposed for binary code learning. However, due to the high computation cost and the limited hardware’s memory, these methods will first select a subset from the training set, and then form a minibatch data to update the network in each iteration. Therefore, the remaining labeled data cannot be fully utilized and the model cannot directly obtain the binary codes of the entire training set for retrieval. To address these problems, this paper proposes an interesting regularized deep model to seamlessly integrate the advantages of deep hashing and efficient binary code learning by using the anchor graph. As such, the deep features and label matrix can be jointly used to optimize the binary codes, and the network can obtain more discriminative feedback from the linear combinations of the learned bits. Moreover, we also reveal the algorithm mechanism and its computation essence. Experiments on three largescale datasets indicate that the proposed method achieves better retrieval performance with less training time compared to previous deep hashing methods. 
2019  LocalitySensitive Hashing for fDivergences: Mutual Information Loss and Beyond  L. Chen, H. Esfandiari, G. Fu, V. Mirrokni  NIPS  Computing approximate nearest neighbors in high dimensional spaces is a central problem in largescale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the localitysensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for fdivergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LHS schemes for fdivergence distance functions and develop LSH schemes for the generalized JensenShannon divergence and triangular discrimination in this framework. We show a twosided approximation result for approximation of the generalized JensenShannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, due to its several important applications such as model compression. 
2019  A Twostep Crossmodal Hashing by Exploiting Label Correlations and Preserving Similarity in Both Steps  ZhenDuo Chen, Yongxin Wang, HuiQiong Li, Xin Luo, Liqiang Nie, XinShun  MM  In this paper, we present a novel TwostEp Crossmodal Hashing method, TECH for short, for crossmodal retrieval tasks. As a twostep method, it first learns hash codes based on semantic labels, while preserving the similarity in the original space and exploiting the label correlations in the label space. In the light of this, it is able to make better use of label information and generate better binary codes. In addition, different from other twostep methods that mainly focus on the hash codes learning, TECH adopts a new hash function learning strategy in the second step, which also preserves the similarity in the original space. Moreover, with the help of well designed objective function and optimization scheme, it is able to generate hash codes discretely and scalable for large scale data. To the best of our knowledge, it is the first crossmodal hashing method exploiting label correlations, and also the first twostep hashing model preserving the similarity while leaning hash function. Extensive experiments demonstrate that the proposed approach outperforms some stateoftheart crossmodal hashing methods. 
2019  Weakly Supervised Deep Image Hashing through Tag Embeddings  Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li  CVPR  Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, largescale labeled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with images tags that contain useful information, although raw tags, in general, do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weaklysupervised learning problem. We utilize the information contained in the usergenerated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested for the task of semantic image retrieval and is compared against several stateofart models. Results show that our approach sets a new stateofart in the area of weekly supervised image hashing. 
2019  Unsupervised Neural Generative Semantic Hashing  Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma  SIGIR  Fast similarity search is a key component in largescale information retrieval, where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area have been obtained through neural network based models: generative models trained by learning to reconstruct the original documents. We present a novel unsupervised generative semantic hashing approach, \textit{Ranking based Semantic Hashing} (RBSH) that consists of both a variational and a ranking based component. Similarly to variational autoencoders, the variational component is trained to reconstruct the original document conditioned on its generated hash code, and as in prior work, it only considers documents individually. The ranking component solves this limitation by incorporating interdocument similarity into the hash code generation, modelling document ranking through a hinge loss. To circumvent the need for labelled data to compute the hinge loss, we use a weak labeller and thus keep the approach fully unsupervised. Extensive experimental evaluation on four publicly available datasets against traditional baselines and recent stateoftheart methods for semantic hashing shows that RBSH significantly outperforms all other methods across all evaluated hash code lengths. In fact, RBSH hash codes are able to perform similarly to stateoftheart hash codes while using 24x fewer bits. 
2019  KNearest Neighbors Hashing  Xiangyu He, Peisong Wang, Jian Cheng  CVPR  Hashing based approximate nearest neighbor search embeds high dimensional data to compact binary codes, which enables efficient similarity search and storage. However, the nonisometry sign(·) function makes it hard to project the nearest neighbors in continuous data space into the closest codewords in discrete Hamming space. In this work, we revisit the sign(·) function from the perspective of space partitioning. In specific, we bridge the gap between knearest neighbors and binary hashing codes with Shannon entropy. We further propose a novel KNearest Neighbors Hashing (KNNH) method to learn binary representations from KNN within the subspaces generated by sign(·). Theoretical and experimental results show that the KNN relation is of central importance to neighbor preserving embeddings, and the proposed method outperforms the stateofthearts on benchmark datasets. 
2019  Separated Variational Hashing Networks for CrossModal Retrieval  Peng Hu, Xu Wang, Liangli Zhen, Dezhong Peng  MM  Crossmodal hashing, due to its low storage cost and high query speed, has been successfully used for similarity search in multimedia retrieval applications. It projects highdimensional data into a shared isomorphic Hamming space with similar binary codes for semanticallysimilar data. In some applications, all modalities may not be obtained or trained simultaneously for some reasons, such as privacy, secret, storage limitation, and computational resource limitation. However, most existing crossmodal hashing methods need all modalities to jointly learn the common Hamming space, thus hindering them from handling these problems. In this paper, we propose a novel approach called Separated Variational Hashing Networks (SVHNs) to overcome the above challenge. Firstly, it adopts a label network (LabNet) to exploit available and nonspecific label annotations to learn a latent common Hamming space by projecting each semantic label into a common binary representation. Then, each modalityspecific network can separately map the samples of the corresponding modality into their binary semantic codes learned by LabNet. We achieve it by conducting variational inference to match the aggregated posterior of the hashing code of LabNet with an arbitrary prior distribution. The effectiveness and efficiency of our SVHNs are verified by extensive experiments carried out on four widelyused multimedia databases, in comparison with 11 stateoftheart approaches. 
2019  Accelerate Learning of Deep Hashing With Gradient Attention  LongKai Huang, Jianda Chen, Sinno Jialin Pan  ICCV  Recent years have witnessed the success of learning to hash in fast largescale image retrieval. As deep learning has shown its superior performance on many computer vision applications, recent designs of learningbased hashing models have been moving from shallow ones to deep architectures. However, based on our analysis, we find that gradient descent based algorithms used in deep hashing models would potentially cause hash codes of a pair of training instances to be updated towards the directions of each other simultaneously during optimization. In the worst case, the paired hash codes switch their directions after update, and consequently, their corresponding distance in the Hamming space remain unchanged. This makes the overall learning process highly inefficient. To address this issue, we propose a new deep hashing model integrated with a novel gradient attention mechanism. Extensive experimental results on three benchmark datasets show that our proposed algorithm is able to accelerate the learning process and obtain competitive retrieval performance compared with stateoftheart deep hashing models. 
2019  Deep Saliency Hashing for Finegrained Retrieval  Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xiansheng Hua  Arxiv  In recent years, hashing methods have been proved to be effective and efficient for the largescale Web media search. However, the existing general hashing methods have limited discriminative power for describing finegrained objects that share similar overall appearance but have subtle difference. To solve this problem, we for the first time introduce the attention mechanism to the learning of finegrained hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semanticpreserving hashing codes simultaneously. DSaH is a twostep endtoend model consisting of an attention network and a hashing network. Our loss function contains three basic components, including the semantic loss, the saliency loss, and the quantization loss. As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both finegrained and general retrieval datasets for performance evaluation. Experimental results on fine grained dataset, including Oxford Flowers17, Stanford Dogs120 and CUB Bird demonstrate that our DSaH performs the best for finegrained retrieval task and beats strongest competitor (DTQ) by approximately 10% on both Stanford Dogs120 and CUB Bird. DSaH is also comparable to several stateoftheart hashing methods on general datasets, including CIFAR10 and NUSWIDE. 
2019  MaximumMargin Hamming Hashing  Rong Kang, Yue Cao, Mingsheng Long (B), Jianmin Wang, and Philip S. Yu  ICCV  Deep hashing enables computation and memory efficient image search through endtoend learning of feature representations and binary codes. While linear scan over binary hash codes is more efficient than over the highdimensional representations, its lineartime complexity is still unacceptable for very large databases. Hamming space retrieval enables constanttime search through hash lookups, where for each query, there is a Hamming ball centered at the query and the data points within the ball are returned as relevant. Since inside the Hamming ball implies retrievable while outside irretrievable, it is crucial to explicitly characterize the Hamming ball. The main idea of this work is to directly embody the Hamming radius into the loss functions, leading to MaximumMargin Hamming Hashing (MMHH), a new model specifically optimized for Hamming space retrieval. We introduce a maxmargin tdistribution loss, where the tdistribution concentrates more similar data points to be within the Hamming ball, and the margin characterizes the Hamming radius such that less penalization is applied to similar data points within the Hamming ball. The loss function also introduces robustness to data noise, where the similarity supervision may be inaccurate in practical problems. The model is trained endtoend using a new semibatch optimization algorithm tailored to extremely imbalanced data. Our method yields stateoftheart results on four datasets and shows superior performance on noisy data. 
2019  Neighborhood Preserving Hashing for Scalable Video Retrieval  Shuyan Li, Zhixiang Chen, Jiwen Lu, Xiu Li, Jie Zhou  ICCV  In this paper, we propose a Neighborhood Preserving Hashing (NPH) method for scalable video retrieval in an unsupervised manner. Unlike most existing deep video hashing methods which indiscriminately compress an entire video into a binary code, we embed the spatialtemporal neighborhood information into the encoding network such that the neighborhoodrelevant visual content of a video can be preferentially encoded into a binary code under the guidance of the neighborhood information. Specifically, we propose a neighborhood attention mechanism which focuses on partial useful content of each input frame conditioned on the neighborhood information. We then integrate the neighborhood attention mechanism into an RNNbased reconstruction scheme to encourage the binary codes to capture the spatialtemporal structure in a video which is consistent with that in the neighborhood. As a consequence, the learned hashing functions can map similar videos to similar binary codes. Extensive experiments on three widelyused benchmark datasets validate the effectiveness of our proposed approach. 
2019  Push for Quantization: Deep Fisher Hashing  Yunqiang Li, Wenjie Pei, Yufei zha, Jan van Gemert  BMVC  Current massive datasets demand lightweight access for analysis. Discrete hashing methods are thus beneficial because they map highdimensional data to compact binary codes that are efficient to store and process, while preserving semantic similarity. To optimize powerful deep learning methods for image hashing, gradientbased methods are required. Binary codes, however, are discrete and thus have no continuous derivatives. Relaxing the problem by solving it in a continuous space and then quantizing the solution is not guaranteed to yield separable binary codes. The quantization needs to be included in the optimization. In this paper we push for quantization: We optimize maximum class separability in the binary space. We introduce a margin on distances between dissimilar image pairs as measured in the binary space. In addition to pairwise distances, we draw inspiration from Fisher’s Linear Discriminant Analysis (Fisher LDA) to maximize the binary distances between classes and at the same time minimize the binary distance of images within the same class. Experiments on CIFAR10, NUSWIDE and ImageNet100 demonstrate compact codes comparing favorably to the current state of the art. 
2019  MoBoost: A Selfimprovement Framework for Linearbased Hashing  Xingbo Liu, Xiushan Nie, Xiaoming Xi, Lei Zhu, Yilong Yin  CIKM  The linear model is commonly utilized in hashing methods owing to its efficiency. To obtain better accuracy, linearbased hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider neighborhood information. In this study, we propose a novel generalized framework called Model Boost (MoBoost), which can achieve the selfimprovement of the linearbased hashing. The proposed MoBoost is used to improve model parameter optimization for linearbased hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, given a linearbased hashing method, we first execute the method several times to get several different hash codes for training samples, and then combine these different hash codes into one set utilizing one novel fusion strategy. Based on this set of hash codes, we learn some new parameters for the linear hash function that can significantly improve accuracy. The proposed MoBoost can be generally adopted in existing linearbased hashing methods, achieving more precise and stable performance compared to the original methods while imposing negligible added expenditure in terms of time and space. Extensive experiments are performed based on three benchmark datasets, and the results demonstrate the superior performance of the proposed framework. 
2019  Embarrassingly Simple Binary Representation Learning  Yuming Shen, Jie Qin,Jiaxin Chen, Li Liu, and Fan Zhu  ICCVW  Recent binary representation learning models usually require sophisticated binary optimization, similarity measure or even generative models as auxiliaries. However, one may wonder whether these nontrivial components are needed to formulate practical and effective hashing models. In this paper, we answer the above question by proposing an embarrassingly simple approach to binary representation learning. With a simple classification objective, our model only incorporates two additional fullyconnected layers onto the top of an arbitrary backbone network, whilst complying with the binary constraints during training. The proposed model lowerbounds the Information Bottleneck (IB) between data samples and their semantics, and can be related to many recent `learning to hash’ paradigms. We show that, when properly designed, even such a simple network can generate effective binary codes, by fully exploring data semantics without any heldout alternating updating steps or auxiliary models. Experiments are conducted on conventional largescale benchmarks, i.e., CIFAR10, NUSWIDE, and ImageNet, where the proposed simple model outperforms the stateoftheart methods. 
2019  VariableLength Quantization Strategy for Hashing  Yang Shi, Xiushan Nie, Xin Zhou, Xiaoming Xi, Yilong Yin  ICIP  Hashing is widely used to solve fast Approximate Nearest Neighbor (ANN) search problems, involves converting the original realvalued samples to binaryvalued representations. The conventional quantization strategies, such as SingleBit Quantization and MultiBit quantization, are considered ineffective, because of their serious information loss. To address this issue, we propose a novel variablelength quantization (VLQ) strategy for hashing. In the proposed VLQ technique, we divide all samples into different regions in each dimension firstly given the realvalued features of samples. Then we compute the dispersion degrees of these regions. Subsequently, we attempt to optimally assign different number of bits to each dimensions to obtain the minimum dispersion degree. Our experiments show that the VLQ strategy achieves not only superior performance over the stateoftheart methods, but also has a faster retrieval speed on public datasets. 
2019  Deep JointSemantics Reconstructing Hashing for LargeScale Unsupervised CrossModal Retrieval  Shupeng Su, Zhisheng Zhong, Chao Zhang  ICCV 
Crossmodal hashing encodes the multimedia data into a common binary hash space in which the correlations among the samples from different modalities can be effectively measured. Deep crossmodal hashing further improves the retrieval performance as the deep neural networks can generate more semantic relevant features and hash codes. In this paper, we study the unsupervised deep crossmodal hash coding and propose Deep Joint Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. First, to learn binary codes that preserve the neighborhood structure of the original data, DJSRH constructs a novel jointsemantics affinity matrix which elaborately integrates the original neighborhood information from different modalities and accordingly is capable to capture the latent intrinsic semantic affinity for the input multimodal instances. Second, DJSRH later trains the networks to generate binary codes that maximally reconstruct above jointsemantics relations via the proposed reconstructing framework, which is more competent for the batchwise training as it reconstructs the specific similarity value unlike the common Laplacian constraint merely preserving the similarity order. Extensive experiments demonstrate the significant improvement by DJSRH in various crossmodal retrieval tasks. 
2019  Supervised Hierarchical CrossModal Hashing  Changchang Sun, Xuemeng Song, Fuli Feng, Wayne Xin Zhao, Hao Zhang and Liqiang Nie  SIGIR  Recently, due to the unprecedented growth of multimedia data, crossmodal hashing has gained increasing attention for the efficient crossmedia retrieval. Typically, existing methods on crossmodal hashing treat labels of one instance independently but overlook the correlations among labels. Indeed, in many realworld scenarios, like the online fashion domain, instances (items) are labeled with a set of categories correlated by certain hierarchy. In this paper, we propose a new endtoend solution for supervised crossmodal hashing, named HiCHNet, which explicitly exploits the hierarchical labels of instances. In particular, by the preestablished label hierarchy, we comprehensively characterize each modality of the instance with a set of layerwise hash representations. In essence, hash codes are encouraged to not only preserve the layerwise semantic similarities encoded by the label hierarchy, but also retain the hierarchical discriminative capabilities. Due to the lack of benchmark datasets, apart from adapting the existing dataset FashionVC from fashion domain, we create a dataset from the online fashion platform Ssense consisting of 15, 696 imagetext pairs labeled by 32 hierarchical categories. Extensive experiments on two realworld datasets demonstrate the superiority of our model over the stateoftheart methods. 
2019  Deep Collaborative Discrete Hashing with SemanticInvariant Structure  Zijian Wang, Zheng Zhang, Yadan Luo and Zi Huang  SIGIR  Existing deep hashing approaches fail to fully explore semantic correlations and neglect the effect of linguistic context on visual attention learning, leading to inferior performance. This paper proposes a dualstream learning framework, dubbed Deep Collaborative Discrete Hashing (DCDH), which constructs a discriminative common discrete space by collaboratively incorporating the shared and individual semantics deduced from visual features and semantic labels. Specifically, the contextaware representations are generated by employing the outer product of visual embeddings and semantic encodings. Moreover, we reconstruct the labels and introduce the focal loss to take advantage of frequent and rare concepts. The common binary code space is built on the joint learning of the visual representations attended by language, the semanticinvariant structure construction and the label distribution correction. Extensive experiments demonstrate the superiority of our method. 
2019  Semisupervised Deep Quantization for Crossmodal Search  Xin Wang, Wenwu Zhu, Chenghao Liu  MM  The problem of crossmodal similarity search, which aims at making efficient and accurate queries across multiple domains, has become a significant and important research topic. Composite quantization, a compact coding solution superior to hashing techniques, has shown its effectiveness for similarity search. However, most existing works utilizing composite quantization to search multidomain content only consider either pairwise similarity information or class label information across different domains, which fails to tackle the semisupervised problem in composite quantization. In this paper, we address the semisupervised quantization problem by considering: (i) pairwise similarity information (without class label information) across different domains, which captures the intradocument relation, (ii) crossdomain data with class label which can help capture interdocument relation, and (iii) crossdomain data with neither pairwise similarity nor class label which enables the full use of abundant unlabelled information. To the best of our knowledge, we are the first to consider both supervised information (pairwise similarity + class label) and unsupervised information (neither pairwise similarity nor class label) simultaneously in composite quantization. A challenging problem arises: how can we jointly handle these three sorts of information across multiple domains in an efficient way? To tackle this challenge, we propose a novel semisupervised deep quantization (SSDQ) model that takes both supervised and unsupervised information into account. The proposed SSDQ model is capable of incorporating the above three kinds of information into one single framework when utilizing composite quantization for accurate and efficient queries across different domains. More specifically, we employ a modified deep autoencoder for better latent representation and formulate pairwise similarity loss, supervised quantization loss as well as unsupervised distribution match loss to handle all three types of information. The extensive experiments demonstrate the significant improvement of SSDQ over several stateoftheart methods on various datasets. 
2019  Deep Incremental Hashing Network for Efficient Image Retrieval  Dayan Wu, Qi Dai, Jing Liu, Bo Li, Weiping Wang  CVPR  Hashing has shown great potential in largescale image retrieval due to its storage and computation efficiency, especially the recent deep supervised hashing methods. To achieve promising performance, deep supervised hashing methods require a large amount of training data from different classes. However, when images of new categories emerge, existing deep hashing methods have to retrain the CNN model and generate hash codes for all the database images again, which is impractical for largescale retrieval system. In this paper, we propose a novel deep hashing framework, called Deep Incremental Hashing Network (DIHN), for learning hash codes in an incremental manner. DIHN learns the hash codes for the new coming images directly, while keeping the old ones unchanged. Simultaneously, a deep hash function for query set is learned by preserving the similarities between training points. Extensive experiments on two widely used image retrieval benchmarks demonstrate that the proposed DIHN framework can significantly decrease the training time while keeping the stateoftheart retrieval accuracy. 
2019  Online Multimodal Hashing with Dynamic Queryadaption  Xu Lu, Lei Zhu, Zhiyong Cheng, Liqiang Nie and Huaxiang Zhang  SIGIR  Multimodal hashing is an effective technique to support largescale multimedia retrieval, due to its capability of encoding heterogeneous multimodal features into compact and similaritypreserving binary codes. Although great progress has been achieved so far, existing methods still suffer from several problems, including: 1) All existing methods simply adopt fixed modality combination weights in online hashing process to generate the query hash codes. This strategy cannot adaptively capture the variations of different queries. 2) They either suffer from insufficient semantics (for unsupervised methods) or require high computation and storage cost (for the supervised methods, which rely on pairwise semantic matrix). 3) They solve the hash codes with relaxed optimization strategy or bitbybit discrete optimization, which results in significant quantization loss or consumes considerable computation time. To address the above limitations, in this paper, we propose an Online Multimodal Hashing with Dynamic Queryadaption (OMHDQ) method in a novel fashion. Specifically, a selfweighted fusion strategy is designed to adaptively preserve the multimodal feature information into hash codes by exploiting their complementarity. The hash codes are learned with the supervision of pairwise semantic labels to enhance their discriminative capability, while avoiding the challenging symmetric similarity matrix factorization. Under such learning framework, the binary hash codes can be directly obtained with efficient operations and without quantization errors. Accordingly, our method can benefit from the semantic labels, and simultaneously, avoid the high computation complexity. Moreover, to accurately capture the query variations, at the online retrieval stage, we design a parameterfree online hashing module which can adaptively learn the query hash codes according to the dynamic query contents. Extensive experiments demonstrate the stateoftheart performance of the proposed approach from various aspects. 
2019  Deep Hashing by Discriminating Hard Examples  Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen, Jun Zhou, Edwin Hancock  MM  This paper tackles a rarely explored but critical problem within learning to hash, i.e., to learn hash codes that effectively discriminate hard similar and dissimilar examples, to empower largescale image retrieval. Hard similar examples refer to image pairs from the same semantic class that demonstrate some shared appearance but have different finegrained appearance. Hard dissimilar examples are image pairs that come from different semantic classes but exhibit similar appearance. These hard examples generally have a small distance due to the shared appearance. Therefore, effective encoding of the hard examples can well discriminate the relevant images within a small Hamming distance, enabling more accurate retrieval in the topranked returned images. However, most existing hashing methods cannot capture this key information as their optimization is dominated byeasy examples, i.e., distant similar/dissimilar pairs that share no or limited appearance. To address this problem, we introduce a novel Gamma distributionenabled and symmetric KullbackLeibler divergencebased loss, which is dubbed dual hinge loss because it works similarly as imposing two smoothed hinge losses on the respective similar and dissimilar pairs. Specifically, the loss enforces exponentially variant penalization on the hard similar (dissimilar) examples to emphasize and learn their finegrained difference. It meanwhile imposes a bounding penalization on easy similar (dissimilar) examples to prevent the dominance of the easy examples in the optimization while preserving the highlevel similarity (dissimilarity). This enables our model to well encode the key information carried by both easy and hard examples. Extensive empirical results on three widelyused image retrieval datasets show that (i) our method consistently and substantially outperforms stateoftheart competing methods using hash codes of the same length and (ii) our method can use significantly (e.g., 50%75%) shorter hash codes to perform substantially better than, or comparably well to, the competing methods. 
2019  DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs  Erkun Yang, Tongliang Liu, Cheng Deng, Wei Liu, Dacheng Tao  CVPR  Due to the high storage and search efficiency, hashing has become prevalent for largescale similarity search. Particularly, deep hashing methods have greatly improved the search performance under supervised scenarios. In contrast, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable supervisory similarity signals. To address this issue, we propose a novel deep unsupervised hashing model, dubbed DistillHash, which can learn a distilled data set consisted of data pairs, which have confidence similarity signals. Specifically, we investigate the relationship between the initial noisy similarity signals learned from local structures and the semantic similarity labels assigned by a Bayes optimal classifier. We show that under a mild assumption, some data pairs, of which labels are consistent with those assigned by the Bayes optimal classifier, can be potentially distilled. Inspired by this fact, we design a simple yet effective strategy to distill data pairs automatically and further adopt a Bayesian learning framework to learn hash functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets show that the proposed DistillHash consistently accomplishes the stateoftheart search performance. 
2018  Hashing with Binary Matrix Pursuit  F. Cakir, K. He, S. Sclaroff  ECCV  We propose theoretical and empirical improvements for twostage hashing methods. We first provide a theoretical analysis on the quality of the binary codes and show that, under mild assumptions, a residual learning scheme can construct binary codes that fit any neighborhood structure with arbitrary accuracy. Secondly, we show that with highcapacity hash functions such as CNNs, binary code inference can be greatly simplified for many standard neighborhood definitions, yielding smaller optimization problems and more robust codes. Incorporating our findings, we propose a novel twostage hashing method that significantly outperforms previous hashing studies on widely used image retrieval benchmarks. 
2018  Deep Cauchy Hashing for Hamming Space Retrieval  Yue Cao, Mingsheng Long, Bin Liu, Jianmin Wang  CVPR  Due to its computation efficiency and retrieval quality, hashing has been widely applied to approximate nearest neighbor search for largescale image retrieval, while deep hashing further improves the retrieval quality by endtoend representation learning and hash coding. With compact hash codes, Hamming space retrieval enables the most efficient constanttime search that returns data points within a given Hamming radius to each query, by hash table lookups instead of linear scan. However, subject to the weak capability of concentrating relevant images to be within a small Hamming ball due to misspecified loss functions, existing deep hashing methods may underperform for Hamming space retrieval. This work presents Deep Cauchy Hashing (DCH), a novel deep hashing model that generates compact and concentrated binary hash codes to enable efficient and effective Hamming space retrieval. The main idea is to design a pairwise crossentropy loss based on Cauchy distribution, which penalizes significantly on similar image pairs with Hamming distance larger than the given Hamming radius threshold. Comprehensive experiments demonstrate that DCH can generate highly concentrated hash codes and yield stateoftheart Hamming space retrieval performance on three datasets, NUSWIDE, CIFAR10, and MSCOCO. 
2018  HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN  Yue Cao, Mingsheng Long, Bin Liu, Jiamin Wang  CVPR  Deep learning to hash improves image retrieval performance by endtoend representation learning and hash coding from training data with pairwise similarity information. Subject to the scarcity of similarity information that is often expensive to collect for many application domains, existing deep learning to hash methods may overfit the training data and result in substantial loss of retrieval quality. This paper presents HashGAN, a novel architecture for deep learning to hash, which learns compact binary hash codes from both real images and diverse images synthesized by generative models. The main idea is to augment the training data with nearly real images synthesized from a new Pair Conditional Wasserstein GAN (PCWGAN) conditioned on the pairwise similarity information. Extensive experiments demonstrate that HashGAN can generate highquality binary hash codes and yield stateoftheart image retrieval performance on three benchmarks, NUSWIDE, CIFAR10, and MSCOCO. 
2018  Hashing as TieAware Learning to Rank  K. He, F. Cakir, S. Bargal, S. Sclaroff  CVPR  Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval. In this paper, we develop learning to rank formulations for hashing, aimed at directly optimizing rankingbased evaluation metrics such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We first observe that the integervalued Hamming distance often leads to tied rankings, and propose to use tieaware versions of AP and NDCG to evaluate hashing for retrieval. Then, to optimize tieaware ranking metrics, we derive their continuous relaxations, and perform gradientbased optimization with deep neural networks. Our results establish the new stateoftheart for image retrieval by Hamming ranking in common benchmarks. 
2018  SCRATCH: A Scalable Discrete Matrix Factorization Hashing for CrossModal Retrieval  ChuanXiang Li , ZhenDuo Chen, PengFei Zhang, Xin Luo, Liqiang Nie, Wei Zhang, XinShun Xu  MM  In recent years, many hashing methods have been proposed for the crossmodal retrieval task. However, there are still some issues that need to be further explored. For example, some of them relax the binary constraints to generate the hash codes, which may generate large quantization error. Although some discrete schemes have been proposed, most of them are timeconsuming. In addition, most of the existing supervised hashing methods use an n x n similarity matrix during the optimization, making them unscalable. To address these issues, in this paper, we present a novel supervised crossmodal hashing method—Scalable disCRete mATrix faCtorization Hashing, SCRATCH for short. It leverages the collective matrix factorization on the kernelized features and the semantic embedding with labels to find a latent semantic space to preserve the intra and intermodality similarities. In addition, it incorporates the label matrix instead of the similarity matrix into the loss function. Based on the proposed loss function and the iterative optimization algorithm, it can learn the hash functions and binary codes simultaneously. Moreover, the binary codes can be generated discretely, reducing the quantization error generated by the relaxation scheme. Its time complexity is linear to the size of the dataset, making it scalable to largescale datasets. Extensive experiments on three benchmark datasets, namely, Wiki, MIRFlickr25K, and NUSWIDE, have verified that our proposed SCRATCH model outperforms several stateoftheart unsupervised and supervised hashing methods for crossmodal retrieval. 
2018  Deep Domain Adaptation Hashing with Adversarial Learning  Fuchen Long, Ting Yao, Qi Dai, Xinmei Tian, Jiebo Luo, Tao Mei  SIGIR  The recent advances in deep neural networks have demonstrated high capability in a wide variety of scenarios. Nevertheless, finetuning deep models in a new domain still requires a significant amount of labeled data despite expensive labeling efforts. A valid question is how to leverage the source knowledge plus unlabeled or only sparsely labeled target data for learning a new model in target domain. The core problem is to bring the source and target distributions closer in the feature space. In the paper, we facilitate this issue in an adversarial learning framework, in which a domain discriminator is devised to handle domain shift. Particularly, we explore the learning in the context of hashing problem, which has been studied extensively due to its great efficiency in gigantic data. Specifically, a novel Deep Domain Adaptation Hashing with Adversarial learning (DeDAHA) architecture is presented, which mainly consists of three components: a deep convolutional neural networks (CNN) for learning basic image/frame representation followed by an adversary stream on one hand to optimize the domain discriminator, and on the other, to interact with each domainspecific hashing stream for encoding image representation to hash codes. The whole architecture is trained endtoend by jointly optimizing two types of losses, i.e., triplet ranking loss to preserve the relative similarity ordering in the input triplets and adversarial loss to maximally fool the domain discriminator with the learnt source and target feature distributions. Extensive experiments are conducted on three domain transfer tasks, including crossdomain digits retrieval, image to image and image to video transfers, on several benchmarks. Our DeDAHA framework achieves superior results when compared to the stateoftheart techniques. 
2018  Fast Scalable Supervised Hashing  Xin Luo, Liqiang Nie, Xiangnan He, Ye Wu, ZhenDuo Chen, XinShun Xu  SIGIR  Despite significant progress in supervised hashing, there are three common limitations of existing methods. First, most pioneer methods discretely learn hash codes bit by bit, making the learning procedure rather timeconsuming. Second, to reduce the large complexity of the n by n pairwise similarity matrix, most methods apply sampling strategies during training, which inevitably results in information loss and suboptimal performance; some recent methods try to replace the large matrix with a smaller one, but the size is still large. Third, among the methods that leverage the pairwise similarity matrix, most of them only encode the semantic label information in learning the hash codes, failing to fully capture the characteristics of data. In this paper, we present a novel supervised hashing method, called Fast Scalable Supervised Hashing (FSSH), which circumvents the use of the large similarity matrix by introducing a precomputed intermediate term whose size is independent with the size of training data. Moreover, FSSH can learn the hash codes with not only the semantic information but also the features of data. Extensive experiments on three widely used datasets demonstrate its superiority over several stateoftheart methods in both accuracy and scalability. Our experiment codes are available at: https://lcbwlx.wixsite.com/fssh. 
2018  Progressive Generative Hashing for Image Retrieval  Yuqing Ma, Yue He, Fan Ding, Sheng Hu, Jun Li, Xianglong Liu  IJCAI  Recent years have witnessed the success of the emerging hashing techniques in largescale image retrieval. Owing to the great learning capacity, deep hashing has become one of the most promising solutions, and achieved attractive performance in practice. However, without semantic label information, the unsupervised deep hashing still remains an open question. In this paper, we propose a novel progressive generative hashing (PGH) framework to help learn a discriminative hashing network in an unsupervised way. Different from existing studies, it first treats the hash codes as a kind of semantic condition for the similar image generation, and simultaneously feeds the original image and its codes into the generative adversarial networks (GANs). The real images together with the synthetic ones can further help train a discriminative hashing network based on a triplet loss. By iteratively inputting the learnt codes into the hash conditioned GANs, we can progressively enable the hashing network to discover the semantic relations. Extensive experiments on the widelyused image datasets demonstrate that PGH can significantly outperform stateoftheart unsupervised hashing methods. 
2018  LocalitySensitive Hashing for Earthquake Detection: A Case Study of Scaling DataDriven Science  Kexin Rong, Clara E. Yoon, Karianne J. Bergen, Hashem Elezabi,Peter Bailis, Philip Levis, Gregory C. Beroza  VLDB  In this work, we report on a novel application of Locality Sensitive Hashing (LSH) to seismic data at scale. Based on the high waveform similarity between reoccurring earthquakes, our application identifies potential earthquakes by searching for similar time series segments via LSH. However, a straightforward implementation of this LSHenabled application has difficulty scaling beyond 3 months of continuous time series data measured at a single seismic station. As a case study of a datadriven science workflow, we illustrate how domain knowledge can be incorporated into the workload to improve both the efficiency and result quality. We describe several endtoend optimizations of the analysis pipeline from preprocessing to postprocessing, which allow the application to scale to time series data measured at multiple seismic stations. Our optimizations enable an over 100× speedup in the endtoend analysis pipeline. This improved scalability enabled seismologists to perform seismic analysis on more than ten years of continuous time series data from over ten seismic stations, and has directly enabled the discovery of 597 new earthquakes near the Diablo Canyon nuclear power plant in California and 6123 new earthquakes in New Zealand. 
2018  NASH: Toward EndtoEnd Neural Architecture for Generative Semantic Hashing  Dinghan Shen, Qinliang Su, Paidamoyo Chapfuwa, Wenlin Wang, Guoyin Wang, Lawrence Carin, Ricardo Henao  ACL  Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems. While fairly successful, previous techniques generally require twostage training, and the binary constraints are handled adhoc. In this paper, we present an endtoend Neural Architecture for Semantic Hashing (NASH), where the binary hashing codes are treated as Bernoulli latent variables. A neural variational inference framework is proposed for training, where gradients are directly backpropagated through the discrete latent variable to optimize the hash function. We also draw connections between proposed method and ratedistortion theory, which provides a theoretical foundation for the effectiveness of the proposed framework. Experimental results on three public datasets demonstrate that our method significantly outperforms several stateoftheart models on both unsupervised and supervised scenarios. 
2018  Unsupervised Deep Hashing with SimilarityAdaptive and Discrete Optimization  Fumin Shen, Yan Xu, Li Liu, Yang Yang, Zi Huang, Heng Tao Shen  TPAMI  Recent vision and learning studies show that learning compact hash codes can facilitate massive data processing with significantly reduced storage and computation. Particularly, learning deep hash functions has greatly improved the retrieval performance, typically under the semantic supervision. In contrast, current unsupervised deep hashing algorithms can hardly achieve satisfactory performance due to either the relaxed optimization or absence of similaritysensitive objective. In this work, we propose a simple yet effective unsupervised hashing framework, named SimilarityAdaptive Deep Hashing (SADH), which alternatingly proceeds over three training modules: deep hash model training, similarity graph updating and binary code optimization. The key difference from the widelyused twostep hashing method is that the output representations of the learned deep model help update the similarity graph matrix, which is then used to improve the subsequent code optimization. In addition, for producing highquality binary codes, we devise an effective discrete optimization algorithm which can directly handle the binary constraints with a general hashing loss. Extensive experiments validate the efficacy of SADH, which consistently outperforms the stateofthearts by large gaps. 
2018  SelfSupervised Video Hashing with Hierarchical Binary Autoencoder  Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong  TIP  Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed SelfSupervised Video Hashing (SSVH), that is able to capture the temporal nature of videos in an endtoend learningtohash fashion. We specifically address two central problems: 1) how to design an encoderdecoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two realworld datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the stateoftheart methods and achieve the currently best performance on the task of unsupervised video retrieval. 
2018  Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN  Shupeng Su, Chao Zhang, Kai Han, Yonghong Tian  NIPS  To convert the input into binary code, hashing algorithm has been widely used for approximate nearest neighbor search on largescale image sets due to its computation and storage efficiency. Deep hashing further improves the retrieval quality by combining the hash coding with deep neural network. However, a major difficulty in deep hashing lies in the discrete constraints imposed on the network output, which generally makes the optimization NP hard. In this work, we adopt the greedy principle to tackle this NP hard problem by iteratively updating the network toward the probable optimal discrete solution in each iteration. A hash coding layer is designed to implement our approach which strictly uses the sign function in forward propagation to maintain the discrete constraints, while in back propagation the gradients are transmitted intactly to the front layer to avoid the vanishing gradients. In addition to the theoretical derivation, we provide a new perspective to visualize and understand the effectiveness and efficiency of our algorithm. Experiments on benchmark datasets show that our scheme outperforms stateoftheart hashing methods in both supervised and unsupervised tasks. 
2017  MIHash: Online Hashing with Mutual Information  F. Cakir, K. He, S. Bargal, S. Sclaroff  ICCV  Learningbased hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performancecomplexity tradeoffs by learning hash functions from streaming data. In this paper, we first address a key challenge for online hashing: the binary codes for indexed data must be recomputed to keep pace with updates to the hash functions. We propose an efficient quality measure for hash functions, based on an informationtheoretic quantity, mutual information, and use it successfully as a criterion to eliminate unnecessary hash table updates. Next, we also show how to optimize the mutual information objective using stochastic gradient descent. We thus develop a novel hashing method, MIHash, that can be used in both online and batch settings. Experiments on image retrieval benchmarks (including a 2.5M image dataset) confirm the effectiveness of our formulation, both in reducing hash table recomputations and in learning highquality hash functions. 
2017  Collective Deep Quantization for Efficient CrossModal Retrieval  Yue Cao, Mingsheng Long, Jianmin Wang, Shichen Liu  AAAI  Crossmodal similarity retrieval is a problem about designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve for texts. This paper presents a compact coding solution for efficient crossmodal retrieval, with a focus on the quantization approach which has already shown the superior performance over the hashing solutions in singlemodal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization in endtoend deep architecture for crossmodal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefullycrafted hybrid networks and wellspecified loss functions. In addition, our approach simultaneously learns the common quantizer codebook for both modalities through which the crossmodal correlation can be substantially enhanced. CDQ enables efficient and effective crossmodal retrieval using inner product distance computed based on the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state of the art crossmodal retrieval results on standard benchmarks. 
2017  Correlation Autoencoder Hashing for Supervised CrossModal Search  Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu  BMVC  Hashing is widely applied to approximate nearest neighbor search for largescale multimodal retrieval with storage and computation efficiency. Crossmodal hashing improves the quality of hash coding by exploiting semantic correlations across different modalities. Existing crossmodal hashing methods first transform data into lowdimensional feature vectors, and then generate binary codes by another separate quantization step. However, suboptimal hash codes may be generated since the quantization error is not explicitly minimized and the feature representation is not jointly optimized with the binary codes. This paper presents a Correlation Hashing Network (CHN) approach to crossmodal hashing, which jointly learns good data representation tailored to hash coding and formally controls the quantization error. The proposed CHN is a hybrid deep architecture that constitutes a convolutional neural network for learning good image representations, a multilayer perception for learning good text representations, two hashing layers for generating compact binary codes, and a structured maxmargin loss that integrates all things together to enable learning similaritypreserving and highquality hash codes. Extensive empirical study shows that CHN yields state of the art crossmodal retrieval performance on standard benchmarks. 
2017  HashNet: Deep Learning to Hash by Continuation  Zhangjie Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu  CVPR  Learning to hash has been widely applied to approximate nearest neighbor search for largescale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by endtoend representation learning and hash encoding, has received increasing attention recently. Subject to the illposed gradient difficulty in the optimization with sign activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This work presents HashNet, a novel deep architecture for deep learning to hash by continuation method with convergence guarantees, which learns exactly binary hash codes from imbalanced similarity data. The key idea is to attack the illposed gradient problem in optimizing deep networks with nonsmooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield stateoftheart multimedia retrieval performance on standard benchmarks. 
2017  Variational Deep Semantic Hashing for Text Documents  Suthee Chaidaroon, Yi Fang  SIGIR  As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of largescale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack expressiveness and flexibility in modeling to learn effective representations. The recent advances of deep learning in a wide range of applications has demonstrated its capability to learn robust and powerful feature representations for complex data. Especially, deep generative models naturally combine the expressiveness of probabilistic generative models with the high capacity of deep neural networks, which is very suitable for text modeling. However, little work has leveraged the recent progress in deep learning for text hashing. In this paper, we propose a series of novel deep document generative models for text hashing. The first proposed model is unsupervised while the second one is supervised by utilizing document labels/tags for hashing. The third model further considers documentspecific factors that affect the generation of words. The probabilistic generative formulation of the proposed models provides a principled framework for model extension, uncertainty estimation, simulation, and interpretability. Based on variational inference and reparameterization, the proposed models can be interpreted as encoderdecoder deep neural networks and thus they are capable of learning complex nonlinear distributed representations of the original documents. We conduct a comprehensive set of experiments on four public testbeds. The experimental results have demonstrated the effectiveness of the proposed supervised learning models for text hashing. 
2017  Deep CrossModal Hashing  QingYuan Jiang, WuJun Li  CVPR  Due to its low storage cost and fast query speed, crossmodal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, most existing CMH methods are based on handcrafted features which might not be optimally compatible with the hashcode learning procedure. As a result, existing CMH methods with handcrafted features may not achieve satisfactory performance. In this paper, we propose a novel CMH method, called deep crossmodal hashing (DCMH), by integrating feature learning and hashcode learning into the same framework. DCMH is an endtoend learning framework with deep neural networks, one for each modality, to perform feature learning from scratch. Experiments on three real datasets with imagetext modalities show that DCMH can outperform other baselines to achieve the stateoftheart performance in crossmodal retrieval applications. 
2017  Deep Supervised Discrete Hashing  Qi Li, Zhenan Sun, Ran He, Tieniu Tan  NIPS  With the rapid growth of image and video data on the web, hashing has been extensively studied for image or video search in recent years. Benefiting from recent advances in deep learning, deep hashing methods have achieved promising results for image retrieval. However, there are some limitations of previous deep hashing methods (e.g., the semantic information is not fully exploited). In this paper, we develop a deep supervised discrete hashing algorithm based on the assumption that the learned binary codes should be ideal for classification. Both the pairwise label information and the classification information are used to learn the hash codes within one stream framework. We constrain the outputs of the last layer to be binary codes directly, which is rarely investigated in deep hashing algorithm. Because of the discrete nature of hash codes, an alternating minimization method is used to optimize the objective function. Experimental results have shown that our method outperforms current stateoftheart methods on benchmark datasets. 
2017  Deep Semantic Hashing with Generative Adversarial Networks  Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei  SIGIR  Hashing has been a widelyadopted technique for nearest neighbor search in largescale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can suffer from the robustness problem as the data at training and test stage may come from different distributions. This paper studies the exploration of generating synthetic data through semisupervised generative adversarial networks (GANs), which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data for hashing. Specifically, a novel deep semantic hashing with GANs (DSHGANs) is presented, which mainly consists of four components: a deep convolution neural networks (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream. The whole architecture is trained endtoend by jointly optimizing three losses, i.e., adversarial loss to correct label of synthetic or real for each sample, triplet ranking loss to preserve the relative similarity ordering in the input realsynthetic triplets and classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR10 and NUSWIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to stateoftheart deep hash models. 
2017  A Survey on Learning to Hash  Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, and Heng Tao Shen  TPAMI  Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols, and the general performance analysis, and point out that the quantization algori 
2017  Deep Supervised Hashing for MultiLabel and LargeScale Image Retrieval  Dayan Wu, Zheng Lin, Bo Li, Mingzhen Ye, Weiping Wang  ICMR  One of the most challenging tasks in largescale multilabel image retrieval is to map images into binary codes while preserving multilevel semantic similarity. Recently, several deep supervised hashing methods have been proposed to learn hash functions that preserve multilevel semantic similarity with deep convolutional neural networks. However, these triplet label based methods try to preserve the ranking order of images according to their similarity degrees to the queries while not putting direct constraints on the distance between the codes of very similar images. Besides, the current evaluation criteria are not able to measure the performance of existing hashing methods on preserving finegrained multilevel semantic similarity. To tackle these issues, we propose a novel Deep Multilevel Semantic Similarity Preserving Hashing (DMSSPH) method to learn compact similaritypreserving binary codes for the huge body of multilabel image data with deep convolutional neural networks. In our approach, we make the best of the supervised information in the form of pairwise labels to maximize the discriminability of output binary codes. Extensive evaluations conducted on several benchmark datasets demonstrate that the proposed method significantly outperforms the stateoftheart supervised and unsupervised hashing methods at the accuracies of top returned images, especially for shorter binary codes. Meanwhile, the proposed method shows better performance on preserving finegrained multilevel semantic similarity according to the results under the Jaccard coefficient based evaluation criteria we propose. 
2016  Correlation Autoencoder Hashing for Supervised CrossModal Search  Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu  ICMR  Due to its storage and query efficiency, hashing has been widely applied to approximate nearest neighbor search from largescale datasets. While there is increasing interest in crossmodal hashing which facilitates crossmedia retrieval by embedding data from different modalities into a common Hamming space, how to distill the crossmodal correlation structure effectively remains a challenging problem. In this paper, we propose a novel supervised crossmodal hashing method, Correlation Autoencoder Hashing (CAH), to learn discriminative and compact binary codes based on deep autoencoders. Specifically, CAH jointly maximizes the feature correlation revealed by bimodal data and the semantic correlation conveyed in similarity labels, while embeds them into hash codes by nonlinear deep autoencoders. Extensive experiments clearly show the superior effectiveness and efficiency of CAH against the stateoftheart hashing methods on standard crossmodal retrieval benchmarks. 
2016  Deep VisualSemantic Hashing for CrossModal Retrieval  Yue Cao, Mingsheng Long, Jianmin Wang, Qiang Yang, Philip S. Yu  KDD  Due to the storage and retrieval efficiency, hashing has been widely applied to approximate nearest neighbor search for largescale multimedia retrieval. Crossmodal hashing, which enables efficient retrieval of images in response to text queries or vice versa, has received increasing attention recently. Most existing work on crossmodal hashing does not capture the spatial dependency of images and temporal dynamics of text sentences for learning powerful feature representations and crossmodal embeddings that mitigate the heterogeneity of different modalities. This paper presents a new Deep Visual Semantic Hashing (DVSH) model that generates compact hash codes of images and sentences in an endtoend deep learning architecture, which capture the intrinsic crossmodal correspondences between visual data and natural language. DVSH is a hybrid deep architecture that constitutes a visual semantic fusion network for learning joint embedding space of images and text sentences, and two modalityspecific hashing networks for learning hash functions to generate compact binary codes. Our architecture effectively unifies joint multimodal embedding and crossmodal hashing, which is based on a novel combination of Convolutional Neural Networks over images, Recurrent Neural Networks over sentences, and a structured maxmargin objective that integrates all things together to enable learning of similaritypreserving and highquality hash codes. Extensive empirical evidence shows that our DVSH approach yields state of the art results in crossmodal retrieval experiments on imagesentences datasets, i.e. standard IAPR TC12 and largescale Microsoft COCO. 
2016  Column Sampling Based Discrete Supervised Hashing  WangCheng Kang, WuJun Li, ZhiHua Zhou  AAAI  By leveraging semantic (label) information, supervised hashing has demonstrated better accuracy than unsupervised hashing in many real applications. Because the hashingcode learning problem is essentially a discrete optimization problem which is hard to solve, most existing supervised hashing methods try to solve a relaxed continuous optimization problem by dropping the discrete constraints. However, these methods typically suffer from poor performance due to the errors caused by the relaxation. Some other methods try to directly solve the discrete optimization problem. However, they are typically timeconsuming and unscalable. In this paper, we propose a novel method, called column sampling based discrete supervised hashing (COSDISH), to directly learn the discrete hashing code from semantic information. COSDISH is an iterative method, in each iteration of which several columns are sampled from the semantic similarity matrix and then the hashing code is decomposed into two parts which can be alternately optimized in a discrete way. Theoretical analysis shows that the learning (optimization) algorithm of COSDISH has a constantapproximation bound in each step of the alternating optimization procedure. Empirical results on datasets with semantic labels illustrate that COSDISH can outperform the stateoftheart methods in real applications like image retrieval. 
2016  Feature Learning based Deep Supervised Hashing with Pairwise Labels  WuJun Li, Sheng Wang and WangCheng Kang  IJCAI  Recent years have witnessed wide application of hashing for largescale image retrieval. However, most existing hashing methods are based on handcrafted features which might not be optimally compatible with the hashing procedure. Recently, deep hashing methods have been proposed to perform simultaneous feature learning and hashcode learning with deep neural networks, which have shown better performance than traditional hashing methods with handcrafted features. Most of these deep hashing methods are supervised whose supervised information is given with triplet labels. For another common application scenario with pairwise labels, there have not existed methods for simultaneous feature learning and hashcode learning. In this paper, we propose a novel deep hashing method, called deep pairwisesupervised hashing (DPSH), to perform simultaneous feature learning and hashcode learning for applications with pairwise labels. Experiments on real datasets show that our DPSH method can outperform other methods to achieve the stateoftheart performance in image retrieval applications. 
2016  Enhancing First Story Detection using Word Embeddings  S. Moran, R. McCreadie, C. Macdonald, I. Ounis  SIGIR  In this paper we show how word embeddings can be used to increase the effectiveness of a stateofthe art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via automatically mined word embeddings over a background tweet corpus. Through experimentation on a large data stream comprised of 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a stateoftheart FSD system. 
2016  Learning to Project and Binarise for HashingBased Approximate Nearest Neighbour Search  S. Moran  SIGIR  In this paper we focus on improving the effectiveness of hashingbased approximate nearest neighbour search. Generating similarity preserving hashcodes for images has been shown to be an effective and efficient method for searching through large datasets. Hashcode generation generally involves two steps: bucketing the input feature space with a set of hyperplanes, followed by quantising the projection of the datapoints onto the normal vectors to those hyperplanes. This procedure results in the makeup of the hashcodes depending on the positions of the datapoints with respect to the hyperplanes in the feature space, allowing a degree of locality to be encoded into the hashcodes. In this paper we study the effect of learning both the hyperplanes and the thresholds as part of the same model. Most previous research either learn the hyperplanes assuming a fixed set of thresholds, or viceversa. In our experiments over two standard image datasets we find statistically significant increases in retrieval effectiveness versus a host of stateoftheart datadependent and independent hashing models. 
2016  XNORNet: ImageNet Classification Using Binary Convolutional Neural Networks  M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi  ECCV  We propose two efficient approximations to standard convolutional neural networks: BinaryWeightNetworks and XNORNetworks. In BinaryWeightNetworks, the filters are approximated with binary values resulting in 32x memory saving. In XNORNetworks, both the filters and the input to convolutional layers are binary. XNORNetworks approximate convolutions using primarily binary operations. This results in 58x faster convolutional operations and 32x memory savings. XNORNets offer the possibility of running stateoftheart networks on CPUs (rather than GPUs) in realtime. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a BinaryWeightNetwork version of AlexNet is only 2.9\% less than the fullprecision AlexNet (in top1 measure). We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16\% in top1 accuracy. 
2016  Affinity Preserving Quantization for Hashing: A Vector Quantization Approach to Learning Compact Binary Codes  Z. Wang, L. Duan, T. Huang, W. Gao  AAAI  Hashing techniques are powerful for approximate nearest neighbour (ANN) search. Existing quantization methods in hashing are all focused on scalar quantization (SQ) which is inferior in utilizing the inherent data distribution. In this paper, we propose a novel vector quantization (VQ) method named affinity preserving quantization (APQ) to improve the quantization quality of projection values, which has significantly boosted the performance of stateoftheart hashing techniques. In particular, our method incorporates the neighbourhood structure in the pre and postprojection data space into vector quantization. APQ minimizes the quantization errors of projection values as well as the loss of affinity property of original space. An effective algorithm has been proposed to solve the joint optimization problem in APQ, and the extension to larger binary codes has been resolved by applying product quantization to APQ. Extensive experiments have shown that APQ consistently outperforms the stateoftheart quantization methods, and has significantly improved the performance of various hashing techniques. 
2016  Efficient Training of Very Deep Neural Networks for Supervised Hashing  Ziming Zhang, Yuting Chen, Venkatesh Saligrama  CVPR  In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively “shallow” networks limited by the issues arising in back propagation (e.e. vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by alternating direction method of multipliers (ADMM) that overcomes some of these limitations. Our method decomposes the training process into independent layerwise local updates through auxiliary variables. Empirically we observe that our training algorithm always converges and its computational complexity is linearly proportional to the number of edges in the networks. Empirically we manage to train DNNs with 64 hidden layers and 1024 nodes per layer for supervised hashing in about 3 hours using a single GPU. Our proposed very deep supervised hashing (VDSH) method significantly outperforms the stateoftheart on several benchmark datasets. 
2016  Deep Hashing Network for Efficient Similarity Retrieval  Han Zhu, Mingsheng Long, Jianmin Wang, Yue Cao  AAAI  Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for largescale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the semantic similarity on data pairs, has received increasing attention recently. For most existing supervised hashing methods for image retrieval, an image is first represented as a vector of handcrafted or machinelearned features, followed by another separate quantization step that generates binary codes. However, suboptimal hash coding may be produced, because the quantization error is not statistically minimized and the feature representation is not optimally compatible with the binary coding. In this paper, we propose a novel Deep Hashing Network (DHN) architecture for supervised hashing, in which we jointly learn good image representation tailored to hash coding and formally control the quantization error. The DHN model constitutes four key components: (1) a subnetwork with multiple convolutionpooling layers to capture image representations; (2) a fullyconnected hashing layer to generate compact binary hash codes; (3) a pairwise crossentropy loss layer for similaritypreserving learning; and (4) a pairwise quantization loss for controlling hashing quality. Extensive experiments on standard image retrieval datasets show the proposed DHN model yields substantial boosts over latest stateoftheart hashing methods. 
2015  Practical and Optimal LSH for Angular Distance  A. Andoni, P. Indyk, T. Laarhoven  NIPS  We show the existence of a LocalitySensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [1, 2]), our algorithm is also practical, improving upon the wellstudied hyperplane LSH [3] in practice. We also introduce a multiprobe version of this algorithm and conduct an experimental evaluation on real and synthetic data sets. We complement the above positive results with a finegrained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a tradeoff between evaluation time and quality that is close to optimal for a natural class of LSH functions. 
2015  Adaptive Hashing for Fast Similarity Search  F. Cakir, S. Sclaroff  ICCV  With the staggering growth in image and video datasets, algorithms that provide fast similarity search and compact storage are crucial. Hashing methods that map the data into Hamming space have shown promise; however, many of these methods employ a batchlearning strategy in which the computational cost and memory requirements may become intractable and infeasible with larger and larger datasets. To overcome these challenges, we propose an online learning algorithm based on stochastic gradient descent in which the hash functions are updated iteratively with streaming data. In experiments with three image retrieval benchmarks, our online algorithm attains retrieval accuracy that is comparable to competing stateoftheart batchlearning solutions, while our formulation is orders of magnitude faster and being online it is adaptable to the variations of the data. Moreover, our formulation yields improved retrieval performance over a recently reported online hashing technique, Online Kernel Hashing. 
2015  Hashing with Binary Autoencoders  M. CarreiraPerpinan, R. Raziperchikolaei  CVPR  An attractive approach for fast search in image databases is binary hashing, where each highdimensional, realvalued image is mapped onto a lowdimensional, binary vector and the search is done in this binary space. Finding the optimal hash function is difficult because it involves binary constraints, and most approaches approximate the optimization by relaxing the constraints and then binarizing the result. Here, we focus on the binary autoencoder model, which seeks to reconstruct an image from the binary code produced by the hash function. We show that the optimization can be simplified with the method of auxiliary coordinates. This reformulates the optimization as alternating two easier steps: one that learns the encoder and decoder separately, and one that optimizes the code for each image. Image retrieval experiments show the resulting hash function outperforms or is competitive with stateoftheart methods for binary hashing. 
2015  kNN Hashing with Factorized Neighborhood Representation  Kun Ding, Chunlei Huo, Bin Fan, Chunhong Pan  ICCV  Hashing is very effective for many tasks in reducing the processing time and in compressing massive databases. Although lots of approaches have been developed to learn datadependent hash functions in recent years, how to learn hash functions to yield good performance with acceptable computational and memory cost is still a challenging problem. Based on the observation that retrieval precision is highly related to the kNN classification accuracy, this paper proposes a novel kNNbased supervised hashing method, which learns hash functions by directly maximizing the kNN accuracy of the Hammingembedded training data. To make it scalable well to large problem, we propose a factorized neighborhood representation to parsimoniously model the neighborhood relationships inherent in training data. Considering that realworld data are often linearly inseparable, we further kernelize this basic model to improve its performance. As a result, the proposed method is able to learn accurate hashing functions with tolerable computation and storage cost. Experiments on four benchmarks demonstrate that our method outperforms the stateofthearts. 
2015  Scalable Graph Hashing with Feature Transformation  Q. Jiang, W. Li  IJCAI  Hashing has been widely used for approximate nearest neighbor (ANN) search in big data applications because of its low storage cost and fast retrieval speed. The goal of hashing is to map the data points from the original space into a binarycode space where the similarity (neighborhood structure) in the original space is preserved. By directly exploiting the similarity to guide the hashing code learning procedure, graph hashing has attracted much attention. However, most existing graph hashing methods cannot achieve satisfactory performance in real applications due to the high complexity for graph modeling. In this paper, we propose a novel method, called scalable graph hashing with feature transformation (SGH), for largescale graph hashing. Through feature transformation, we can effectively approximate the whole graph without explicitly computing the similarity graph matrix, based on which a sequential learning method is proposed to learn the hash functions in a bitwise manner. Experiments on two datasets with one million data points show that our SGH method can outperform the stateoftheart methods in terms of both accuracy and scalability. 
2015  Simultaneous Feature Learning and Hash Coding with Deep Neural Networks  H. Lai, Y. Pan, Y. Liu, S. Yan  CVPR  Similaritypreserving hashing is a widelyused method for nearest neighbour search in largescale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of handengineering visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing suboptimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a subnetwork with a stack of convolution layers to produce the effective intermediate image features; 2) a divideandencode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other stateoftheart supervised or unsupervised hashing methods. 
2015  Hashing for Distributed Data  Cong Leng, Jiaxiang Wu, Jian Cheng, Xi Zhang and Hanqing Lu  ICML  Recently, hashing based approximate nearest neighbors search has attracted much attention. Extensive centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent to the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method. 
2015  Two Birds, One Stone: Jointly Learning Binary Code for Largescale Face Image Retrieval and Attributes Prediction  Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan and Xilin Chen  ICCV  We address the challenging largescale contentbased face image retrieval problem, intended as searching images based on the presence of specific subject, given one face image of him/her. To this end, one natural demand is a supervised binary code learning method. While the learned codes might be discriminating, people often have a further expectation that whether some semantic message (e.g., visual attributes) can be read from the humanincomprehensible codes. For this purpose, we propose a novel binary code learning framework by jointly encoding identity discriminability and a number of facial attributes into unified binary code. In this way, the learned binary codes can be applied to not only finegrained face image retrieval, but also facial attributes prediction, which is the very innovation of this work, just like killing two birds with one stone. To evaluate the effectiveness of the proposed method, extensive experiments are conducted on a new purified largescale web celebrity database, named CFW 60K, with abundant manual identity and attributes annotation, and experimental results exhibit the superiority of our method over stateoftheart. 
2015  0Bit Consistent Weighted Sampling  P. Li  KDD  We develop 0bit consistent weighted sampling (CWS) for efficiently estimating minmax kernel, which is a generalization of the resemblance kernel originally designed for binary data. Because the estimator of 0bit CWS constitutes a positive definite kernel, this method can be naturally applied to largescale data mining problems. Basically, if we feed the sampled data from 0bit CWS to a highly efficient linear classifier (e.g., linear SVM), we effectively (and approximately) train a nonlinear classifier based on the minmax kernel. The accuracy improves as we increase the sample size. In this paper, we first demonstrate, through an extensive classification study using kernel machines, that the minmax kernel often provides an effective measure of similarity for nonnegative data. This helps justify the use of minmax kernel. However, as the minmax kernel is nonlinear and might be difficult to be used for industrial applications with massive data, we propose to linearize the minmax kernel via 0bit CWS, a simplification of the original CWS method. The previous remarkable work on consistent weighted sampling (CWS) produces samples in the form of (i, t) where the i* records the location (and in fact also the weights) information analogous to the samples produced by classical minwise hashing on binary data. Because the t* is theoretically unbounded, it was not immediately clear how to effectively implement CWS for building largescale linear classifiers. We provide a simple solution by discarding t* (which we refer to as the “0bit” scheme). Via an extensive empirical study, we show that this 0bit scheme does not lose essential information. We then apply 0bit CWS for building linear classifiers to approximate minmax kernel classifiers, as extensively validated on a wide range of public datasets. We expect this work will generate interests among data mining practitioners who would like to efficiently utilize the nonlinear information of nonbinary and nonnegative data. 
2015  Deep learning of binary hash codes for fast image retrieval  Kevin Lin, HueiFang Yang, JenHao Hsiao, ChuSong Chen  CVPRW  Approximate nearest neighbor search is an efficient strategy for largescale image retrieval. Encouraged by the recent advances in convolutional neural networks (CNNs), we propose an effective deep learning framework to generate binary hash codes for fast image retrieval. Our idea is that when the data labels are available, binary codes can be learned by employing a hidden layer for representing the latent concepts that dominate the class labels. he utilization of the CNN also allows for learning image representations. Unlike other supervised methods that require pairwised inputs for binary code learning, our method learns hash codes and image representations in a pointwised manner, making it suitable for largescale datasets. Experimental results show that our method outperforms several stateoftheart hashing algorithms on the CIFAR10 and MNIST datasets. We further demonstrate its scalability and efficacy on a largescale dataset of 1 million clothing images. 
2015  SemanticsPreserving Hashing for CrossView Retrieval  Zijia Lin, Guiguang Ding, Mingqing Hu and Jianmin Wang  CVPR  With benefits of low storage costs and high query speeds, hashing methods are widely researched for efficiently retrieving largescale data, which commonly contains multiple views, e.g. a news report with images, videos and texts. In this paper, we study the problem of crossview retrieval and propose an effective SemanticsPreserving Hashing method, termed SePH. Given semantic affinities of training data as supervised information, SePH transforms them into a probability distribution and approximates it with tobelearnt hash codes in Hamming space via minimizing the KullbackLeibler divergence. Then kernel logistic regression with a sampling strategy is utilized to learn the nonlinear projections from features in each view to the learnt hash codes. And for any unseen instance, predicted hash codes and their corresponding output probabilities from observed views are utilized to determine its unified hash code, using a novel probabilistic approach. Extensive experiments conducted on three benchmark datasets well demonstrate the effectiveness and reasonableness of SePH. 
2015  Deep Hashing for Compact Binary Codes Learning  V. Liong, J. Lu, G. Wang, P. Moulin, J. Zhou  CVPR  In this paper, we propose a new deep hashing (DH) approach to learn compact binary codes for large scale visual search. Unlike most existing binary codes learning methods which seek a single linear projection to map each sample into a binary vector, we develop a deep neural network to seek multiple hierarchical nonlinear transformations to learn these binary codes, so that the nonlinear relationship of samples can be well exploited. Our model is learned under three constraints at the top layer of the deep network: 1) the loss between the original realvalued feature descriptor and the learned binary vector is minimized, 2) the binary codes distribute evenly on each bit, and 3) different bits are as independent as possible. To further improve the discriminative power of the learned binary codes, we extend DH into supervised DH (SDH) by including one discriminative term into the objective function of DH which simultaneously maximizes the interclass variations and minimizes the intraclass variations of the learned binary codes. Experimental results show the superiority of the proposed approach over the stateofthearts. 
2015  MultiView Complementary Hash Tables for Nearest Neighbor Search  Xianglong Liu, Lei Huang, Cheng Deng, Jiwen Lu and Bo Land  ICCV  Recent years have witnessed the success of hashing techniques in fast nearest neighbor search. In practice many applications (e.g., visual search, object detection, image matching, etc.) have enjoyed the benefits of complementary hash tables and information fusion over multiple views. However, most of prior research mainly focused on compact hash code cleaning, and rare work studies how to build multiple complementary hash tables, much less to adaptively integrate information stemming from multiple views. In this paper we first present a novel multiview complementary hash table method that learns complementary hash tables from the data with multiple views. For single multiview table, using exemplar based feature fusion, we approximate the inherent data similarities with a lowrank matrix, and learn discriminative hash functions in an efficient way. To build complementary tables and meanwhile maintain scalable training and fast outofsample extension, an exemplar reweighting scheme is introduced to update the induced lowrank similarity in the sequential table construction framework, which indeed brings mutual benefits between tables by placing greater importance on exemplars shared by misseparated neighbors. Extensive experiments on three largescale image datasets demonstrate that the proposed method significantly outperforms various naive solutions and stateoftheart multitable methods. 
2015  Graph Regularised Hashing  S. Moran, V. Lavrenko  ECIR  In this paper we propose a twostep iterative scheme, Graph Regularised Hashing (GRH), for incrementally adjusting the positioning of the hashing hypersurfaces to better conform to the supervisory signal: in the first step the binary bits are regularised using a data similarity graph so that similar data points receive similar bits. In the second step the regularised hashcodes form targets for a set of binary classifiers which shift the position of each hypersurface so as to separate opposite bits with maximum margin. GRH exhibits superior retrieval accuracy to competing hashing methods. 
2015  Regularised CrossModal Hashing  S. Moran, V. Lavrenko  SIGIR  In this paper we propose Regularised CrossModal Hashing (RCMH) a new crossmodal hashing scheme that projects annotation and visual feature descriptors into a common Hamming space. RCMH optimises the intramodality similarity of datapoints in the annotation modality using an iterative threestep hashing algorithm: in the first step each training image is assigned a Kbit hashcode based on hyperplanes learnt at the previous iteration; in the second step the binary bits are smoothed by a formulation of graph regularisation so that similar datapoints have similar bits; in the third step a set of binary classifiers are trained to predict the regularised bits with maximum margin. Visual descriptors are projected into the annotation Hamming space by a set of binary classifiers learnt using the bits of the corresponding annotations as labels. RCMH is shown to consistently improve retrieval effectiveness over stateoftheart baselines. 
2015  An NMF perspective on Binary Hashing  Lopamudra Mukherjee, Sathya N. Ravi, Vamsi K. Ithapu, Tyler Holmes and Vikas Singh  ICCV  The pervasiveness of massive data repositories has led to much interest in efficient methods for indexing, search, and retrieval. For image data, a rapidly developing body of work for these applications shows impressive performance with methods that broadly fall under the umbrella term of Binary Hashing. Given a distance matrix, a binary hashing algorithm solves for a binary code for the given set of examples, whose Hamming distance nicely approximates the original distances. The formulation is nonconvex — so existing solutions adopt spectral relaxations or perform coordinate descent (or quantization) on a surrogate objective that is numerically more tractable. In this paper, we first derive an Augmented Lagrangian approach to optimize the standard binary Hashing objective (i.e., maintain fidelity with a given distance matrix). With appropriate step sizes, we find that this scheme already yields results that match or substantially outperform state of the art methods on most benchmarks used in the literature. Then, to allow the model to scale to large datasets, we obtain an interesting reformulation of the binary hashing objective as a nonnegative matrix factorization. Later, this leads to a simple multiplicative updates algorithm — whose parallelization properties are exploited to obtain a fast GPU based implementation. We give a probabilistic analysis of our initialization scheme and present a range of experiments to show that the method is simple to implement and competes favorably with available methods (both for optimization and generalization). 
2015  Top Rank Supervised Binary Coding for Visual Search  Dongjin Song, Wei Liu, Rongrong Ji, David A. Meyer, John R. Smith  ICCV  In recent years, binary coding techniques are becoming increasingly popular because of their high efficiency in handling largescale computer vision applications. It has been demonstrated that supervised binary coding techniques that leverage supervised information can significantly enhance the coding quality, and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions which compress data samples into binary codes. However, few methods pursued the coding functions such that the precision at the top of a ranking list according to Hamming distances of the generated binary codes is optimized. In this paper, we propose a novel supervised binary coding approach, namely Top Rank Supervised Binary Coding (TopRSBC), which explicitly focuses on optimizing the precision of top positions in a Hammingdistance ranking list towards preserving the supervision information. The core idea is to train the disciplined coding functions, by which the mistakes at the top of a Hammingdistance ranking list are penalized more than those at the bottom. To solve such coding functions, we relax the original discrete optimization objective with a continuous surrogate, and derive a stochastic gradient descent to optimize the surrogate objective. To further reduce the training time cost, we also design an online learning algorithm to optimize the surrogate objective more efficiently. Empirical studies based upon three benchmark image datasets demonstrate that the proposed binary coding approach achieves superior image search accuracy over the stateofthearts. 
2015  Hamming Compatible Quantization for Hashing  Z. Wang, L. Duan, J. Lin, X. Wang, T. Huang and W. Gao  IJCAI  Hashing is one of the effective techniques for fast Approximate Nearest Neighbour (ANN) search. Traditional singlebit quantization (SBQ) in most hashing methods incurs lots of quantization error which seriously degrades the search performance. To address the limitation of SBQ, researchers have proposed promising multibit quantization (MBQ) methods to quantize each projection dimension with multiple bits. However, some MBQ methods need to adopt specific distance for binary code matching instead of the original Hamming distance, which would significantly decrease the retrieval speed. Two typical MBQ methods Hierarchical Quantization and Double Bit Quantization retain the Hamming distance, but both of them only consider the projection dimensions during quantization, ignoring the neighborhood structure of raw data inherent in Euclidean space. In this paper, we propose a multibit quantization method named Hamming Compatible Quantization (HCQ) to preserve the capability of similarity metric between Euclidean space and Hamming space by utilizing the neighborhood structure of raw data. Extensive experiment results have shown our approach significantly improves the performance of various stateoftheart hashing methods while maintaining fast retrieval speed. 
2015  Semantic Topic Multimodal Hashing for CrossMedia Retrieval  Di Wang, Xinbo Gao, Xiumei Wang and Lihuo He  IJCAI  Multimodal hashing is essential to crossmedia similarity search for its low storage cost and fast query speed. Most existing multimodal hashing methods embedded heterogeneous data into a common lowdimensional Hamming space, and then rounded the continuous embeddings to obtain the binary codes. Yet they usually neglect the inherent discrete nature of hashing for relaxing the discrete constraints, which will cause degraded retrieval performance especially for long codes. For this purpose, a novel Semantic Topic Multimodal Hashing (STMH) is developed by considering latent semantic information in coding procedure. It first discovers clustering patterns of texts and robust factorizes the matrix of images to obtain multiple semantic topics of texts and concepts of images. Then the learned multimodal semantic features are transformed into a common subspace by their correlations. Finally, each bit of unified hash code can be generated directly by figuring out whether a topic or concept is contained in a text or an image. Therefore, the obtained model by STMH is more suitable for hashing scheme as it directly learns discrete hash codes in the coding process. Experimental results demonstrate that the proposed method outperforms several stateoftheart methods. 
2015  Convolutional Neural Networks for Text Hashing  Jiaming Xu, PengWang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao  IJCAI  Hashing, as a popular approximate nearest neighbor search, has been widely used for largescale similarity search. Recently, a spectrum of machine learning methods are utilized to learn similaritypreserving binary codes. However, most of them directly encode the explicit features, keywords, which fail to preserve the accurate semantic similarities in binary code beyond keyword matching, especially on short texts. Here we propose a novel text hashing framework with convolutional neural networks. In particular, we first embed the keyword features into compact binary code with a locality preserving constraint. Meanwhile word features and position features are together fed into a convolutional network to learn the implicit features which are further incorporated with the explicit features to fit the pretrained binary code. Such base method can be successfully accomplished without any external tags/labels, and other three model variations are designed to integrate tags/labels. Experimental results show the superiority of our proposed approach over several stateoftheart hashing methods when tested on one short text dataset as well as one normal text dataset. 
2015  BitScalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person ReIdentification  R. Zhang, L. Lin, R. Zhang, W. Zuo, L. Zhang  TIP  Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval . Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined handcrafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bitscalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. Specifically, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between matched pairs and mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce the adjacency consistency, i.e., images of similar appearances should have similar codes. The deep convolutional neural network is utilized to train the model in an endtoend fashion, where discriminative image features and hash functions are simultaneously optimized. Furthermore, each bit of our hashing codes is unequally weighted so that we can manipulate the code lengths by truncating the insignificant bits. Our framework outperforms stateofthearts on public benchmarks of similar image search and also achieves promising results in the application of person reidentification in surveillance. It is also shown that the generated bitscalable hashing codes well preserve the discriminative powers with shorter code lengths. 
2015  Deep Semantic Ranking Based Hashing for MultiLabel Image Retrieval  F. Zhao, Y. Huang, L. Wang, and T. Tan  CVPR  With the rapid growth of web images, hashing has received increasing interests in large scale image retrieval. Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels. However, most of these hashing methods are designed to handle simple binary similarity. The complex multilevel semantic structure of images associated with multiple labels have not yet been well explored. Here we propose a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multilabel images. In our approach, deep convolutional neural network is incorporated into hash functions to jointly learn feature representations and mappings from them to hash codes, which avoids the limitation of semantic representation power of handcrafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on surrogate loss is used to solve the intractable optimization problem of nonsmooth and multivariate ranking measures involved in the learning procedure. Experimental results show the superiority of our proposed approach over several stateoftheart hashing methods in term of ranking evaluation metrics when tested on multilabel image datasets. 
2015  CrossModal Similarity Learning via Pairs, Preferences, and Active Supervision  Yi Zhen, Piyush Rai, Hongyuan Zha, and Lawrence Carin  AAAI  We present a probabilistic framework for learning pairwise similarities between objects belonging to different modalities, such as drugs and proteins, or text and images. Our framework is based on learning a binary code based representation for objects in each modality, and has the following key properties: (i) it can leverage both pairwise as well as easytoobtain relative preference based crossmodal constraints, (ii) the probabilistic framework naturally allows querying for the most useful/informative constraints, facilitating an active learning setting (existing methods for crossmodal similarity learning do not have such a mechanism), and (iii) the binary code length is learned from the data. We demonstrate the effectiveness of the proposed approach on two problems that require computing pairwise similarities between crossmodal object pairs: crossmodal link prediction in bipartite graphs, and hashing based crossmodal similarity search. 
2014  Microsoft COCO: Common Objects in Context  TsungYi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar  We present a new dataset with the goal of advancing the stateoftheart in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using perinstance segmentations to aid in precise object localization. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model. 

2014  Collective Matrix Factorization Hashing for Multimodal data  G. Ding, Y. Guo, J. Zhou  CVPR  Nearest neighbor search methods based on hashing have attracted considerable attention for effective and efficient largescale similarity search in computer vision and information retrieval community. In this paper, we study the problems of learning hash functions in the context of multimodal data for crossview similarity search. We put forward a novel hashing method, which is referred to Collective Matrix Factorization Hashing (CMFH). CMFH learns unified hash codes by collective matrix factorization with latent factor model from different modalities of one instance, which can not only supports crossview search but also increases the search accuracy by merging multiple view information sources. We also prove that CMFH, a similaritypreserving hashing learning method, has upper and lower boundaries. Extensive experiments verify that CMFH significantly outperforms several stateoftheart methods on three different datasets. 
2014  Graph Cuts for Supervised Binary Coding  T. Ge, K. He, J. Sun  ECCV  Learning short binary codes is challenged by the inherent discrete nature of the problem. The graph cuts algorithm is a wellstudied discrete label assignment solution in computer vision, but has not yet been applied to solve the binary coding problems. This is partially because it was unclear how to use it to learn the encoding (hashing) functions for outofsample generalization. In this paper, we formulate supervised binary coding as a single optimization problem that involves both the encoding functions and the binary label assignment. Then we apply the graph cuts algorithm to address the discrete optimization problem involved, with no continuous relaxation. This method, named as Graph Cuts Coding (GCC), shows competitive results in various datasets. 
2014  Locally Linear Hashing for Extracting NonLinear Manifolds  G. Irie, Z. Li, X. Wu, S. Chang  CVPR  Previous efforts in hashing intend to preserve data variance or pairwise affinity, but neither is adequate in capturing the manifold structures hidden in most visual data. In this paper, we tackle this problem by reconstructing the locally linear structures of manifolds in the binary Hamming space, which can be learned by localitysensitive sparse coding. We cast the problem as a joint minimization of reconstruction error and quantization loss, and show that, despite its NPhardness, a local optimum can be obtained efficiently via alternative optimization. Our method distinguishes itself from existing methods in its remarkable ability to extract the nearest neighbors of the query from the same manifold, instead of from the ambient space. On extensive experiments on various image benchmarks, our results improve previous stateoftheart by 2874% typically, and 627% on the Yale face data. 
2014  Fast supervised hashing with decision trees for highdimensional data  Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, and David Suter.  CVPR  Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the Hamming space. Nonlinear hash functions have demonstrated their advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve nonlinearity in hashing, which achieve encouraging retrieval performance at the price of slow evaluation and training time. Here we propose to use boosted decision trees for achieving nonlinearity in hashing, which are fast to train and evaluate, hence more suitable for hashing with high dimensional data. In our approach, we first propose submodular formulations for the hashing binary code inference problem and an efficient GraphCut based block search method for solving largescale inference. Then we learn hash functions by training boosted decision trees to fit the binary codes. Experiments demonstrate that our proposed method significantly outperforms most stateoftheart methods in retrieval precision and training time. Especially for highdimensional data, our method is orders of magnitude faster than many methods in terms of training time. 
2014  Optimizing Ranking Measures for Compact Binary Code Learning  Guosheng Lin, Chunhua Shen, Jianxin Wu.  ECCV  Hashing has proven a valuable tool for largescale information retrieval. Despite much success, existing hashing methods optimize over simple objectives such as the reconstruction error or graph Laplacian related loss functions, instead of the performance evaluation criteria of interest—multivariate performance measures such as the AUC and NDCG. Here we present a general framework (termed StructHash) that allows one to directly optimize multivariate performance measures. The resulting optimization problem can involve exponentially or infinitely many variables and constraints, which is more challenging than standard structured output learning. To solve the StructHash optimization problem, we use a combination of column generation and cuttingplane techniques. We demonstrate the generality of StructHash by applying it to ranking prediction and image retrieval, and show that it outperforms a few stateoftheart hashing methods. 
2014  Collaborative Hashing  X. Liu, J. He, C. Deng, B. Lang  CVPR  Hashing technique has become a promising approach for fast similarity search. Most of existing hashing research pursue the binary codes for the same type of entities by preserving their similarities. In practice, there are many scenarios involving nearest neighbor search on the data given in matrix form, where two different types of, yet naturally associated entities respectively correspond to its two dimensions or views. To fully explore the duality between the two views, we propose a collaborative hashing scheme for the data in matrix form to enable fast search in various applications such as image search using bag of words and recommendation using useritem ratings. By simultaneously preserving both the entity similarities in each view and the interrelationship between views, our collaborative hashing effectively learns the compact binary codes and the explicit hash functions for outofsample extension in an alternating optimization way. Extensive evaluations are conducted on three wellknown datasets for search inside a single view and search across different views, demonstrating that our proposed method outperforms stateoftheart baselines, with significant accuracy gains ranging from 7.67% to 45.87% relatively. 
2014  Discrete Graph Hashing  W. Liu, C. Mu, S. Kumar, S. Chang  NIPS  Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graphbased unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework which directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding highquality codes to well capture the local neighborhoods. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over stateoftheart unsupervised hashing methods, especially for longer codes. 
2014  Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).  A. Shrivastava, P. Li  NIPS  We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). Searching with (unnormalized) inner product as the underlying similarity measure is a known difficult problem and finding hashing schemes for MIPS was considered hard. While the existing Locality Sensitive Hashing (LSH) framework is insufficient for solving MIPS, in this paper we extend the LSH framework to allow asymmetric hashing schemes. Our proposal is based on a key observation that the problem of finding maximum inner products, after independent asymmetric transformations, can be converted into the problem of approximate near neighbor search in classical settings. This key observation makes efficient sublinear hashing scheme for MIPS possible. Under the extended asymmetric LSH (ALSH) framework, this paper provides an example of explicit construction of provably fast hashing scheme for MIPS. Our proposed algorithm is simple and easy to implement. The proposed hashing scheme leads to significant computational savings over the two popular conventional LSH schemes: (i) Sign Random Projection (SRP) and (ii) hashing based on pstable distributions for L2 norm (L2LSH), in the collaborative filtering task of item recommendations on Netflix and Movielens (10M) datasets. 
2014  Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search  A. Shrivastava, P. Li  ICML  The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search where the data are often highdimensional and binary (e.g., text ngrams), minwise hashing is widely adopted, which requires applying a large number of permutations on the data. This is costly in computation and energyconsumption. In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search, using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent interest for densifying other types of sparse sketches. Using our proposed hashing method, the query time of a (K, L)parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL + dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies. 
2014  Supervised Hashing via Image Representation Learning  R. Xia, Y. Pan, H. Lai, C. Liu, S. Yan.  AAAI  Hashing is a popular approximate nearest neighbor search approach for largescale image retrieval. Supervised hashing, which incorporates similarity/dissimilarity information on entity pairs to improve the quality of hashing function learning, has recently received increasing attention. However, in the existing supervised hashing methods for images, an input image is usually encoded by a vector of handcrafted visual features. Such handcrafted feature vectors do not necessarily preserve the accurate semantic similarities of images pairs, which may often degrade the performance of hashing function learning. In this paper, we propose a supervised hashing method for image retrieval, in which we automatically learn a good image representation tailored to hashing as well as a set of hash functions. The proposed method has two stages. In the first stage, given the pairwise similarity matrix S over training images, we propose a scalable coordinate descent method to decompose S into a product of HHT where H is a matrix with each of its rows being the approximate hash code associated to a training image. In the second stage, we propose to simultaneously learn a good feature representation for the input images as well as a set of hash functions, via a deep convolutional network tailored to the learned hash codes in H and optionally the discrete class labels of the images. Extensive empirical evaluations on three benchmark datasets with different kinds of images show that the proposed method has superior performance gains over several stateoftheart supervised and unsupervised hashing methods. 
2014  Adaptive Quantization for Hashing: An InformationBased Approach to Learning Binary Codes  C. Xiong, W. Chen, G. Chen, D. Johnson, J. Corso  SDM  Largescale data mining and retrieval applications have increasingly turned to compact binary data representations as a way to achieve both fast queries and efficient data storage; many algorithms have been proposed for learning effective binary encodings. Most of these algorithms focus on learning a set of projection hyperplanes for the data and simply binarizing the result from each hyperplane, but this neglects the fact that informativeness may not be uniformly distributed across the projections. In this paper, we address this issue by proposing a novel adaptive quantization (AQ) strategy that adaptively assigns varying numbers of bits to different hyperplanes based on their information content. Our method provides an informationbased schema that preserves the neighborhood structure of data points, and we jointly find the globally optimal bitallocation for all hyperplanes. In our experiments, we compare with stateoftheart methods on four largescale datasets and find that our adaptive quantization approach significantly improves on traditional hashing methods. 
2014  Fast Approximate NearestNeighbor Field by Cascaded Spherical Hashing  I. TorresXirau, J. Salvador, E. PérezPellitero  ACCV  We present an efficient and fast algorithm for computing approximate nearest neighbor fields between two images. Our method builds on the concept of CoherencySensitive Hashing (CSH), but uses a recent hashing scheme, Spherical Hashing (SpH), which is known to be better adapted to the nearestneighbor problem for natural images. Cascaded Spherical Hashing concatenates different configurations of SpH to build larger Hash Tables with less elements in each bin to achieve higher selectivity. Our method amply outperforms existing techniques like PatchMatch and CSH, and the experimental results show that our algorithm is faster and more accurate than existing methods. 
2014  Circulant Binary Embedding  F. Yu, S. Kumar, Y. Gong, S. Chang  ICML  Binary embedding of highdimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of Fast Fourier Transformation to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from O(d^2 ) to O(d log d), and the space complexity from O(d^2) to O(d) where d is the input dimensionality. We also propose a novel timefrequency alternating optimization to learn datadependent circulant projections, which alternatively minimizes the objective in original and Fourier domains. We show by extensive experiments that the proposed approach gives much better performance than the stateoftheart approaches for fixed time, and provides much faster computation with no performance degradation for fixed number of bits. 
2014  Largescale supervised multimodal hashing with semantic correlation maximization  D. Zhang, W. Li  AAAI  Due to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in multimedia data. In particular, more and more attentions have been payed to multimodal hashing for search in multimedia data with multiple modalities, such as images with tags. Typically, supervised information of semantic labels is also available for the data points in many real applications. Hence, many supervised multimodal hashing (SMH) methods have been proposed to utilize such semantic labels to further improve the search accuracy. However, the training time complexity of most existing SMH methods is too high, which makes them unscalable to largescale datasets. In this paper, a novel SMH method, called semantic correlation maximization (SCM), is proposed to seamlessly integrate semantic labels into the hashing learning procedure for largescale data modeling. Experimental results on two realworld datasets show that SCM can signifi cantly outperform the stateoftheart SMH methods, in terms of both accuracy and scalability. 
2014  Supervised Hashing with Latent Factor Models  P. Zhang, W. Zhang, W. Li, M. Guo  SIGIR  Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in largescale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way where the metric (Euclidean) structure of the training data is preserved. Very recently, supervised hashing methods, which try to preserve the semantic structure constructed from the semantic labels of the training points, have exhibited higher accuracy than unsupervised methods. In this paper, we propose a novel supervised hashing method, called latent factor hashing (LFH), to learn similaritypreserving binary codes based on latent factor models. An algorithm with convergence guarantee is proposed to learn the parameters of LFH. Furthermore, a lineartime variant with stochastic learning is proposed for training LFH on largescale datasets. Experimental results on two large datasets with semantic labels show that LFH can achieve superior accuracy than stateoftheart methods with comparable training time. 
2013  Supervised binary hash code learning with jensen shannon divergence  Lixin Fan  ICCV  This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection based methods. 
2013  Learning Binary Codes for HighDimensional Data Using Bilinear Projections  Y. Gong, S. Kumar, H. Rowley, S. Lazebnik  CVPR  Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on largescale datasets like ImageNet, extremely highdimensional visual descriptors, e.g., Fisher Vectors, are needed. We present a novel method for converting such descriptors to compact similaritypreserving binary codes that exploits their natural matrix structure to reduce their dimensionality using compact bilinear projections instead of a single large projection matrix. This method achieves comparable retrieval and classification accuracy to the original descriptors and to the stateoftheart Product Quantization approach while having orders of magnitude faster code generation time and smaller memory footprint. 
2013  Learning Binary Hash Codes for LargeScale Image Search  Kristen Grauman, Rob Fergus  Machine Learning for Computer Vision  Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, contentbased retrieval, and nonparametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sublinear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinitypreserving binary codes.Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures.We focus on defining the algorithms, and illustrate the main points with results using millions of images. 
2013  Complementary Projection Hashing  Z. Jin, Y. Hu, Y. Lin, D. Zhang, S. Lin, D. Cai, X. Li  ICCV  Recently, hashing techniques have been widely applied to solve the approximate nearest neighbors search problem in many vision applications. Generally, these hashing approaches generate 2^c buckets, where c is the length of the hash code. A good hashing method should satisfy the following two requirements: 1) mapping the nearby data points into the same bucket or nearby (measured by the Hamming distance) buckets. 2) all the data points are evenly distributed among all the buckets. In this paper, we propose a novel algorithm named Complementary Projection Hashing (CPH) to find the optimal hashing functions which explicitly considers the above two requirements. Specifically, CPH aims at sequentially finding a series of hyperplanes (hashing functions) which cross the sparse region of the data. At the same time, the data points are evenly distributed in the hypercubes generated by these hyperplanes. The experiments comparing with the stateoftheart hashing methods demonstrate the effectiveness of the proposed method. 
2013  Learning Hash Functions Using Column Generation  X. Li, G. Lin, C. Shen, A. Hengel, A. Dick  ICML  Fast nearest neighbor searching is becoming an increasingly important tool in solving many largescale problems. Recently a number of approaches to learning datadependent hash functions have been developed. In this work, we propose a column generation based method for learning datadependent hash functions on the basis of proximity comparison information. Given a set of triplets that encode the pairwise proximity comparison information, our method learns hash functions that preserve the relative comparison relationships in the data as well as possible within the largemargin learning framework. The learning procedure is implemented using column generation and hence is named CGHash. At each iteration of the column generation procedure, the best hash function is selected. Unlike most other hashing methods, our method generalizes to new data points naturally; and has a training objective which is convex, thus ensuring that the global optimum can be identi fied. Experiments demonstrate that the proposed method learns compact binary codes and that its retrieval performance compares favorably with stateoftheart methods when tested on a few benchmark datasets. 
2013  A General TwoStep Approach to LearningBased Hashing  G. Lin, C. Shen, D. Suter, A. Hengel  ICCV  Most existing approaches to hashing apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Here we propose a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. This framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problemspecific hashing methods. Our framework decomposes hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers. Both problems have been extensively studied in the literature. Our extensive experiments demonstrate that the proposed framework is effective, flexible and outperforms the stateoftheart. 
2013  Hash Bit Selection: a Unified Solution for Selection Problems in Hashing  X. Liu, J. He, B. Lang, S. Chang  CVPR  Hashing based methods recently have been shown promising for largescale nearest neighbor search. However, good designs involve difficult decisions of many unknowns – data features, hashing algorithms, parameter settings, kernels, etc. In this paper, we provide a unified solution as hash bit selection, i.e., selecting the most informative hash bits from a pool of candidates that may have been generated under various conditions mentioned above. We represent the candidate bit pool as a vertex and edgeweighted graph with the pooled bits as vertices. Then we formulate the bit selection problem as quadratic programming over the graph, and solve it efficiently by replicator dynamics. Extensive experiments show that our bit selection approach can achieve superior performance over both naive selection methods and stateoftheart methods under each scenario, usually with significant accuracy gains from 10% to 50% relatively. 
2013  Neighbourhood Preserving Quantisation for LSH  S. Moran, V. Lavrenko, and M. Osborne  SIGIR  We introduce a scheme for optimally allocating multiple bits per hyperplane for Locality Sensitive Hashing (LSH). Existing approaches binarise LSH projections by thresholding at zero yielding a single bit per dimension. We demonstrate that this is a suboptimal bit allocation approach that can easily destroy the neighbourhood structure in the original feature space. Our proposed method, dubbed Neighbourhood Preserving Quantization (NPQ), assigns multiple bits per hyperplane based upon adaptively learned thresholds. NPQ exploits a pairwise affinity matrix to discretise each dimension such that nearest neighbours in the original feature space fall within the same quantisation thresholds and are therefore assigned identical bits. NPQ is not only applicable to LSH, but can also be applied to any lowdimensional projection scheme. Despite using half the number of hyperplanes, NPQ is shown to improve LSHbased retrieval accuracy by up to 65% compared to the stateoftheart. 
2013  Variable Bit Quantisation for LSH  S. Moran, V. Lavrenko, and M. Osborne  ACL  We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a datadriven nonuniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets. 
2013  The Power of Asymmetry in Binary Hashing  B. Neyshabur, R. Salakhutdinov, N. Srebro  NIPS  When approximating binary similarity using the hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. I.e. by approximating the similarity between x and x 0 as the hamming distance between f(x) and g(x0), for two distinct binary codes f, g, rather than as the hamming distance between f(x) and f(x0). 
2013  Comparing apples to oranges: a scalable solution with heterogeneous hashing  M. Ou, P. Cui, F. Wang, J. Wang, W. Zhu, S. Yang  KDD  Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities and relationships are also ubiquitous in the real world applications, there is an emerging need to retrieve and search similar or relevant data entities from multiple heterogeneous domains, e.g., recommending relevant posts and images to a certain Facebook user. In this paper, we address the problem of ``comparing apples to oranges’’ under the large scale setting. Specifically, we propose a novel Relationaware Heterogeneous Hashing (RaHH), which provides a general framework for generating hash codes of data entities sitting in multiple heterogeneous domains. Unlike some existing hashing methods that map heterogeneous data in a common Hamming space, the RaHH approach constructs a Hamming space for each type of data entities, and learns optimal mappings between them simultaneously. This makes the learned hash codes flexibly cope with the characteristics of different data domains. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between the data entities to design hash functions with improved accuracy. To validate the proposed RaHH method, we conduct extensive evaluations on two large datasets; one is crawled from a popular social media sites, Tencent Weibo, and the other is an open dataset of Flickr(NUSWIDE). The experimental results clearly demonstrate that the RaHH outperforms several stateoftheart hashing methods with significant performance gains. 
2013  InterMedia Hashing for LargeScale Retrieval from Heterogeneous Data Sources  J. Song, Y. Yang, Y. Yang, Z. Huang, H. Shen  SIGMOD  In this paper, we present a new multimedia retrieval paradigm to innovate largescale search of heterogenous multimedia data. It is able to return results of different media types from heterogeneous data sources, e.g., using a query image to retrieve relevant text documents or images from different data sources. This utilizes the widely available data from different sources and caters for the current users’ demand of receiving a result list simultaneously containing multiple types of data to obtain a comprehensive understanding of the query’s results. To enable largescale intermedia retrieval, we propose a novel intermedia hashing (IMH) model to explore the correlations among multiple media types from different data sources and tackle the scalability issue. To this end, multimedia data from heterogeneous data sources are transformed into a common Hamming space, in which fast search can be easily implemented by XOR and bitcount operations. Furthermore, we integrate a linear regression model to learn hashing functions so that the hash codes for new data points can be efficiently generated. Experiments conducted on realworld largescale multimedia datasets demonstrate the superiority of our proposed method compared with stateoftheart techniques. 
2013  Streaming Similarity Search over one Billion Tweets using Parallel LocalitySensitive Hashing  Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden and Pradeep Dubey  VLDB  Finding nearest neighbors has become an important operation on databases, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high dimensional data (where spatial indexes like kdtrees do not perform well) is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects. In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH) designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports highthroughput streaming of new data. Our approach employs several novel ideas, including: cacheconscious hash table layout, using a 2level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hashtable querying; an insertoptimized hash table structure and efficient data expiration algorithm for streaming data; and a performance model that accurately estimates performance of the algorithm and can be used to optimize parameter settings. We show that on a workload where we perform similarity search on a dataset of > 1 Billion tweets, with hundreds of millions of new tweets per day, we can achieve query times of 1–2.5 ms. We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes. To the best of our knowledge, this is the fastest implementation of LSH, with table construction times up to 3.7x faster and query times that are 8.3x faster than a basic implementation. 
2013  Harmonious Hashing  B. Xu, J. Bu, Y. Chen, X. He, D. Cai  IJCAI  Hashingbased fast nearest neighbor search technique has attracted great attention in both research and industry areas recently. Many existing hashing approaches encode data with projectionbased hash functions and represent each projected dimension by 1bit. However, the dimensions with high variance hold large energy or information of data but treated equivalently as dimensions with low variance, which leads to a serious information loss. In this paper, we introduce a novel hashing algorithm called Harmonious Hashing which aims at learning hash functions with low information loss. Specifically, we learn a set of optimized projections to preserve the maximum cumulative energy and meet the constraint of equivalent variance on each dimension as much as possible. In this way, we could minimize the information loss after binarization. Despite the extreme simplicity, our method outperforms superiorly to many stateoftheart hashing methods in largescale and highdimensional nearest neighbor search experiments. 
2013  Linear crossmodal hashing for efficient multimedia search  Xiaofeng Zhu, Zi Huang, Heng Tao Shen, Xin Zhao  MM  Most existing crossmodal hashing methods suffer from the scalability issue in the training phase. In this paper, we propose a novel crossmodal hashing approach with a linear time complexity to the training data size, to enable scalable indexing for multimedia search across multiple modals. Taking both the intrasimilarity in each modal and the intersimilarity across different modals into consideration, the proposed approach aims at effectively learning hash functions from largescale training datasets. More specifically, for each modal, we first partition the training data into $k$ clusters and then represent each training data point with its distances to $k$ centroids of the clusters. Interestingly, such a kdimensional data representation can reduce the time complexity of the training phase from traditional O(n2) or higher to O(n), where $n$ is the training data size, leading to practical learning on largescale datasets. We further prove that this new representation preserves the intrasimilarity in each modal. To preserve the intersimilarity among data points across different modals, we transform the derived data representations into a common binary subspace in which binary codes from all the modals are “consistent” and comparable. The transformation simultaneously outputs the hash functions for all modals, which are used to convert unseen data into binary codes. Given a query of one modal, it is first mapped into the binary codes using the modal’s hash functions, followed by matching the database binary codes of any other modals. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art. 
2012  Spherical Hashing  J. Heo, Y. Lee, J. He, S. Chang, S. Yoon  CVPR  Many binary code encoding schemes based on hashing have been actively studied recently, since they can provide efficient similarity search, especially nearest neighbor search, and compact data representations suitable for handling large scale image databases in many computer vision problems. Existing hashing techniques encode highdimensional data points by using hyperplanebased hashing functions. In this paper we propose a novel hyperspherebased hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplanebased hashing functions. Furthermore, we propose a new binary code distance function, spherical Hamming distance, that is tailored to our hyperspherebased binary coding scheme, and design an efficient iterative optimization process to achieve balanced partitioning of data points for each hash function and independence between hashing functions. Our extensive experiments show that our spherical hashing technique significantly outperforms six stateoftheart hashing techniques based on hyperplanes across various image benchmarks of sizes ranging from one to 75 million of GIST descriptors. The performance gains are consistent and large, up to 100% improvements. The excellent results confirm the unique merits of the proposed idea in using hyperspheres to encode proximity regions in highdimensional spaces. Finally, our method is intuitive and easy to implement. 
2012  Manhattan Hashing for LargeScale Image Retrieval  W. Kong, W. Li, M. Guo  SIGIR  Hashing is used to learn binarycode representation for data with expectation of preserving the neighborhood structure in the original feature space. Due to its fast query speed and reduced storage cost, hashing has been widely used for efficient nearest neighbor search in a large variety of applications like text and image retrieval. Most existing hashing methods adopt Hamming distance to measure the similarity (neighborhood) between points in the hashcode space. However, one problem with Hamming distance is that it may destroy the neighborhood structure in the original feature space, which violates the essential goal of hashing. In this paper, Manhattan hashing (MH), which is based on Manhattan distance, is proposed to solve the problem of Hamming distance based hashing. The basic idea of MH is to encode each projected dimension with multiple bits of natural binary code (NBC), based on which the Manhattan distance between points in the hashcode space is calculated for nearest neighbor search. MH can effectively preserve the neighborhood structure in the data to achieve the goal of hashing. To the best of our knowledge, this is the first work to adopt Manhattan distance with NBC for hashing. Experiments on several largescale image data sets containing up to one million points show that our MH method can significantly outperform other stateoftheart methods. 
2012  DoubleBit Quantisation for Hashing  W. Kong, W. Li  AAAI  Hashing, which tries to learn similaritypreserving binary codes for data representation, has been widely used for efficient nearest neighbor search in massive databases due to its fast query speed and low storage cost. Because it is NP hard to directly compute the best binary codes for a given data set, mainstream hashing methods typically adopt a twostage strategy. In the first stage, several projected dimensions of real values are generated. Then in the second stage, the real values will be quantized into binary codes by thresholding. Currently, most existing methods use one single bit to quantize each projected dimension. One problem with this singlebit quantization (SBQ) is that the threshold typically lies in the region of the highest point density and consequently a lot of neighboring points close to the threshold will be hashed to totally different bits, which is unexpected according to the principle of hashing. In this paper, we propose a novel quantization strategy, called doublebit quantization (DBQ), to solve the problem of SBQ. The basic idea of DBQ is to quantize each projected dimension into double bits with adaptively learned thresholds. Extensive experiments on two real data sets show that our DBQ strategy can signifi cantly outperform traditional SBQ strategy for hashing. 
2012  Isotropic Hashing  W. Kong, W. Li  NIPS  Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because largervariance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 
2012  Supervised Hashing with Kernels  W. Liu, J. Wang, R. Ji, Y. Jiang, S. Chang  CVPR  Recent years have witnessed the growing popularity of hashing in largescale vision problems. It has been shown that the hashing quality could be boosted by leveraging supervised information into hash function learning. However, the existing supervised methods either lack adequate performance or often incur cumbersome model training. In this paper, we propose a novel kernelbased supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing. The idea is to map the data to compact binary codes whose Hamming distances are minimized on similar pairs and simultaneously maximized on dissimilar pairs. Our approach is distinct from prior works by utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to sequentially and efficiently train the hash functions one bit at a time, yielding very short yet discriminative codes. We carry out extensive experiments on two image benchmarks with up to one million samples, demonstrating that our approach significantly outperforms the stateofthearts in searching both metric distance neighbors and semantically similar neighbors, with accuracy gains ranging from 13% to 46%. 
2012  Hamming Distance Metric Learning  M. Norouzi, D. Fleet, R. Salakhutdinov  NIPS  Motivated by largescale multimedia applications we propose to learn mappings from highdimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to largescale applications as they are storage efficient and permit exact sublinear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewisesmooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new lossaugmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR10 and MNIST, with promising classification results using no more than kNN on the binary codes. 
2012  Using paraphrases for improving first story detection in news and Twitter  S. Petrovic, M. Osborne, V. Lavrenko  NAACL  First story detection (FSD) involves identifying first stories about events from a continuous stream of documents. A major problem in this task is the high degree of lexical variation in documents which makes it very difficult to detect stories that talk about the same event but expressed using different words. We suggest using paraphrases to alleviate this problem, making this the first work to use paraphrases for FSD. We show a novel way of integrating paraphrases with locality sensitive hashing (LSH) in order to obtain an efficient FSD system that can scale to very large datasets. Our system achieves stateoftheart results on the first story detection task, beating both the best supervised and unsupervised systems. To test our approach on large data, we construct a corpus of events for Twitter, consisting of 50 million documents, and show that paraphrasing is also beneficial in this domain. 
2012  Multidimensional Spectral Hashing  Y. Weiss, R. Fergus, A. Torralba  ECCV  en a surge of interest in methods based on “semantic hashing”, i.e. compact binary codes of datapoints so that the Hamming distance between codewords correlates with similarity. In reviewing and comparing existing methods, we show that their relative performance can change drastically depending on the definition of groundtruth neighbors. Motivated by this finding, we propose a new formulation for learning binary codes which seeks to reconstruct the affinity between datapoints, rather than their distances. We show that this criterion is intractable to solve exactly, but a spectral relaxation gives an algorithm where the bits correspond to thresholded eigenvectors of the affinity matrix, and as the number of datapoints goes to infinity these eigenvectors converge to eigenfunctions of LaplaceBeltrami operators, similar to the recently proposed Spectral Hashing (SH) method. Unlike SH whose performance may degrade as the number of bits increases, the optimal code using our formulation is guaranteed to faithfully reproduce the affinities as the number of bits increases. We show that the number of eigenfunctions needed may increase exponentially with dimension, but introduce a “kernel trick” to allow us to compute with an exponentially large number of bits but using only memory and computation that grows linearly with dimension. Experiments shows that MDSH outperforms the stateofthe art, especially in the challenging regime of small distance thresholds. 
2012  CoRegularized Hashing for Multimodal Data  Y. Zhen, D. Yeung  NIPS  Hashingbased methods provide a very promising approach to largescale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called CoRegularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two stateoftheart multimodal hash function learning methods on two publicly available data sets. 
2011  Iterative Quantization: A Procrustean Approach to Learning Binary Codes  Y. Gong, S. Lazebnik  CVPR  This paper addresses the problem of learning similarity preserving binary codes for efficient retrieval in largescale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zerocentered data so as to minimize the quantization error of mapping this data to the vertices of a zerocentered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multiclass spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other stateoftheart methods. 
2011  Learning hash functions for crossview similarity search  S. Kumar, R. Udupa  IJCAI  Many applications in Multilingual and Multimodal Information Access involve searching large databases of high dimensional data objects with multiple (conditionally independent) views. In this work we consider the problem of learning hash functions for similarity search across the views for such applications. We propose a principled method for learning a hash function for each view given a set of multiview training data objects. The hash functions map similar objects to similar codes across the views thus enabling crossview similarity search. We present results from an extensive empirical study of the proposed approach which demonstrate its effectiveness on Japanese language People Search and Multilingual People Search problems. 
2011  Hashing with Graphs  W. Liu, J. Wang, S. Kumar, S. Chang  ICML  Hashing is becoming increasingly popular for efficient nearest neighbor search in massive databases. However, learning short codes that yield good search performance is still a challenge. Moreover, in many cases realworld data lives on a lowdimensional manifold, which should be taken into account to capture meaningful nearest neighbors. In this paper, we propose a novel graphbased hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. To make such an approach computationally feasible, we utilize Anchor Graphs to obtain tractable lowrank adjacency matrices. Our formulation allows constant time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Finally, we describe a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy. Experimental comparison with the other stateoftheart methods on two large datasets demonstrates the efficacy of the proposed method. 
2011  Minimal Loss Hashing  M. Norouzi, D. Fleet  ICML  We propose a method for learning similaritypreserving hash functions that map highdimensional data onto binary codes. The formulation is based on structured prediction with latent variables and a hingelike loss function. It is efficient to train for large datasets, scales well to large code lengths, and outperforms stateoftheart methods. 
2011  Random Maximum Margin Hashing  A. Joly, O. Buisson  CVPR  Following the success of hashing methods for multidimensional indexing, more and more works are interested in embedding visual feature space in compact hash codes. Such approaches are not an alternative to using index structures but a complementary way to reduce both the memory usage and the distance computation cost. Several data dependent hash functions have notably been proposed to closely fit data distribution and provide better selectivity than usual random projections such as LSH. However, improvements occur only for relatively small hash code sizes up to 64 or 128 bits. As discussed in the paper, this is mainly due to the lack of independence between the produced hash functions. We introduce a new hash function family that attempts to solve this issue in any kernel space. Rather than boosting the collision probability of close points, our method focus on data scattering. By training purely random splits of the data, regardless the closeness of the training samples, it is indeed possible to generate consistently more independent hash functions. On the other side, the use of large margin classifiers allows to maintain good generalization performances. Experiments show that our new Random Maximum Margin Hashing scheme (RMMH) outperforms four stateoftheart hashing methods, notably in kernel spaces. 
2011  Composite Hashing with Multiple Information Sources  D. Zhang, F. Wang, L. Si  SIGIR  Similarity search applications with a large amount of text and image data demands an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted much attention due to its fast query/search speed and drastically reduced storage requirement. All of the current semantic hashing methods only deal with the case when each example is represented by one type of features. However, examples are often described from several different information sources in many real world applications. For example, the characteristics of a webpage can be derived from both its content part and its associated links. To address the problem of learning good hashing codes in this scenario, we propose a novel research problem – Composite Hashing with Multiple Information Sources (CHMIS). The focus of the new research problem is to design an algorithm for incorporating the features from different information sources into the binary hashing codes efficiently and effectively. In particular, we propose an algorithm CHMISAW (CHMIS with Adjusted Weights) for learning the codes. The proposed algorithm integrates information from several different sources into the binary hashing codes by adjusting the weights on each individual source for maximizing the coding performance, and enables fast conversion from query examples to their binary hashing codes. Experimental results on five different datasets demonstrate the superior performance of the proposed method against several other stateoftheart semantic hashing techniques. 
2010  Hashing Hyperplane Queries to Near Points with Applications to LargeScale Active Learning  P. Jain, S. Vijayanarasimhan, K. Grauman  NIPS  We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashingbased solutions. Our first approach maps the data to twobit binary keys that are localitysensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to poolbased active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the wellknown minimal distancetohyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 
2010  Streaming First Story Detection with application to Twitter  S. Petrovic, M. Osborne, V. Lavrenko  NAACL  With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on webscale corpora, we present an algorithm based on localitysensitive hashing which is able overcome the limitations of traditional approaches, while maintaining competitive results. In particular, a comparison with a stateoftheart system on the first story detection task shows that we achieve over an order of magnitude speedup in processing time, while retaining comparable performance. Event detection experiments on a collection of 160 million Twitter posts show that celebrity deaths are the fastest spreading news on Twitter. 
2010  Semisupervised hashing for scalable image retrieval  J. Wang, S. Kumar, and S. Chang  CVPR  Large scale image search has recently attracted considerable attention due to easy availability of huge amounts of data. Several hashing methods have been proposed to allow approximate but highly efficient search. Unsupervised hashing methods show good performance with metric distances but, in image search, semantic similarity is usually given in terms of labeled pairs of images. There exist supervised hashing methods that can handle such semantic similarity but they are prone to overfitting when labeled data is small or noisy. Moreover, these methods are usually very slow to train. In this work, we propose a semisupervised hashing method that is formulated as minimizing empirical error on the labeled data while maximizing variance and independence of hash bits over the labeled and unlabeled data. The proposed method can handle both metric as well as semantic similarity. The experimental results on two large datasets (up to one million samples) demonstrate its superior performance over stateoftheart supervised and unsupervised methods. 
2010  Sequential projection learning for hashing with compact codes  J. Wang, S. Kumar, S. Chang  ICML  Hashing based Approximate Nearest Neighbor (ANN) search has attracted much attention due to its fast query time and drastically reduced storage. However, most of the hashing methods either use random projections or extract principal directions from the data to derive hash functions. The resulting embedding suffers from poor discrimination when compact codes are used. In this paper, we propose a novel datadependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. The proposed method easily adapts to both unsupervised and semisupervised scenarios and shows significant performance gains over the stateoftheart methods on two large datasets containing up to 1 million points. 
2010  A New Approach to CrossModal Multimedia Retrieval  N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. Lanckriet, R.Levy and N. Vasconcelos  ICME  The collected documents are selected sections from the Wikipedia’s featured articles collection. This is a continuously growing dataset, that at the time of collection (October 2009) had 2,669 articles spread over 29 categories. Some of the categories are very scarce, therefore we considered only the 10 most populated ones. The articles generally have multiple sections and pictures. We have split them into sections based on section headings, and assign each image to the section in which it was placed by the author(s). Then this dataset was prunned to keep only sections that contained a single image and at least 70 words. The final corpus contains 2,866 multimedia documents. The median text length is 200 words. 
2010  SelfTaught Hashing for Fast Similarity Search  D. Zhang, J. Wang, D. Cai, and J. Lu  SIGIR  The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate highquality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel SelfTaught Hashing (STH) approach to semantic hashing: we first find the optimal lbit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the lbit code for any query document unseen before. Our experiments on three realworld text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms stateoftheart techniques significantly. 
2009  Learning Multiple Layers of Features from Tiny Images  A. Krizhevsky  University of Toronto  Groups at MIT and NYU have collected a dataset of millions of tiny colour images from the web. It is, in principle, an excellent dataset for unsupervised training of deep generative models, but previous researchers who have tried this have found it difficult to learn a good set of filters from the images. We show how to train a multilayer generative model that learns to extract meaningful features which resemble those found in the human visual cortex. Using a novel parallelization algorithm to distribute the work among multiple machines connected on a network, we show how training such a model can be done in reasonable time. A second problematic aspect of the tiny images dataset is that there are no reliable class labels which makes it hard to use for object recognition experiments. We created two sets of reliable labels. The CIFAR10 set has 6000 examples of each of 10 classes and the CIFAR100 set has 600 examples of each of 100 nonoverlapping classes. Using these labels, we show that object recognition is significantly improved by pretraining a layer of features on a large set of unlabeled tiny images. 
2009  ImageNet: A largescale hierarchical image database  J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. FeiFei  CVPR  The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a largescale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 5001000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a largescale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond. 
2009  Fast Similarity Search for Learned Metrics  P. Jain, B. Kulis, K. Grauman  TPAMI  We propose a method to efficiently index into a large database of examples according to a learned metric. Given a collection of examples, we learn a Mahalanobis distance using an informationtheoretic metric learning technique that adapts prior knowledge about pairwise distances to incorporate similarity and dissimilarity constraints. To enable sublinear time similarity search under the learned metric, we show how to encode a learned Mahalanobis parameterization into randomized localitysensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for sparse input vector spaces whose high dimensionality make it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to systems and image datasets, and show that our learned metrics improve accuracy relative to commonlyused metric baselines, while our hashing construction permits effi cient indexing with a learned distance and very large databases. 
2009  Kernelized LocalitySensitive Hashing for Scalable Image Search  B. Kulis, K. Grauman  ICCV  Fast retrieval methods are critical for largescale and datadriven vision applications. Recent work has explored ways to embed highdimensional features or complex distance functions into a lowdimensional Hamming space where items can be efficiently searched. However, existing methods do not apply for highdimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize localitysensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm’s sublinear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful imagebased kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several largescale datasets, and show that it enables accurate and fast performance for examplebased object classification, feature matching, and contentbased retrieval. 
2009  Learning to Hash with Binary Reconstructive Embeddings  B. Kulis, T. Darrell  NIPS  Fast retrieval methods are increasingly critical for many largescale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinatedescent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing stateoftheart techniques. 
2009  NUSWIDE: a realworld web image database from National University of Singapore  T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng  CIVR  This paper introduces a web image dataset created by NUS’s Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of lowlevel features extracted from these images, including 64D color histogram, 144D color correlogram, 73D edge direction histogram, 128D wavelet texture, 225D blockwise color moments extracted over 5x5 fixed grid partitions, and 500D bag of words based on SIFT descriptions; and (3) groundtruth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of Web image collections and identify four research issues on web image annotation and retrieval. We also provide the baseline results for web image annotation by learning from the tags using the traditional kNN algorithm. The benchmark results indicate that it is possible to learn effective models from sufficiently large image dataset to facilitate general image retrieval. 
2009  Localitysensitive binary codes from shiftinvariant kernels  M. Raginsky, S. Lazebnik  NIPS  This paper addresses the problem of designing binary codes for highdimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distributionfree encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shiftinvariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent stateoftheart method, spectral hashing. 
2009  Searching with quantization: approximate nearest neighbor search using short codes and distance estimators  H. Jegou, M. Douze, C. Schmid  INRIA Technical Report  We propose an approximate nearest neighbor search method based on quantization. It uses, in particular, product quantizer to produce short codes and corresponding distance estimators approximating the Euclidean distance between the orginal vectors. The method is advantageously used in an asymmetric manner, by computing the distance between a vector and code, unlike competing techniques such as spectral hashing that only compare codes. Our approach approximates the Euclidean distance based on memory efficient codes and, thus, permits efficient nearest neighbor search. Experiments performed on SIFT and GIST image descriptors show excellent search accuracy. The method is shown to outperform two stateoftheart approaches of the literature. Timings measured when searching a vector set of 2 billion vectors are shown to be excellent given the high accuracy of the method. 
2009  Spectral Hashing  Y. Weiss, A. Torralba, R. Fergus  NIPS  Semantic hashing seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the LaplaceBeltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the stateofthe art. 
2008  The MIR Flickr Retrieval Evaluation.  M. J. Huiskes, M. S. Lew  MIR  In most well known image retrieval test sets, the imagery typically cannot be freely distributed or is not representative of a large community of users. In this paper we present a collection for the MIR community comprising 25000 images from the Flickr website which are redistributable for research purposes and represent a real community of users both in the image content and image tags. We have extracted the tags and EXIF image metadata, and also make all of these publicly available. In addition we discuss several challenges for benchmarking retrieval and classification methods. 
2008  80 million tiny images: a large dataset for nonparametric object and scene recognition  A. Torralba, R. Fergus and W. Freeman  TPAMI  With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of nonparametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Web. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 × 32 color images. Each image is loosely labeled with one of the 75,062 nonabstract nouns in English, as listed in the Wordnet lexical database. Hence the image database gives a comprehensive coverage of all object categories and scenes. The semantic information from Wordnet can be used in conjunction with nearestneighbor methods to perform object classification over a range of semantic levels minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to classspecific ViolaJones style detectors. 
2007  LabelMe: a database and webbased tool for image annotation  B. Russell, A. Torralba, K. Murphy, W. T. Freeman  IJCV  We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a webbased tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web. 
2007  MultiProbe LSH: Efficient Indexing for HighDimensional Similarity Search  Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li  VLDB  Similarity indices for highdimensional data are very desirable for building contentbased search systems for featurerich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multiprobe LSH that overcomes this drawback. Multiprobe LSH is built on the wellknown LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropybased LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multiprobe LSH method and evaluated the implementation with two different highdimensional datasets. Our evaluation shows that the multiprobe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multiprobe LSH has a similar timeefficiency as the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropybased LSH method, to achieve the same search quality, multiprobe LSH uses less query time and 5 to 8 times fewer number of hash tables. 
2007  Semantic Hashing  R. Salakhutdinov, G. Hinton  NIPS  We show how to learn a deep graphical model of the wordcount vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs “semantic hashing”: Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hashcoding to approximate matching is much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TFIDF, we achieve higher accuracy than applying TFIDF to the entire document set. 
2006  NearOptimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions  A. Andoni, P. Indyk  FOCS  We present an algorithm for the capproximate nearest neighbor problem in a ddimensional Euclidean space, achieving query time of O(dn 1c2/+o(1)) and space O(dn + n1+1c2/+o(1)). This almost matches the lower bound for hashingbased algorithm recently obtained in (R. Motwani et al., 2006). We also obtain a spaceefficient version of the algorithm, which uses dn+n logO(1) n space, with a query time of dnO(1/c2). Finally, we discuss practical variants of the algorithms that utilize fast boundeddistance decoders for the Leech lattice 
2005  LSH Forest: SelfTuning Indexes for Similarity Search  M. Bawa, T. Condie, P. Ganesan  WWW  We consider the problem of indexing highdimensional data for answering (approximate) similaritysearch queries. Similarity indexes prove to be important in a wide variety of settings: Web search engines desire fast, parallel, mainmemorybased indexes for similarity search on text data; database systems desire diskbased similarity indexes for highdimensional data, including text and images; peertopeer systems desire distributed similarity indexes with low communication cost. We propose an indexing scheme called LSH Forest which is applicable in all the above contexts. Our index uses the wellknown technique of localitysensitive hashing (LSH), but improves upon previous designs by (a) eliminating the different datadependent parameters for which LSH must be constantly handtuned, and (b) improving on LSH’s performance guarantees for skewed data distributions while retaining the same storage and query overhead. We show how to construct this index in main memory, on disk, in parallel systems, and in peertopeer systems. We evaluate the design with experiments on multiple text corpora and demonstrate both the selftuning nature and the superior performance of LSH Forest. 
2004  Localitysensitive hashing scheme based on pstable distributions  M. Datar, N. Immorlica, P. Indyk, V. Mirrokni  SCG  We present a novel LocalitySensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on pstable distributions.Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neigbhor in O(log n) time for data satisfying certain “bounded growth” condition.Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that the our data structure is up to 40 times faster than kdtree. 
1999  Similarity Search in High Dimensions via Hashing  A. Gionis, P. Indyk, R. Motwani  VLDB  The nearest or nearneighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over highdimensional data, e.g., image databases, document collections, timeseries databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the curse of dimensionality. That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in kd trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than bruteforce linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points from the database so as to ensure that the probability of collision is much higher for objects that are close to each other than for those that are far apart. We provide experimental evidence that our method gives significant improvement in running time over other methods for searching in highdimensional spaces based on hierarchical tree decomposition. Experimental results also indicate that our scheme scales well even for a relatively large number of dimensions (more than 50). 
1999  The MNIST Database of Handwritten Digits  Y. LeCun, C. Cortes, C. Burges  The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been sizenormalized and centered in a fixedsize image. It is a good database for people who want to try learning techniques and pattern recognition methods on realworld data while spending minimal efforts on preprocessing and formatting. 