Search across all paper titles, abstracts, and authors using the search field. Please consider contributing by updating the information of existing papers or adding new work.
Year | Title | Authors | Venue | Abstract |
---|---|---|---|---|
2024 | NeuroHash: A Hyperdimensional Neuro-symbolic Framework For Spatially-aware Image Hashing And Retrieval | Yun Sanggeon, Masukawa Ryozo, Jeong Sungheon, Imani Mohsen | Arxiv | Customizable image retrieval from large datasets remains a critical challenge, particularly when preserving spatial relationships within images. Traditional hashing methods, primarily based on deep learning, often fail to capture spatial information adequately and lack transparency. In this paper, we introduce NeuroHash, a novel neuro-symbolic framework leveraging Hyperdimensional Computing (HDC) to enable highly customizable, spatially-aware image retrieval. NeuroHash combines pre-trained deep neural network models with HDC-based symbolic models, allowing for flexible manipulation of hash values to support conditional image retrieval. Our method includes a self-supervised context-aware HDC encoder and novel loss terms for optimizing lower-dimensional bipolar hashing using multilinear hyperplanes. We evaluate NeuroHash on two benchmark datasets, demonstrating superior performance compared to state-of-the-art hashing methods, as measured by mAP@5K scores and our newly introduced metric mAP@5Kr, which assesses spatial alignment. The results highlight NeuroHash's ability to achieve competitive performance while offering significant advantages in flexibility and customization, paving the way for more advanced and versatile image retrieval systems. |
2024 | Contrastive Masked Auto-encoders Based Self-supervised Hashing For 2D Image And 3D Point Cloud Cross-modal Retrieval | Wei Rukai, Cui Heng, Liu Yu, Hou Yufeng, Xie Yanzhao, Zhou Ke | Arxiv | Implementing cross-modal hashing between 2D images and 3D point-cloud data is a growing concern in real-world retrieval systems. Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics and effectively bridge the modality gap between 2D and 3D. To address these issues without relying on hand-crafted labels, we propose contrastive masked autoencoders based self-supervised hashing (CMAH) for retrieval between images and point-cloud data. We start by contrasting 2D-3D pairs and explicitly constraining them into a joint Hamming space. This contrastive learning process ensures robust discriminability for the generated hash codes and effectively reduces the modality gap. Moreover, we utilize multi-modal auto-encoders to enhance the model's understanding of multi-modal semantics. By completing the masked image/point-cloud data modeling task, the model is encouraged to capture more localized clues. In addition, the proposed multi-modal fusion block facilitates fine-grained interactions among different modalities. Extensive experiments on three public datasets demonstrate that the proposed CMAH significantly outperforms all baseline methods. |
2024 | Neural Locality Sensitive Hashing For Entity Blocking | Wang Runhui, Kong Luyang, Tao Yefan, Borthwick Andrew, Golac Davor, Johnson Henrik, Hijazi Shadie, Deng Dong, Zhang Yongfeng | Arxiv | Locality-sensitive hashing (LSH) is a fundamental algorithmic technique widely employed in large-scale data processing applications such as nearest-neighbor search, entity resolution, and clustering. However, its applicability in some real-world scenarios is limited due to the need for careful design of hashing functions that align with specific metrics. Existing LSH-based entity blocking solutions primarily rely on generic similarity metrics such as Jaccard similarity, whereas practical use cases often demand complex and customized similarity rules surpassing the capabilities of generic similarity metrics. Consequently, designing LSH functions for these customized similarity rules presents considerable challenges. In this research, we propose a neuralization approach to enhance locality-sensitive hashing by training deep neural networks to serve as hashing functions for complex metrics. We assess the effectiveness of this approach within the context of the entity resolution problem, which frequently involves the use of task-specific metrics in real-world applications. Specifically, we introduce NLSHBlock (Neural-LSH Block), a novel blocking methodology that leverages pre-trained language models fine-tuned with a novel LSH-based loss function. Through extensive evaluations conducted on a diverse range of real-world datasets, we demonstrate the superiority of NLSHBlock over existing methods, exhibiting significant performance improvements. Furthermore, we showcase the efficacy of NLSHBlock in enhancing the performance of the entity matching phase, particularly within the semi-supervised setting. |
2024 | RREH: Reconstruction Relations Embedded Hashing For Semi-paired Cross-modal Retrieval | Wang Jianzong, Shi Haoxiang, Luo Kaiyi, Zhang Xulong, Cheng Ning, Xiao Jing | Arxiv | Known for efficient computation and easy storage, hashing has been extensively explored in cross-modal retrieval. The majority of current hashing models are predicated on the premise of a direct one-to-one mapping between data points. However, in real practice, data correspondence across modalities may only be partially provided. In this research, we introduce an innovative unsupervised hashing technique designed for semi-paired cross-modal retrieval tasks, named Reconstruction Relations Embedded Hashing (RREH). RREH assumes that multi-modal data share a common subspace. For paired data, RREH explores the latent consistent information of heterogeneous modalities by seeking a shared representation. For unpaired data, to effectively capture the latent discriminative features, the high-order relationships between unpaired data and anchors are embedded into the latent subspace, computed by efficient linear reconstruction. The anchors are sampled from paired data, which improves the efficiency of hash learning. RREH trains the underlying features and the binary encodings in a unified framework with high-order reconstruction relations preserved. With the well-devised objective function and discrete optimization algorithm, RREH is designed to be scalable, making it suitable for large-scale datasets and facilitating efficient cross-modal retrieval. In the evaluation process, the proposed method is tested with partially paired data to establish its superiority over several existing methods. |
2024 | BlockBoost: Scalable and Efficient Blocking through Boosting | Thiago Ramos, Rodrigo Loro Schuller, Alex Akira Okuno, Lucas Nissenbaum, Roberto I Oliveira, Paulo Orenstein | AISTATS | As datasets grow larger, matching and merging entries from different databases has become a costly task in modern data pipelines. To avoid expensive comparisons between entries, blocking similar items is a popular preprocessing step. In this paper, we introduce BlockBoost, a novel boosting-based method that generates compact binary hash codes for database entries, through which blocking can be performed efficiently. The algorithm is fast and scalable, resulting in computational costs that are orders of magnitude lower than current benchmarks. Unlike existing alternatives, BlockBoost comes with associated feature importance measures for interpretability, and possesses strong theoretical guarantees, including lower bounds on critical performance metrics like recall and reduction ratio. Finally, we show that BlockBoost delivers great empirical results, outperforming state-of-the-art blocking benchmarks in terms of both performance metrics and computational cost. |
2024 | Hashing Geographical Point Data Using The Space-filling H-curve | Netay Igor V. | Arxiv | We construct a geohashing procedure based on the space-filling H-curve. This curve yields a geohash with fewer computations than constructions based on the Hilbert curve, while at the same time offering better clustering properties. |
2024 | ConceptHash: Interpretable Fine-grained Hashing Via Concept Discovery | Ng Kam Woh, Zhu Xiatian, Song Yi-zhe, Xiang Tao | Arxiv | Existing fine-grained hashing methods typically lack code interpretability, as they compute hash code bits holistically using both global and local features. To address this limitation, we propose ConceptHash, a novel method that achieves sub-code level interpretability. In ConceptHash, each sub-code corresponds to a human-understandable concept, such as an object part, and these concepts are automatically discovered without human annotations. Specifically, we leverage a Vision Transformer architecture and introduce concept tokens as visual prompts, along with image patch tokens as model inputs. Each concept is then mapped to a specific sub-code at the model output, providing natural sub-code interpretability. To capture subtle visual differences among highly similar sub-categories (e.g. bird species), we incorporate language guidance to ensure that the learned hash codes are distinguishable within fine-grained object classes while maintaining semantic alignment. This approach allows us to develop hash codes that exhibit similarity within families of species while remaining distinct from species in other families. Extensive experiments on four fine-grained image retrieval benchmarks demonstrate that ConceptHash outperforms previous methods by a significant margin, offering unique sub-code interpretability as an additional benefit. Code at https://github.com/kamwoh/concepthash. |
2024 | FlipHash: A Constant-time Consistent Range-hashing Algorithm | Masson Charles, Lee Homin K. | Arxiv | Consistent range-hashing is a technique used in distributed systems, either directly or as a subroutine for consistent hashing, commonly to realize an even and stable data distribution over a variable number of resources. We introduce FlipHash, a consistent range-hashing algorithm with constant time complexity and low memory requirements. Like Jump Consistent Hash, FlipHash is intended for applications where resources can be indexed sequentially. Under this condition, it ensures that keys are hashed evenly across resources and that changing the number of resources only causes keys to be remapped from a removed resource or to an added one, but never shuffled across persisted ones. FlipHash differentiates itself with its low computational cost, achieving constant-time complexity. We show that FlipHash beats Jump Consistent Hash's cost, which is logarithmic in the number of resources, both theoretically and in experiments over practical settings. *(A minimal sketch of the Jump Consistent Hash baseline appears after this table.)* |
2024 | HARR: Learning Discriminative and High-quality Hash Codes for Image Retrieval | Zeyu Ma, Siwei Wang, Xiao Luo, Zhonghui Gu, Chong Chen, Jinxing Li, Xian-Sheng Hua, Guangming Lu | TOMM | This article studies deep unsupervised hashing, which has attracted increasing attention in large-scale image retrieval. The majority of recent approaches usually reconstruct semantic similarity information, which then guides the hash code learning. However, they still fail to achieve satisfactory performance in reality for two reasons. On the one hand, without accurate supervised information, these methods usually fail to produce independent and robust hash codes with semantics information well preserved, which may hinder effective image retrieval. On the other hand, due to discrete constraints, how to effectively optimize the hashing network in an end-to-end manner with small quantization errors remains a problem. To address these difficulties, we propose a novel unsupervised hashing method called HARR to learn discriminative and high-quality hash codes. To comprehensively explore semantic similarity structure, HARR adopts the Winner-Take-All hash to model the similarity structure. Then similarity-preserving hash codes are learned under the reliable guidance of the reconstructed similarity structure. Additionally, we improve the quality of hash codes by a bit correlation reduction module, which forces the cross-correlation matrix between a batch of hash codes under different augmentations to approach the identity matrix. In this way, the generated hash bits are expected to be invariant to disturbances with minimal redundancy, which can be further interpreted as an instantiation of the information bottleneck principle. Finally, for effective hashing network training, we minimize the cosine distances between real-value network outputs and their binary codes for small quantization errors. Extensive experiments demonstrate the effectiveness of our proposed HARR. |
2024 | Distilling Vision-language Pretraining For Efficient Cross-modal Retrieval | Jang Young Kyun, Kim Donghyun, Lim Ser-nam | Arxiv | Learning to hash is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enhancing the performance of learning to hash with the proliferation of powerful large pre-trained models, such as Vision-Language Pre-training (VLP) models. We introduce a novel method named Distillation for Cross-Modal Quantization (DCMQ), which leverages the rich semantic knowledge of VLP models to improve hash representation learning. Specifically, we use the VLP as a teacher to distill knowledge into a student hashing model equipped with codebooks. This process involves the replacement of supervised labels, which are composed of multi-hot vectors and lack semantics, with the rich semantics of VLP. In the end, we apply a transformation termed Normalization with Paired Consistency (NPC) to achieve a discriminative target for distillation. Further, we introduce a new quantization method, Product Quantization with Gumbel (PQG), that promotes balanced codebook learning, thereby improving the retrieval performance. Extensive benchmark testing demonstrates that DCMQ consistently outperforms existing supervised cross-modal hashing approaches, showcasing its significant potential. |
2024 | On The Adversarial Robustness Of Locality-sensitive Hashing In Hamming Space | Kapralov Michael, Makarov Mikhail, Sohler Christian | Arxiv | Locality-sensitive hashing (Indyk and Motwani, 1998) is a classical data structure for approximate nearest neighbor search. It allows, after a close-to-linear-time preprocessing of the input dataset, finding an approximately nearest neighbor of any fixed query in time sublinear in the dataset size. The resulting data structure is randomized and succeeds with high probability for every fixed query. In many modern applications of nearest neighbor search, the queries are chosen adaptively. In this paper, we study the robustness of locality-sensitive hashing to adaptive queries in Hamming space. We present a simple adversary that can, under mild assumptions on the initial point set, provably find a query on which the approximate near neighbor search data structure fails. Crucially, our adaptive algorithm finds the hard query exponentially faster than random sampling. *(A minimal sketch of bit-sampling LSH in Hamming space appears after this table.)* |
2024 | Compact Parallel Hash Tables On The GPU | Hegeman Steef, Wöltgens Daan, Wijs Anton, Laarman Alfons | Arxiv | On the GPU, hash table operation speed is determined in large part by cache line efficiency, and state-of-the-art hashing schemes thus divide tables into cache-line-sized buckets. This raises the question of whether performance can be further improved by increasing the number of entries that fit in such buckets. Known compact hashing techniques have not yet been adapted to the massively parallel setting, nor have they been evaluated on the GPU. We consider a compact version of bucketed cuckoo hashing and a version of compact iceberg hashing suitable for the GPU. We discuss the tables from a theoretical perspective and provide an open source implementation of both schemes in CUDA for comparative benchmarking. In terms of performance, the state-of-the-art cuckoo hashing benefits from compactness on lookups and insertions (most experiments show at least a 10-20% increase in throughput), and the iceberg table benefits significantly, to the point of being comparable to compact cuckoo hashing, while supporting performant dynamic operation. |
2024 | HybridHash: Hybrid Convolutional And Self-attention Deep Hashing For Image Retrieval | He Chao, Wei Hongxi | Arxiv | Deep image hashing aims to map input images into simple binary hash codes via deep neural networks, and thus enable effective large-scale image retrieval. Recently, hybrid networks that combine convolution and Transformer have achieved superior performance on various computer vision tasks and have attracted extensive attention from researchers. Nevertheless, the potential benefits of such hybrid networks in image retrieval still need to be verified. To this end, we propose a hybrid convolutional and self-attention deep hashing method known as HybridHash. Specifically, we propose a backbone network with stage-wise architecture in which a block aggregation function is introduced to achieve the effect of local self-attention and reduce the computational complexity. The interaction module has been elaborately designed to promote the communication of information between image blocks and to enhance the visual representations. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that the method proposed in this paper has superior performance with respect to state-of-the-art deep hashing methods. Source code is available at https://github.com/shuaichaochao/HybridHash. |
2024 | Bit-mask Robust Contrastive Knowledge Distillation For Unsupervised Semantic Hashing | He Liyang, Huang Zhenya, Liu Jiayu, Chen Enhong, Wang Fei, Sha Jing, Wang Shijin | Arxiv | Unsupervised semantic hashing has emerged as an indispensable technique for fast image search, aiming to convert images into binary hash codes without relying on labels. Recent advancements in the field demonstrate that employing large-scale backbones (e.g. ViT) in unsupervised semantic hashing models can yield substantial improvements. However, the inference delay has become increasingly difficult to overlook. Knowledge distillation provides a means for practical model compression to alleviate this delay. Nevertheless, the prevailing knowledge distillation approaches are not explicitly designed for semantic hashing. They ignore the unique search paradigm of semantic hashing, the inherent necessities of the distillation process, and the property of hash codes. In this paper, we propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method, specifically devised for the distillation of semantic hashing models. To ensure the effectiveness of two kinds of search paradigms in the context of semantic hashing, BRCD first aligns the semantic spaces between the teacher and student models through a contrastive knowledge distillation objective. Additionally, to eliminate noisy augmentations and ensure robust optimization, a cluster-based method within the knowledge distillation process is introduced. Furthermore, through a bit-level analysis, we uncover the presence of redundant bits resulting from the bit independence property. To mitigate these effects, we introduce a bit mask mechanism in our knowledge distillation objective. Finally, extensive experiments not only showcase the noteworthy performance of our BRCD method in comparison to other knowledge distillation methods but also substantiate the generality of our method across diverse semantic hashing models and backbones. The code for BRCD is available at https://github.com/hly1998/BRCD. |
2024 | Hashing Based Contrastive Learning For Virtual Screening | Han Jin, Hong Yun, Li Wu-jun | Arxiv | Virtual screening (VS) is a critical step in computer-aided drug discovery, aiming to identify molecules that bind to a specific target receptor, such as a protein. Traditional VS methods, such as docking, are often too time-consuming for screening large-scale molecular databases. Recent advances in deep learning have demonstrated that learning vector representations for both proteins and molecules using contrastive learning can outperform traditional docking methods. However, given that target databases often contain billions of molecules, the real-valued vector representations adopted by existing methods can still incur significant memory and time costs in VS. To address this problem, in this paper we propose a hashing-based contrastive learning method, called DrugHash, for VS. DrugHash treats VS as a retrieval task that uses efficient binary hash codes for retrieval. In particular, DrugHash designs a simple yet effective hashing strategy to enable end-to-end learning of binary hash codes for both protein and molecule modalities, which can dramatically reduce the memory and time costs with higher accuracy compared with existing methods. Experimental results show that DrugHash can outperform existing methods, achieving state-of-the-art accuracy with a memory saving of 32x and a speed improvement of 3.5x. |
2024 | Transformer-based Clipped Contrastive Quantization Learning For Unsupervised Image Retrieval | Dubey Ayush, Dubey Shiv Ram, Singh Satish Kumar, Chu Wei-ta | Arxiv | Unsupervised image retrieval aims to learn the important visual characteristics, without any given label, to retrieve images similar to a given query image. Convolutional Neural Network (CNN)-based approaches have been extensively exploited with self-supervised contrastive learning for image hashing. However, the existing approaches suffer from a lack of effective utilization of global features by CNNs and from the bias created by false negative pairs in contrastive learning. In this paper, we propose a TransClippedCLR model that encodes the global context of an image using a Transformer with local context through patch-based processing, generates the hash codes through product quantization, and avoids potential false negative pairs through clipped contrastive learning. The proposed model achieves superior performance for unsupervised image retrieval on benchmark datasets, including CIFAR10, NUS-Wide and Flickr25K, as compared to recent state-of-the-art deep models. The results using the proposed clipped contrastive learning are greatly improved on all datasets as compared to the same backbone network with vanilla contrastive learning. |
2024 | BinomialHash: A Constant Time Minimal Memory Consistent Hash Algorithm | Coluzzi Massimo, Brocco Amos, Antonucci Alessandro | Arxiv | Consistent hashing is employed in distributed systems and networking applications to evenly and effectively distribute data across a cluster of nodes. This paper introduces BinomialHash, a consistent hashing algorithm that operates in constant time and requires minimal memory. We provide a detailed explanation of the algorithm, offer a pseudo-code implementation, and formally establish its strong theoretical guarantees. |
2024 | Supervised Consensus Anchor Graph Hashing for Cross Modal Retrieval | Rui Chen, Hongbin Wang | TIP | The target of cross-modal hashing is to embed heterogeneous multimedia data into a common low-dimensional Hamming space, which plays a pivotal part in multimedia retrieval due to the emergence of big multimodal data. Recently, matrix factorization has achieved great success in cross-modal hashing. However, how to effectively use label information and local geometric structure is still a challenging problem for these approaches. To address this issue, we propose a cross-modal hashing method based on collective matrix factorization, which considers both the label consistency across different modalities and the local geometric consistency in each modality. These two elements are formulated as a graph Laplacian term in the objective function, leading to a substantial improvement on the discriminative power of latent semantic features obtained by collective matrix factorization. Moreover, the proposed method learns unified hash codes for different modalities of an instance to facilitate cross-modal search, and the objective function is solved using an iterative strategy. The experimental results on two benchmark data sets show the effectiveness of the proposed method and its superiority over state-of-the-art cross-modal hashing methods. |
2024 | Towards Effective Top-n Hamming Search Via Bipartite Graph Contrastive Hashing | Chen Yankai, Fang Yixiang, Zhang Yifei, Ma Chenhao, Hong Yang, King Irwin | Arxiv | Searching on bipartite graphs serves as a fundamental task for various real-world applications, such as recommendation systems, database retrieval, and document querying. Conventional approaches rely on similarity matching in the continuous Euclidean space of vectorized node embeddings. To handle intensive similarity computation efficiently, hashing techniques for graph-structured data have emerged as a prominent research direction. However, despite the retrieval efficiency in Hamming space, previous studies have encountered catastrophic performance decay. To address this challenge, we investigate the problem of hashing with Graph Convolutional Networks for effective Top-N search. Our findings indicate the learning effectiveness of incorporating hashing techniques within the exploration of bipartite graph receptive fields, as opposed to simply treating hashing as post-processing to output embeddings. To further enhance the model performance, we advance upon these findings and propose Bipartite Graph Contrastive Hashing (BGCH+). BGCH+ introduces a novel dual augmentation approach to both intermediate information and hash code outputs in the latent feature spaces, thereby producing more expressive and robust hash codes within a dual self-supervised learning paradigm. Comprehensive empirical analyses on six real-world benchmarks validate the effectiveness of our dual feature contrastive learning in boosting the performance of BGCH+ compared to existing approaches. |
2024 | Leveraging High-resolution Features For Improved Deep Hashing-based Image Retrieval | Berriche Aymene, Zakaria Mehdi Adjal, Baghdadi Riyadh | Arxiv | Deep hashing techniques have emerged as the predominant approach for efficient image retrieval. Traditionally, these methods utilize pre-trained convolutional neural networks (CNNs) such as AlexNet and VGG-16 as feature extractors. However, the increasing complexity of datasets poses challenges for these backbone architectures in capturing meaningful features essential for effective image retrieval. In this study, we explore the efficacy of employing high-resolution features learned through state-of-the-art techniques for image retrieval tasks. Specifically, we propose a novel methodology that utilizes High-Resolution Networks (HRNets) as the backbone for the deep hashing task, termed High-Resolution Hashing Network (HHNet). Our approach demonstrates superior performance compared to existing methods across all tested benchmark datasets, including CIFAR-10, NUS-WIDE, MS COCO, and ImageNet. This performance improvement is more pronounced for complex datasets, which highlights the need to learn high-resolution features for intricate image retrieval tasks. Furthermore, we conduct a comprehensive analysis of different HRNet configurations and provide insights into the optimal architecture for the deep hashing task. |
2024 | Fast Redescription Mining Using Locality-sensitive Hashing | Karjalainen Maiju, Galbrun Esther, Miettinen Pauli | Arxiv | Redescription mining is a data analysis technique that has found applications in diverse fields. The most used redescription mining approaches involve two phases: finding matching pairs among data attributes, and extending the pairs. This process is relatively efficient when the number of attributes remains limited and when the attributes are Boolean, but becomes almost intractable when the data consist of many numerical attributes. In this paper, we present new algorithms that perform the matching and extension orders of magnitude faster than the existing approaches. Our algorithms are based on locality-sensitive hashing, with a tailored approach to handle the discretisation of numerical attributes as used in redescription mining. |
2023 | Unsupervised Multi-criteria Adversarial Detection In Deep Image Retrieval | Xiao Yanru, Wang Cong, Gao Xing | Arxiv | The vulnerability in the algorithm supply chain of deep learning has imposed new challenges on downstream image retrieval systems. Among a variety of techniques, deep hashing is gaining popularity. As it inherits the algorithmic backend from deep learning, a handful of attacks have recently been proposed to disrupt normal image retrieval. Unfortunately, the defense strategies in softmax classification are not readily applicable in the image retrieval domain. In this paper, we propose an efficient and unsupervised scheme to identify unique adversarial behaviors in the Hamming space. In particular, we design three criteria from the perspectives of Hamming distance, quantization loss, and denoising to defend against both untargeted and targeted attacks, which collectively limit the adversarial space. The extensive experiments on four datasets demonstrate 2-23% improvements in detection rates with minimum computational overhead for real-time image queries. |
2023 | A Study On The Use Of Perceptual Hashing To Detect Manipulation Of Embedded Messages In Images | Wöhnert Sven-jannik, Wöhnert Kai Hendrik, Almamedov Eldar, Frank Carsten, Skwarek Volker | Arxiv | Typically, metadata of images are stored in a specific data segment of the image file. However, to securely detect changes, data can also be embedded within images. The goal is to invisibly and robustly embed as much information as possible, ideally even surviving compression. This work searches for embedding principles which allow distinguishing between unintended changes caused by lossy image compression and malicious manipulation of the embedded message, based on the change of its perceptual or robust hash. Different embedding and compression algorithms are compared. The study shows that embedding a message via the integer wavelet transform and compressing with the Karhunen-Loeve transform yields the best results. However, it was not possible to distinguish between manipulation and compression in all cases. |
2023 | Constant Sequence Extension For Fast Search Using Weighted Hamming Distance | Weng Zhenyu, Zhuang Huiping, Li Haizhou, Lin Zhiping | Arxiv | Representing visual data using compact binary codes is attracting increasing attention, as binary codes are used as direct indices into hash table(s) for fast non-exhaustive search. Recent methods show that ranking binary codes using weighted Hamming distance (WHD) rather than Hamming distance (HD), by generating query-adaptive weights for each bit, can better retrieve query-related items. However, search using WHD is slower than that using HD. One main challenge is that, for existing methods, the complexity of extending a monotone increasing sequence using WHD to probe buckets in hash table(s) is at least proportional to the square of the sequence length, while that using HD is proportional to the sequence length. To overcome this challenge, we propose a novel fast non-exhaustive search method using WHD. The key idea is to design a constant sequence extension algorithm that performs each sequence extension in constant computational complexity, so that the total complexity is proportional to the sequence length, which is justified by theoretical analysis. Experimental results show that our method is faster than other WHD-based search methods. Also, compared with the HD-based non-exhaustive search method, our method has comparable efficiency but retrieves more query-related items for datasets of up to one billion items. *(A minimal sketch of weighted Hamming distance ranking appears after this table.)* |
2023 | Semantic-aware Adversarial Training For Reliable Deep Hashing Retrieval | Yuan Xu, Zhang Zheng, Wang Xunguang, Wu Lin | IEEE Transactions on Information Forensics and Security | Deep hashing has been intensively studied and successfully applied in large-scale image retrieval systems due to its efficiency and effectiveness. Recent studies have recognized that the existence of adversarial examples poses a security threat to deep hashing models, that is, adversarial vulnerability. Notably, it is challenging to efficiently distill reliable semantic representatives for deep hashing to guide adversarial learning, and this hinders the enhancement of the adversarial robustness of deep hashing-based retrieval models. Moreover, current research on adversarial training for deep hashing is hard to formalize into a unified minimax structure. In this paper, we explore Semantic-Aware Adversarial Training (SAAT) for improving the adversarial robustness of deep hashing models. Specifically, we conceive a discriminative mainstay features learning (DMFL) scheme to construct semantic representatives for guiding adversarial learning in deep hashing. In particular, our DMFL, with a strict theoretical guarantee, is adaptively optimized in a discriminative learning manner, where both discriminative and semantic properties are jointly considered. Moreover, adversarial examples are fabricated by maximizing the Hamming distance between the hash codes of adversarial samples and mainstay features, the efficacy of which is validated in adversarial attack trials. Further, we, for the first time, formulate the formalized adversarial training of deep hashing as a unified minimax optimization under the guidance of the generated mainstay codes. Extensive experiments on benchmark datasets show superb attack performance against the state-of-the-art algorithms; meanwhile, the proposed adversarial training can effectively eliminate adversarial perturbations for trustworthy deep hashing-based retrieval. Our code is available at https://github.com/xandery-geek/SAAT. |
2023 | Cascading Hierarchical Networks With Multi-task Balanced Loss For Fine-grained Hashing | Zeng Xianxian, Zheng Yanjun | Arxiv | With the explosive growth in the number of fine-grained images in the Internet era, it has become a challenging problem to perform fast and efficient retrieval from large-scale fine-grained images. Among the many retrieval methods, hashing methods are widely used due to their high efficiency and small storage space occupation. Fine-grained hashing is more challenging than traditional hashing problems due to difficulties such as low inter-class variances and high intra-class variances caused by the characteristics of fine-grained images. To improve the retrieval accuracy of fine-grained hashing, we propose a cascaded network to learn compact and highly semantic hash codes, and introduce an attention-guided data augmentation method. We refer to this network as a cascaded hierarchical data augmentation network. We also propose a novel approach to coordinately balance the losses of multi-task learning. We conduct extensive experiments on several common fine-grained visual classification datasets. The experimental results demonstrate that our proposed method outperforms several state-of-the-art hashing methods and can effectively improve the accuracy of fine-grained retrieval. The source code is publicly available at https://github.com/kaiba007/FG-CNET. |
2023 | Attribute-aware Deep Hashing With Self-consistency For Large-scale Fine-grained Image Retrieval | Wei Xiu-shen, Shen Yang, Sun Xuhao, Wang Peng, Peng Yuxin | Arxiv | Our work focuses on tackling large-scale fine-grained image retrieval, i.e. ranking the images depicting the concept of interest (the same sub-category labels) highest, based on the fine-grained details in the query. It is desirable to alleviate the challenges of both the fine-grained nature of small inter-class variations with large intra-class variations and the explosive growth of fine-grained data for such a practical task. In this paper, we propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes, to not only make the retrieval process efficient but also establish explicit correspondences between hash codes and visual attributes. Specifically, based on the visual representations captured by attention, we develop an encoder-decoder structure network with a reconstruction task to unsupervisedly distill high-level attribute-specific vectors from the appearance-specific visual representations, without attribute annotations. Our models are also equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities. Then, driven by preserving the original entities' similarity, the required hash codes can be generated from these attribute-specific vectors, and thus become attribute-aware. Furthermore, to combat simplicity bias in deep hashing, we consider the model design from the perspective of the self-consistency principle and propose to further enhance the models' self-consistency by equipping an additional image reconstruction path. Comprehensive quantitative experiments under diverse empirical settings on six fine-grained retrieval datasets and two generic retrieval datasets show the superiority of our models over competing methods. |
2023 | Uncertainty-aware Unsupervised Video Hashing | Yucheng Wang, Mingyuan Zhou, Yu Sun, Xiaoning Qian | AISTATS | Learning to hash has become popular for video retrieval due to its fast speed and low storage consumption. Previous efforts formulate video hashing as training a binary auto-encoder, for which noncontinuous latent representations are optimized by the biased straight-through (ST) back-propagation heuristic. We propose to formulate video hashing as learning a discrete variational auto-encoder with the factorized Bernoulli latent distribution, termed as Bernoulli variational auto-encoder (BerVAE). The corresponding evidence lower bound (ELBO) in our BerVAE implementation leads to closed-form gradient expression, which can be applied to achieve principled training along with some other unbiased gradient estimators. BerVAE enables uncertainty-aware video hashing by predicting the probability distribution of video hash code-words, thus providing reliable uncertainty quantification. Experiments on both simulated and real-world large-scale video data demonstrate that our BerVAE trained with unbiased gradient estimators can achieve the state-of-the-art retrieval performance. Furthermore, we show that quantified uncertainty is highly correlated to video retrieval performance, which can be leveraged to further improve the retrieval accuracy. Our code is available at https://github.com/wangyucheng1234/BerVAE |
2023 | Reliable And Efficient Evaluation Of Adversarial Robustness For Deep Hashing-based Retrieval | Wang Xunguang, Bai Jiawang, Xu Xinyue, Li Xiaomeng | Arxiv | Deep hashing has been extensively applied to massive image retrieval due to its efficiency and effectiveness. Recently, several adversarial attacks have been presented to reveal the vulnerability of deep hashing models to adversarial examples. However, existing attack methods suffer from degraded performance or inefficiency because they underutilize the semantic relations between original samples or spend a lot of time learning these relations with a deep neural network. In this paper, we propose a novel Pharos-guided Attack, dubbed PgA, to evaluate the adversarial robustness of deep hashing networks reliably and efficiently. Specifically, we design a pharos code to represent the semantics of the benign image, preserving the similarity to semantically relevant samples and the dissimilarity to irrelevant ones. It is proven that we can quickly calculate the pharos code via a simple math formula. Accordingly, PgA can directly conduct a reliable and efficient attack on deep hashing-based retrieval by maximizing the similarity between the hash code of the adversarial example and the pharos code. Extensive experiments on benchmark datasets verify that the proposed algorithm outperforms the prior state-of-the-art methods in both attack strength and speed. |
2023 | Graph-collaborated Auto-encoder Hashing For Multi-view Binary Clustering | Wang Huibing, Yao Mingze, Jiang Guangqi, Mi Zetian, Fu Xianping | Arxiv | Unsupervised hashing methods have attracted widespread attention with the explosive growth of large-scale data, as they can greatly reduce storage and computation by learning compact binary codes. Existing unsupervised hashing methods attempt to exploit the valuable information from samples, but fail to take the local geometric structure of unlabeled samples into consideration. Moreover, hashing based on auto-encoders aims to minimize the reconstruction loss between the input data and binary codes, which ignores the potential consistency and complementarity of multiple-source data. To address the above issues, we propose a hashing algorithm based on auto-encoders for multi-view binary clustering, which dynamically learns affinity graphs with low-rank constraints and adopts collaborative learning between auto-encoders and affinity graphs to learn a unified binary code, called Graph-Collaborated Auto-Encoder Hashing for Multi-view Binary Clustering (GCAE). Specifically, we propose a multi-view affinity graph learning model with low-rank constraint, which can mine the underlying geometric information from multi-view data. Then, we design an encoder-decoder paradigm to collaborate the multiple affinity graphs, which can learn a unified binary code effectively. Notably, we impose decorrelation and code balance constraints on binary codes to reduce the quantization errors. Finally, we utilize an alternating iterative optimization scheme to obtain the multi-view clustering results. Extensive experimental results on 5 public datasets are provided to reveal the effectiveness of the algorithm and its superior performance over other state-of-the-art alternatives. |
2023 | IDEA: An Invariant Perspective for Efficient Domain Adaptive Image Retrieval | Haixin Wang, Hao Wu, Jinan Sun, Shikun Zhang, Chong Chen, Xian-Sheng Hua, Xiao Luo | NeurIPS | In this paper, we investigate the problem of unsupervised domain adaptive hashing, which leverage knowledge from a label-rich source domain to expedite learning to hash on a label-scarce target domain. Although numerous existing approaches attempt to incorporate transfer learning techniques into deep hashing frameworks, they often neglect the essential invariance for adequate alignment between these two domains. Worse yet, these methods fail to distinguish between causal and non-causal effects embedded in images, rendering cross-domain retrieval ineffective. To address these challenges, we propose an Invariance-acquired Domain AdaptivE HAshing (IDEA) model. Our IDEA first decomposes each image into a causal feature representing label information, and a non-causal feature indicating domain information. Subsequently, we generate discriminative hash codes using causal features with consistency learning on both source and target domains. More importantly, we employ a generative model for synthetic samples to simulate the intervention of various non-causal effects, ultimately minimizing their impact on hash codes for domain invariance. Comprehensive experiments conducted on benchmark datasets validate the superior performance of our IDEA compared to a variety of competitive baselines. |
2023 | Deep Lifelong Cross-modal Hashing | Xu Liming, Li Hanqi, Zheng Bochuan, Li Weisheng, Lv Jiancheng | Arxiv | Hashing methods have made significant progress in cross-modal retrieval tasks with fast query speed and low storage cost. Among them, deep learning-based hashing achieves better performance on large-scale data due to its excellent extraction and representation ability for nonlinear heterogeneous features. However, two main challenges remain: catastrophic forgetting when data with new categories arrive continuously, and the time-consuming retraining that non-continuous hashing retrieval requires for updating. To this end, in this paper we propose a novel deep lifelong cross-modal hashing to achieve lifelong hashing retrieval instead of re-training the hash function repeatedly when new data arrive. Specifically, we design a lifelong learning strategy to update hash functions by directly training on the incremental data instead of retraining new hash functions on all the accumulated data, which significantly reduces training time. Then, we propose a lifelong hashing loss to enable original hash codes to participate in lifelong learning while remaining invariant, and further preserve the similarity and dissimilarity among original and incremental hash codes to maintain performance. Additionally, considering distribution heterogeneity when new data arrive continuously, we introduce multi-label semantic similarity to supervise hash learning, and it has been proven, with detailed analysis, that this similarity improves performance. Experimental results on benchmark datasets show that the proposed method achieves comparable performance to recent state-of-the-art cross-modal hashing methods, yields substantial average increments of over 20% in retrieval accuracy, and reduces training time by over 80% when new data arrive continuously. |
2023 | Fast Locality Sensitive Hashing With Theoretical Guarantee | Tan Zongyuan, Wang Hongya, Xu Bo, Luo Minjie, Du Ming | Arxiv | Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to the data dimensionality, and is thus often the performance bottleneck when dimensionality is high and the number of hash functions involved is large. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper, we design a simple yet efficient LSH scheme, named FastLSH, under the l2 norm. By combining random sampling and random projection, FastLSH reduces the time complexity from O(n) to O(m) (m<n), where n is the data dimensionality and m is the number of sampled dimensions. Moreover, FastLSH has a provable LSH property, which distinguishes it from non-LSH fast sketches. We conduct comprehensive experiments over a collection of real and synthetic datasets for the nearest neighbor search task. Experimental results demonstrate that FastLSH is on par with the state of the art in terms of answer quality, space occupation, and query efficiency, while enjoying up to an 80x speedup in hash function evaluation. We believe that FastLSH is a promising alternative to the classic LSH scheme. *(A minimal sketch of the sampling-plus-projection idea appears after this table.)* |
2023 | Deep Hashing Via Householder Quantization | Schwengber Lucas R., Resende Lucas, Orenstein Paulo, Oliveira Roberto I. | Arxiv | Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a subsequent costly binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are grouped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binarized entries, e.g. -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem into two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so that each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices to efficiently leverage stochastic gradient descent. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free, and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms. *(A minimal sketch of Householder-based rotation before sign quantization appears after this table.)* |
2023 | Unsupervised Hashing With Similarity Distribution Calibration | Ng Kam Woh, Zhu Xiatian, Hoe Jiun Tian, Chan Chee Seng, Zhang Tianyu, Song Yi-zhe, Xiang Tao | Arxiv | Unsupervised hashing methods typically aim to preserve the similarity between data points in a feature space by mapping them to binary hash codes. However, these methods often overlook the fact that the similarity between data points in the continuous feature space may not be preserved in the discrete hash code space, due to the limited similarity range of hash codes. The similarity range is bounded by the code length and can lead to a problem known as similarity collapse, in which the positive and negative pairs of data points become less distinguishable from each other in the hash space. To alleviate this problem, in this paper a novel Similarity Distribution Calibration (SDC) method is introduced. SDC aligns the hash code similarity distribution towards a calibration distribution (e.g. a beta distribution) with sufficient spread across the entire similarity range, thus alleviating the similarity collapse problem. Extensive experiments show that our SDC significantly outperforms the state-of-the-art alternatives on coarse category-level and instance-level image retrieval. Code is available at https://github.com/kamwoh/sdc. |
2023 | Deep Supervised Hashing For Fast Retrieval Of Radio Image Cubes | Ndung'u Steven, Grobler Trienko, Wijnholds Stefan J., Karastoyanova Dimka, Azzopardi George | Arxiv | The sheer number of sources that will be detected by next-generation radio surveys will be astronomical, which will result in serendipitous discoveries. Data-dependent deep hashing algorithms have been shown to be efficient at image retrieval tasks in the fields of computer vision and multimedia. However, there are limited applications of these methodologies in the field of astronomy. In this work, we utilize deep hashing to rapidly search for similar images in a large database. The experiment uses a balanced dataset of 2708 samples consisting of four classes: Compact, FRI, FRII, and Bent. The performance of the method was evaluated using the mean average precision (mAP) metric, where a precision of 88.5% was achieved. The experimental results demonstrate the capability to search and retrieve similar radio images efficiently and at scale. The retrieval is based on the Hamming distance between the binary hash of the query image and those of the reference images in the database. |
2023 | Image Hash Minimization For Tamper Detection | Maity Subhajit, Karsh Ram Kumar | | Tamper detection using image hashes is a very common problem of modern days. Several studies and advancements have already addressed this problem. However, most of the existing methods lack accuracy of tamper detection when the tampered area is low, as well as requiring long image hashes. In this paper, we propose a novel method to objectively minimize the hash length while enhancing the performance at low tampered areas. |
2023 | A Survey on Deep Hashing Methods | Xiao Luo, Haixin Wang, Daqing Wu, Chong Chen, Minghua Deng, Jianqiang Huang, Xian-Sheng Hua | ACM Transactions on Knowledge Discovery from Data | Nearest neighbor search aims at obtaining the samples in the database with the smallest distances to the queries, which is a basic task in a range of fields, including computer vision and data mining. Hashing is one of the most widely used methods for its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this survey, we investigate current deep hashing algorithms in detail, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods, as well as quantization, according to how the similarities of the learned hash codes are measured. Moreover, deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods, based on their semantic learning manners. We also introduce three related important topics, including semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. Meanwhile, we present some commonly used public datasets and the schemes used to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in conclusion. |
2023 | Attributes Grouping And Mining Hashing For Fine-grained Image Retrieval | Lu Xin, Chen Shikun, Cao Yichao, Zhou Xin, Lu Xiaobo | Proceedings of the | In recent years, hashing methods have been popular in large-scale media search for their low storage and strong representation capabilities. To describe objects with similar overall appearance but subtle differences, more and more studies focus on hashing-based fine-grained image retrieval. Existing hashing networks usually generate both local and global features through attention guidance on the same deep activation tensor, which limits the diversity of feature representations. To handle this limitation, we substitute convolutional descriptors for attention-guided features and propose an Attributes Grouping and Mining Hashing (AGMH), which groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation for efficient fine-grained image retrieval. Specifically, an Attention Dispersion Loss (ADL) is designed to force the descriptors to attend to various local regions and capture diverse subtle details. Moreover, we propose a Stepwise Interactive External Attention (SIEA) to mine critical attributes in each descriptor and construct correlations between fine-grained attributes and objects. The attention mechanism is dedicated to learning discrete attributes, which does not cost additional computations in hash code generation. Finally, the compact binary codes are learned by preserving pairwise similarities. Experimental results demonstrate that AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets. |
2023 | Sparse-inductive Generative Adversarial Hashing For Nearest Neighbor Search | Liu Hong | Arxiv | Unsupervised hashing has received extensive research attention over the past decade, typically aiming at preserving a predefined metric (i.e. the Euclidean metric) in the Hamming space. To this end, the encoding functions of existing hashing methods are typically quasi-isometric, devoted to reducing the quantization loss from the target metric space to the discrete Hamming space. However, it is indeed problematic to directly minimize such error, since the two metric spaces are heterogeneous and the quasi-isometric mapping is non-linear. The former leads to inconsistent feature distributions, while the latter leads to problematic optimization issues. In this paper, we propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH), to encode large-scale high-dimensional features into binary codes, which well solves the two problems through a generative adversarial training framework. Instead of minimizing the quantization loss, our key innovation lies in enforcing the learned Hamming space to have a data distribution similar to the target metric space via a generative model. In particular, we formulate a ReLU-based neural network as a generator to output binary codes, and an MSE-loss-based auto-encoder network as a discriminator, upon which generative adversarial learning is carried out to train the hash functions. Furthermore, to generate synthetic features from the hash codes, a compressed sensing procedure is introduced into the generative model, which enforces the reconstruction boundary of binary codes to be consistent with that of the original features. Finally, such a generative adversarial framework can be trained via the Adam optimizer. Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over the state-of-the-art approaches. |
2023 | Can LSH (locality-sensitive Hashing) Be Replaced By Neural Network | Liu Renyang, Zhao Jun, Chu Xing, Liang Yu, Zhou Wei, He Jing | Arxiv | With the rapid development of GPU (Graphics Processing Unit) technologies and neural networks, we can explore more appropriate data structures and algorithms. Recent progress shows that neural networks can partly replace traditional data structures. In this paper, we propose a novel DNN (Deep Neural Network)-based learned locality-sensitive hashing, called LLSH, to efficiently and flexibly map high-dimensional data to low-dimensional space. LLSH replaces the traditional LSH (Locality-sensitive Hashing) function families with parallel multi-layer neural networks, which reduces the time and memory consumption while guaranteeing query accuracy. The proposed LLSH demonstrates the feasibility of replacing the hash index with learning-based neural networks and opens a new door for developers to design and configure data organization more accurately to improve information-searching performance. Extensive experiments on different types of datasets show the superiority of the proposed method in query accuracy, time consumption, and memory usage. |
2023 | HS-GCN Hamming Spatial Graph Convolutional Networks For Recommendation | Liu Han, Wei Yinwei, Yin Jianhua, Nie Liqiang | Arxiv | An efficient solution for large-scale recommender systems is to represent users and items as binary hash codes in the Hamming space. Towards this end existing methods tend to code users by modeling their Hamming similarities with the items they historically interact with which are termed first-order similarities in this work. Despite their efficiency these methods suffer from suboptimal representative capacity since they forgo the correlation established by connecting multiple first-order similarities i.e. the relation among indirect instances which can be defined as the high-order similarity. To tackle this drawback we propose to model both the first- and the high-order similarities in the Hamming space through the user-item bipartite graph. Therefore we develop a novel learning to hash framework namely Hamming Spatial Graph Convolutional Networks (HS-GCN) which explicitly models the Hamming similarity and embeds it into the codes of users and items. Extensive experiments on three public benchmark datasets demonstrate that our proposed model significantly outperforms several state-of-the-art hashing models and obtains performance comparable with the real-valued recommendation models. |
2023 | Dual-stream Knowledge-preserving Hashing For Unsupervised Video Retrieval | Li Pandeng, Xie Hongtao, Ge Jiannan, Zhang Lei, Min Shaobo, Zhang Yongdong | Arxiv | Unsupervised video hashing usually optimizes binary codes by learning to reconstruct input videos. Such a reconstruction constraint spends much effort on frame-level temporal context changes without focusing on video-level global semantics that are more useful for retrieval. Hence we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information which disentangles the semantic extraction from the reconstruction constraint. Specifically we first design a simple dual-stream structure including a temporal layer and a hash layer. Then with the help of semantic similarity knowledge obtained from self-supervision the hash layer learns to capture information for semantic retrieval while the temporal layer learns to capture the information for reconstruction. In this way the model naturally preserves the disentangled semantics into binary codes. Validated by comprehensive experiments our method consistently outperforms the state-of-the-art on three video benchmarks. |
2023 | Locality Preserving Multiview Graph Hashing For Large Scale Remote Sensing Image Search | Li Wenyun, Zhong Guo, Lu Xingyu, Pun Chi-man | Arxiv | Hashing is very popular for remote sensing image search. This article proposes a multiview hashing method with learnable parameters to retrieve the queried images for a large-scale remote sensing dataset. Existing methods always neglect that real-world remote sensing data lies on a low-dimensional manifold embedded in high-dimensional ambient space. Unlike previous methods this article proposes to learn the consensus compact codes in a view-specific low-dimensional subspace. Furthermore we have added a hyperparameter learnable module to avoid complex parameter tuning. In order to prove the effectiveness of our method we carried out experiments on three widely used remote sensing data sets and compared it with seven state-of-the-art methods. Extensive experiments show that the proposed method achieves competitive results compared to the other methods. |
2023 | Shockhash Near Optimal-space Minimal Perfect Hashing Beyond Brute-force | Lehmann Hans-peter, Sanders Peter, Walzer Stefan | Arxiv | A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of n log₂(e) ≈ 1.44n bits needed to represent an MPHF. This can be reached by a brute-force algorithm that tries e^n hash function seeds in expectation and stores the first seed leading to an MPHF. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper we introduce ShockHash - Small heavily overloaded cuckoo hash tables for minimal perfect hashing. ShockHash uses two hash functions h_0 and h_1 hoping for the existence of a function f : S → {0, 1} such that x ↦ h_{f(x)}(x) is an MPHF on S. It then uses a 1-bit retrieval data structure to store f using n + o(n) bits. In graph terminology ShockHash generates n-edge random graphs until stumbling on a pseudoforest - where each component contains as many edges as nodes. Using cuckoo hashing ShockHash then derives an MPHF from the pseudoforest in linear time. We show that ShockHash needs to try only about (e/2)^n ≈ 1.359^n seeds in expectation. This reduces the space for storing the seed by roughly n bits (maintaining the asymptotically optimal space consumption) and speeds up construction by almost a factor of 2^n compared to brute-force. Bipartite ShockHash reduces the expected construction time again to 1.166^n by maintaining a pool of candidate hash functions and checking all possible pairs. ShockHash as a building block within the RecSplit framework can be constructed up to 3 orders of magnitude faster than competing approaches. It can build an MPHF for 10 million keys with 1.489 bits per key in about half an hour. When instead using ShockHash after an efficient k-perfect hash function it achieves space usage similar to the best competitors while being significantly faster to construct and query. |
2023 | Sliding Block Hashing (slick) -- Basic Algorithmic Ideas | Lehmann Hans-peter, Sanders Peter, Walzer Stefan | Arxiv | We present Sliding Block Hashing (Slick) a simple hash table data structure that combines high performance with very good space efficiency. This preliminary report outlines avenues for analysis and implementation that we intend to pursue. |
2023 | Optimal-hash Exact String Matching Algorithms | Lecroq Thierry | Arxiv | String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing q-grams. The improvement consists of considering the minimal value q such that each q-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns on large alphabets. |
2023 | Elastichash Semantic Image Similarity Search By Deep Hashing With Elasticsearch | Korfhage Nikolaus, Mühling Markus, Freisleben Bernd | The | We present ElasticHash a novel approach for high-quality efficient and large-scale semantic image similarity search. It is based on a deep hashing model to learn hash codes for fine-grained image similarity search in natural images and a two-stage method for efficiently searching binary hash codes using Elasticsearch (ES). In the first stage a coarse search based on short hash codes is performed using multi-index hashing and ES terms lookup of neighboring hash codes. In the second stage the list of results is re-ranked by computing the Hamming distance on long hash codes. We evaluate the retrieval performance of ElasticHash for more than 120000 query images on about 6.9 million database images of the OpenImages data set. The results show that our approach achieves high-quality retrieval results and low search latencies. |
2023 | Fast Consistent Hashing In Constant Time | Leu Eric | Arxiv | Consistent hashing is a technique that can minimize key remapping when the number of hash buckets changes. The paper proposes a fast consistent hash algorithm (called power consistent hash) that has O(1) expected time for key lookup independent of the number of buckets. Hash values are computed in real time. No search data structure is constructed to store bucket ranges or key mappings. The algorithm has a lightweight design using O(1) space with superior scalability. In particular it uses two auxiliary hash functions to achieve distribution uniformity and O(1) expected time for key lookup. Furthermore it performs consistent hashing such that only a minimal number of keys are remapped when the number of buckets changes. Consistent hashing has a wide range of use cases including load balancing distributed caching and distributed key-value stores. The proposed algorithm is faster than well-known consistent hash algorithms with O(log n) lookup time. |
2023 | Fast Online Hashing with Multi-Label Projection | Wenzhe Jia, Yuan Cao, Junwei Liu, Jie Gui | AAAI | Hashing has been widely researched to solve the large-scale approximate nearest neighbor search problem owing to its time and storage superiority. In recent years, a number of online hashing methods have emerged, which can update the hash functions to adapt to the new stream data and realize dynamic retrieval. However, existing online hashing methods are required to update the whole database with the latest hash functions when a query arrives, which leads to low retrieval efficiency with the continuous increase of the stream data. On the other hand, these methods ignore the supervision relationship among the examples, especially in the multi-label case. In this paper, we propose a novel Fast Online Hashing (FOH) method which only updates the binary codes of a small part of the database. To be specific, we first build a query pool in which the nearest neighbors of each central point are recorded. When a new query arrives, only the binary codes of the corresponding potential neighbors are updated. In addition, we create a similarity matrix which takes the multi-label supervision information into account and bring in the multi-label projection loss to further preserve the similarity among the multi-label data. The experimental results on two common benchmarks show that the proposed FOH can achieve dramatic superiority on query time up to 6.28 seconds less than state-of-the-art baselines with competitive retrieval accuracy. |
2023 | Deep Metric Multi-view Hashing For Multimedia Retrieval | Zhu Jian, Huang Zhangmin, Ruan Xiaohu, Cui Yu, Cheng Yongli, Zeng Lingfang | Arxiv | Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However existing methods fail to effectively fuse the multi-view features and utilize the metric information provided by the dissimilar samples leading to limited retrieval precision. Current methods utilize weighted sum or concatenation to fuse the multi-view features. We argue that these fusion methods cannot capture the interaction among different views. Furthermore these methods ignore the information provided by the dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems. Extensive empirical evidence is presented to show that gate-based fusion is better than typical methods. We introduce deep metric learning to the multi-view hashing problems which can utilize metric information of dissimilar samples. On the MIR-Flickr25K MS COCO and NUS-WIDE our method outperforms the current state-of-the-art methods by a large margin (up to 15.28% mean Average Precision (mAP) improvement). |
2023 | A Sparse Johnson-lindenstrauss Transform Using Fast Hashing | Houen Jakob Bæk Tejs, Thorup Mikkel | Arxiv | The Sparse Johnson-Lindenstrauss Transform of Kane and Nelson (SODA 2012) provides a linear dimensionality-reducing map A ∈ ℝ^{m×u} in ℓ₂ that preserves distances up to distortion of 1 + ε with probability 1 − δ where m = O(ε^{-2} log 1/δ) and each column of A has O(εm) non-zero entries. The previous analyses of the Sparse Johnson-Lindenstrauss Transform all assumed access to an Ω(log 1/δ)-wise independent hash function. The main contribution of this paper is a more general analysis of the Sparse Johnson-Lindenstrauss Transform with fewer assumptions on the hash function. We also show that the Mixed Tabulation hash function of Dahlgaard Knudsen Rotenberg and Thorup (FOCS 2015) satisfies the conditions of our analysis thus giving us the first analysis of a Sparse Johnson-Lindenstrauss Transform that works with a practical hash function. |
2023 | Identifying Reducible K-tuples Of Vectors With Subspace-proximity Sensitive Hashing/filtering | Holden Gabriella, Shiu Daniel, Strutt Lauren | Arxiv | We introduce and analyse a family of hash and predicate functions that are more likely to produce collisions for small reducible configurations of vectors. These may offer practical improvements to lattice sieving for short vectors. In particular in one asymptotic regime the family exhibits significantly different convergent behaviour than existing hash functions and predicates. |
2023 | CLIP Multi-modal Hashing A New Baseline CLIPMH | Zhu Jian, Sheng Mingkai, Ke Mingda, Huang Zhangmin, Chang Jingfei | Arxiv | The multi-modal hashing method is widely used in multimedia retrieval. It can fuse multi-source data to generate binary hash codes. However current multi-modal methods suffer from low retrieval accuracy. The reason is that the individual backbone networks have limited feature expression capabilities and are not jointly pre-trained on large-scale unsupervised multi-modal data. To solve this problem we propose a new baseline the CLIP Multi-modal Hashing (CLIPMH) method. It uses the CLIP model to extract text and image features and then fuses them to generate hash codes. CLIP improves the expressiveness of each modal feature. In this way it can greatly improve the retrieval performance of multi-modal hashing methods. In comparison to state-of-the-art unsupervised and supervised multi-modal hashing methods experiments reveal that the proposed CLIPMH can significantly enhance performance (a maximum increase of 8.38%). CLIP also has great advantages over the text and visual backbone networks commonly used before. |
2023 | Geometric Covering Using Random Fields | Goncalves Felipe, Keren Daniel, Shahar Amit, Yehuda Gal | Arxiv | A set of vectors S ⊆ ℝ^d is (k₁, ε)-clusterable if there are k₁ balls of radius ε that cover S. A set of vectors S ⊆ ℝ^d is (k₂, δ)-far from being clusterable if there are at least k₂ vectors in S with all pairwise distances at least δ. We propose a probabilistic algorithm to distinguish between these two cases. Our algorithm reaches a decision by only looking at the extreme values of a scalar-valued hash function defined by a random field on S; hence it is especially suitable in distributed and online settings. An important feature of our method is that the algorithm is oblivious to the number of vectors; in the online setting for example the algorithm stores only a constant number of scalars which is independent of the stream length. We introduce random field hash functions which are a key ingredient in our paradigm. Random field hash functions generalize locality-sensitive hashing (LSH). In addition to the LSH requirement that nearby vectors are hashed to similar values our hash function also guarantees that the hash values are (nearly) independent random variables for distant vectors. We formulate necessary conditions for the kernels which define the random fields applied to our problem as well as a measure of kernel optimality for which we provide a bound. Then we propose a method to construct kernels which approximate the optimal one. |
2023 | Towards Efficient Deep Hashing Retrieval Condensing Your Data Via Feature-embedding Matching | Feng Tao, Zhang Jie, Wang Peizheng, Wang Zhijie | Arxiv | The expenses involved in training state-of-the-art deep hashing retrieval models have witnessed an increase due to the adoption of more sophisticated models and large-scale datasets. Dataset Distillation (DD) or Dataset Condensation (DC) focuses on generating a smaller synthetic dataset that retains the original information. Nevertheless existing DD methods face challenges in maintaining a trade-off between accuracy and efficiency. Moreover state-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods. In this paper we propose an efficient condensation framework that addresses these limitations by matching the feature embedding between the synthetic set and the real set. Furthermore we enhance the diversity of features by incorporating the strategies of early-stage augmented models and multi-formation. Extensive experiments provide compelling evidence of the remarkable superiority of our approach both in terms of performance and efficiency compared to state-of-the-art baseline methods. |
2023 | Bounds For C-ideal Hashing | Frei Fabian, Wehner David | Arxiv | In this paper we analyze hashing from a worst-case perspective. To this end we study a new property of hash families that is strongly related to d-perfect hashing namely c-ideality. On the one hand this notion generalizes the definition of perfect hashing which has been studied extensively; on the other hand it provides a direct link to the notion of c-approximativity. We focus on the usually neglected case where the average load α is at least 1 and prove upper and lower parametrized bounds on the minimal size of c-ideal hash families. As an aside we show how c-ideality helps to analyze the advice complexity of hashing. The concept of advice introduced a decade ago lets us measure the information content of an online problem. We prove hashing's advice complexity to be linear in the hash table size. |
2023 | Central Similarity Multi-view Hashing For Multimedia Retrieval | Zhu Jian, Cheng Wen, Cui Yu, Tang Chang, Dai Yuyang, Li Yong, Zeng Lingfang | Arxiv | Hash representation learning of multi-view heterogeneous data is the key to improving the accuracy of multimedia retrieval. However existing methods utilize local similarity and fall short of deeply fusing the multi-view features resulting in poor retrieval accuracy. Current methods only use local similarity to train their model. These methods ignore global similarity. Furthermore most recent works fuse the multi-view features via a weighted sum or concatenation. We contend that these fusion methods are insufficient for capturing the interaction between various views. We present a novel Central Similarity Multi-View Hashing (CSMVH) method to address the mentioned problems. Central similarity learning is used for solving the local similarity problem which can utilize the global similarity between the hash center and samples. We present copious empirical data demonstrating the superiority of gate-based fusion over conventional approaches. On the MS COCO and NUS-WIDE datasets the proposed CSMVH performs better than the state-of-the-art methods by a large margin (up to 11.41% mean Average Precision (mAP) improvement). |
2023 | Supervised Auto-encoding Twin-bottleneck Hashing | Chen Yuan, Marchand-maillet Stéphane | Arxiv | Deep hashing has been shown to be a complexity-efficient solution for the Approximate Nearest Neighbor search problem in high dimensional space. Many methods usually build the loss function from pairwise or triplet data points to capture the local similarity structure. Other existing methods construct the similarity graph and consider all points simultaneously. Auto-encoding Twin-bottleneck Hashing is one such method that dynamically builds the graph. Specifically each input data point is encoded into a binary code and a continuous variable or the so-called twin bottlenecks. The similarity graph is then computed from these binary codes which get updated consistently during the training. In this work we generalize the original model into a supervised deep hashing network by incorporating the label information. In addition we examine the differences of the code structure between these two networks and consider the class imbalance problem especially in multi-labeled datasets. Experiments on three datasets yield statistically significant improvement against the original model. Results are also comparable and competitive with other supervised methods. |
2023 | Bipartite Graph Convolutional Hashing For Effective And Efficient Top-n Search In Hamming Space | Chen Yankai, Fang Yixiang, Zhang Yifei, King Irwin | Arxiv | Searching on bipartite graphs is fundamental and versatile in many real-world Web applications e.g. online recommendation database retrieval and query-document searching. Given a query node the conventional approaches rely on the similarity matching with the vectorized node embeddings in the continuous Euclidean space. To efficiently manage intensive similarity computation developing hashing techniques for graph structured data has recently become an emerging research direction. Despite the retrieval efficiency in Hamming space prior work is however confronted with catastrophic performance decay. In this work we investigate the problem of hashing with Graph Convolutional Network on bipartite graphs for effective Top-N search. We propose an end-to-end Bipartite Graph Convolutional Hashing approach namely BGCH which consists of three novel and effective modules (1) adaptive graph convolutional hashing (2) latent feature dispersion and (3) Fourier serialized gradient estimation. Specifically the former two modules achieve the substantial retention of the structural information against the inevitable information loss in hash encoding; the last module develops Fourier Series decomposition to the hashing function in the frequency domain mainly for more accurate gradient estimation. The extensive experiments on six real-world datasets not only show the performance superiority over the competing hashing-based counterparts but also demonstrate the effectiveness of all proposed model components contained therein. |
2023 | Weighted Minwise Hashing Beats Linear Sketching For Inner Product Estimation | Bessa Aline, Daliri Majid, Freire Juliana, Musco Cameron, Musco Christopher, Santos Aécio, Zhang Haoxiang | In Proceedings of the ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems | We present a new approach for computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation such as CountSketch and Johnson-Lindenstrauss projection. Specifically while our method admits guarantees that exactly match linear sketching for dense vectors it yields significantly lower error for sparse vectors with limited overlap between non-zero entries. Such vectors arise in many applications involving sparse data. They are also important in increasingly popular dataset search applications where inner product sketches are used to estimate data covariance conditional means and other quantities involving columns in unjoined tables. We complement our theoretical results by showing that our approach empirically outperforms existing linear sketches and unweighted hashing-based sketches for sparse vectors. |
2023 | Locally Uniform Hashing | Bercea Ioana O., Beretta Lorenzo, Klausen Jonas, Houen Jakob Bæk Tejs, Thorup Mikkel | Arxiv | Hashing is a common technique used in data processing with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance. To this end we present tornado tabulation hashing which is simple fast and exhibits a certain full local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example this includes classic linear probing the widely used HyperLogLog algorithm of Flajolet Fusy Gandouet Meunier AOFA 97 for counting distinct elements and the one-permutation hashing of Li Owen and Zhang NIPS 12 for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink ICALP09 and the succinct space uniform hashing of Pagh and Pagh SICOMP08. Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques. |
2023 | Dedrift Robust Similarity Search Under Content Drift | Baranchuk Dmitry, Douze Matthijs, Upadhyay Yash, Yalniz I. Zeki | Arxiv | The statistical distribution of content uploaded and searched on media sharing sites changes over time due to seasonal sociological and technical factors. We investigate the impact of this content drift for large-scale similarity search tools based on nearest neighbor search in embedding space. Unless a costly index reconstruction is performed frequently content drift degrades the search accuracy and efficiency. The degradation is especially severe since in general both the query and database distributions change. We introduce and analyze real-world image and video datasets for which temporal information is available over a long time period. Based on the learnings we devise DeDrift a method that updates embedding quantizers to continuously adapt large-scale indexing structures on-the-fly. DeDrift almost eliminates the accuracy degradation due to the query and database content drift while being up to 100x faster than a full index reconstruction. |
2022 | Deep Unsupervised Contrastive Hashing For Large-scale Cross-modal Text-image Retrieval In Remote Sensing | Mikriukov Georgii, Ravanbakhsh Mahdyar, Demir Begüm | Arxiv | Due to the availability of large-scale multi-modal data archives (e.g. satellite images acquired by different sensors text sentences etc) the development of cross-modal retrieval systems that can search and retrieve semantically relevant data across different modalities based on a query in any modality has attracted great attention in RS. In this paper we focus our attention on cross-modal text-image retrieval where queries from one modality (e.g. text) can be matched to archive entries from another (e.g. image). Most of the existing cross-modal text-image retrieval systems require a high number of labeled training samples and also do not allow fast and memory-efficient retrieval due to their intrinsic characteristics. These issues limit the applicability of the existing cross-modal retrieval systems for large-scale applications in RS. To address this problem in this paper we introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval. The proposed DUCH is made up of two main modules 1) a feature extraction module (which extracts deep representations of the text-image modalities); and 2) a hashing module (which learns to generate cross-modal binary hash codes from the extracted representations). Within the hashing module we introduce a novel multi-objective loss function including i) contrastive objectives that enable similarity preservation in both intra- and inter-modal similarities; ii) an adversarial objective that is enforced across two modalities for cross-modal representation consistency; iii) binarization objectives for generating representative hash codes. Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods on two multi-modal (image and text) benchmark archives in RS. Our code is publicly available at https://git.tu-berlin.de/rsim/duch. |
2022 | Hamming Distributions Of Popular Perceptual Hashing Techniques | Mckeown Sean, Buchanan William J | DFRWS | Content-based file matching has been widely deployed for decades largely for the detection of sources of copyright infringement extremist materials and abusive sexual media. Perceptual hashes such as Microsoft's PhotoDNA are one automated mechanism for facilitating detection allowing machines to approximately match visual features of an image or video in a robust manner. However there does not appear to be much public evaluation of such approaches particularly when it comes to how effective they are against content-preserving modifications to media files. In this paper we present a million-image scale evaluation of several perceptual hashing archetypes for popular algorithms (including Facebook's PDQ Apple's NeuralHash and the popular pHash library) against seven image variants. The focal point is the distribution of Hamming distance scores between both unrelated images and image variants to better understand the problems faced by each approach. |
2022 | Unsupervised Contrastive Hashing For Cross-modal Retrieval In Remote Sensing | Mikriukov Georgii, Ravanbakhsh Mahdyar, Demir Begüm | Arxiv | The development of cross-modal retrieval systems that can search and retrieve semantically relevant data across different modalities based on a query in any modality has attracted great attention in remote sensing (RS). In this paper we focus our attention on cross-modal text-image retrieval where queries from one modality (e.g. text) can be matched to archive entries from another (e.g. image). Most of the existing cross-modal text-image retrieval systems in RS require a high number of labeled training samples and also do not allow fast and memory-efficient retrieval. These issues limit the applicability of the existing cross-modal retrieval systems for large-scale applications in RS. To address this problem in this paper we introduce a novel unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS. To this end the proposed DUCH is made up of two main modules 1) feature extraction module which extracts deep representations of two modalities; 2) hashing module that learns to generate cross-modal binary hash codes from the extracted representations. We introduce a novel multi-objective loss function including i) contrastive objectives that enable similarity preservation in intra- and inter-modal similarities; ii) an adversarial objective that is enforced across two modalities for cross-modal representation consistency; and iii) binarization objectives for generating hash codes. Experimental results show that the proposed DUCH outperforms state-of-the-art methods. Our code is publicly available at https://git.tu-berlin.de/rsim/duch. |
2022 | Asymmetric Transfer Hashing With Adaptive Bipartite Graph Learning | Lu Jianglin, Zhou Jie, Chen Yudong, Pedrycz Witold, Hung Kwok-wai | Arxiv | Thanks to the efficient retrieval speed and low storage consumption learning to hash has been widely used in visual retrieval tasks. However existing hashing methods assume that the query and retrieval samples lie in homogeneous feature space within the same domain. As a result they cannot be directly applied to heterogeneous cross-domain retrieval. In this paper we propose a Generalized Image Transfer Retrieval (GITR) problem which encounters two crucial bottlenecks 1) the query and retrieval samples may come from different domains leading to an inevitable domain distribution gap; 2) the features of the two domains may be heterogeneous or misaligned bringing up an additional feature gap. To address the GITR problem we propose an Asymmetric Transfer Hashing (ATH) framework with its unsupervised/semi-supervised/supervised realizations. Specifically ATH characterizes the domain distribution gap by the discrepancy between two asymmetric hash functions and minimizes the feature gap with the help of a novel adaptive bipartite graph constructed on cross-domain data. By jointly optimizing asymmetric hash functions and the bipartite graph not only can knowledge transfer be achieved but information loss caused by feature alignment can also be avoided. Meanwhile to alleviate negative transfer the intrinsic geometrical structure of single-domain data is preserved by involving a domain affinity graph. Extensive experiments on both single-domain and cross-domain benchmarks under different GITR subtasks indicate the superiority of our ATH method in comparison with the state-of-the-art hashing methods. |
2022 | Adaptive Asymmetric Label-guided Hashing For Multimedia Search | Long Yitian | Arxiv | With the rapid growth of multimodal media data on the Web in recent years hash learning methods as a way to achieve efficient and flexible cross-modal retrieval of massive multimedia data have received a lot of attention from the current Web resource retrieval research community. Existing supervised hashing methods simply transform label information into pairwise similarity information to guide hash learning leading to a potential risk of semantic error in the face of multi-label data. In addition most existing hash optimization methods solve NP-hard optimization problems by employing approximation strategies based on relaxation leading to a large quantization error. To address the above obstacles we present a simple yet efficient Adaptive Asymmetric Label-guided Hashing named A2LH for Multimedia Search. Specifically A2LH is a two-step hashing method. In the first step we design an association representation model between the different modality representations and the semantic label representation separately and use the semantic label representation as an intermediate bridge to solve the semantic gap existing between different modalities. In addition we present an efficient discrete optimization algorithm for solving the quantization error problem caused by relaxation-based optimization algorithms. In the second step we leverage the generated hash codes to learn the hash mapping functions. The experimental results show that our proposed method achieves the best performance among all compared baseline methods. |
2022 | Prototype-based Layered Federated Cross-modal Hashing | Liu Jiale, Zhan Yu-wei, Luo Xin, Chen Zhen-duo, Wang Yongxin, Xu Xin-shun | Arxiv | Recently deep cross-modal hashing has gained increasing attention. However in many practical cases data are distributed and cannot be collected due to privacy concerns which greatly reduces the cross-modal hashing performance on each client. And due to the problems of statistical heterogeneity model heterogeneity and forcing each client to accept the same parameters applying federated learning to cross-modal hash learning becomes very tricky. In this paper we propose a novel method called prototype-based layered federated cross-modal hashing. Specifically the prototype is introduced to learn the similarity between instances and classes on the server reducing the impact of statistical heterogeneity (non-IID) on different clients. And we monitor the distance between local and global prototypes to further improve the performance. To realize personalized federated learning a hypernetwork is deployed on the server to dynamically update the layer weights of each local model. Experimental results on benchmark datasets show that our method outperforms state-of-the-art methods. |
2022 | Deep Unsupervised Hashing With Latent Semantic Components | Lin Qinghong, Chen Xiaojun, Zhang Qin, Cai Shaotian, Zhao Wenzhe, Wang Hongfa | Arxiv | Deep unsupervised hashing has been appreciated in the regime of image retrieval. However most prior art fails to detect the semantic components and their relationships behind the images which makes them lack discriminative power. To make up for this defect we propose a novel Deep Semantic Components Hashing (DSCH) which builds on the common-sense observation that an image normally contains a bunch of semantic components with homology and co-occurrence relationships. Based on this prior DSCH regards the semantic components as latent variables under the Expectation-Maximization framework and designs a two-step iterative algorithm with the objective of maximum likelihood of the training data. Firstly DSCH constructs a semantic component structure by uncovering the fine-grained semantic components of images with a Gaussian Mixture Model (GMM) where an image is represented as a mixture of multiple components and the semantic co-occurrences are exploited. Besides coarse-grained semantic components are discovered by considering the homology relationships between fine-grained components and the hierarchical organization is then constructed. Secondly DSCH makes the images close to their semantic component centers at both fine-grained and coarse-grained levels and also makes images that share similar semantic components close to each other. Extensive experiments on three benchmark datasets demonstrate that the proposed hierarchical semantic components indeed facilitate the hashing model to achieve superior performance. |
2022 | Asymmetric Scalable Cross-modal Hashing | Li Wenyun, Pun Chi-man | Arxiv | Cross-modal hashing is a successful method to solve the large-scale multimedia retrieval problem. Many matrix factorization-based hashing methods have been proposed. However the existing methods still struggle with a few problems such as how to generate binary codes efficiently rather than directly relaxing them to continuity. In addition most of the existing methods choose to use an n×n similarity matrix for optimization which makes the memory and computation unaffordable. In this paper we propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues. It firstly introduces a collective matrix factorization to learn a common latent space from the kernelized features of different modalities and then transforms the similarity matrix optimization to a distance-distance difference problem minimization with the help of semantic labels and the common latent space. Hence the computational complexity of the n×n asymmetric optimization is relieved. In the generation of hash codes we also employ an orthogonal constraint of label information which is indispensable for search accuracy. So the redundancy of computation can be much reduced. For efficient optimization and scalability to large-scale datasets we adopt the two-step approach rather than optimizing simultaneously. Extensive experiments on three benchmark datasets Wiki MIRFlickr-25K and NUS-WIDE demonstrate that our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency. |
2022 | Adaptive Structural Similarity Preserving For Unsupervised Cross Modal Hashing | Li Liang, Zheng Baihua, Sun Weiwei | Arxiv | Cross-modal hashing is an important approach for multimodal data management and application. Existing unsupervised cross-modal hashing algorithms mainly rely on data features in pre-trained models to mine their similarity relationships. However their optimization objectives are based on the static metric between the original uni-modal features without further exploring data correlations during the training. In addition most of them mainly focus on association mining and alignment among pairwise instances in continuous space but ignore the latent structural correlations contained in the semantic hashing space. In this paper we propose an unsupervised hash learning framework namely Adaptive Structural Similarity Preservation Hashing (ASSPH) to solve the above problems. Firstly we propose an adaptive learning scheme with limited data and training batches to enrich semantic correlations of unlabeled instances during the training process and meanwhile to ensure a smooth convergence of the training process. Secondly we present an asymmetric structural semantic representation learning scheme. We introduce structural semantic metrics based on graph adjacency relations during the semantic reconstruction and correlation mining stage and meanwhile align the structure semantics in the hash space with an asymmetric binary optimization process. Finally we conduct extensive experiments to validate the enhancements of our work in comparison with existing works. |
2022 | Pachash Packed And Compressed Hash Tables | Kurpicz Florian, Lehmann Hans-peter, Sanders Peter | Arxiv | We introduce PaCHash a hash table that stores its objects contiguously in an array without intervening space even if the objects have variable size. In particular each object can be compressed using standard compression techniques. A small search data structure allows locating the objects in constant expected time. PaCHash is most naturally described as a static external hash table where it needs a constant number of bits of internal memory per block of external memory. Here in some sense PaCHash beats a lower bound on the space consumption of k-perfect hashing. An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory requires only one disk access (of variable length) per search operation and has small internal search overhead compared to the disk access cost. Our experiments show that it has lower space consumption than all previous approaches even when considering objects of identical size. |
2022 | A Lower Bound Of Hash Codes Performance | Xiaosu Zhu, Jingkuan Song, Yu Lei, Lianli Gao, Hengtao Shen | Neural Information Processing Systems | As a crucial approach for compact representation learning hashing has achieved great success in effectiveness and efficiency. Numerous heuristic Hamming space metric learning objectives are designed to obtain high-quality hash codes. Nevertheless a theoretical analysis of criteria for learning good hash codes remains largely unexplored. In this paper we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes performance. Promoting these two characteristics could lift the bound and improve hash learning. We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it which results in a low-bias optimization. Extensive experiments reveal the effectiveness of the proposed method. By testing on a series of hash-models we obtain performance improvements among all of them with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy. Our code is publicly available at https://github.com/VL-Group/LBHash. |
2022 | A non-alternating graph hashing algorithm for large scale image search | Sobhan Hemati, Mohammad Hadi Mehdizavareh, Shojaeddin Chenouri, Hamid R Tizhoosh | CVIU | In the era of big data, methods for improving memory and computational efficiency have become crucial for successful deployment of technologies. Hashing is one of the most effective approaches to deal with computational limitations that come with big data. One natural way for formulating this problem is spectral hashing that directly incorporates affinity to learn binary codes. However, due to binary constraints, the optimization becomes intractable. To mitigate this challenge, different relaxation approaches have been proposed to reduce the computational load of obtaining binary codes and still attain a good solution. The problem with all existing relaxation methods is resorting to one or more additional auxiliary variables to attain high quality binary codes while relaxing the problem. The existence of auxiliary variables leads to coordinate descent approach which increases the computational complexity. We argue that introducing these variables is unnecessary. To this end, we propose a novel relaxed formulation for spectral hashing that adds no additional variables to the problem. Furthermore, instead of solving the problem in original space where number of variables is equal to the data points, we solve the problem in a much smaller space and retrieve the binary codes from this solution. This trick reduces both the memory and computational complexity at the same time. We apply two optimization techniques, namely projected gradient and optimization on manifold, to obtain the solution. Using comprehensive experiments on four public datasets, we show that the proposed efficient spectral hashing (ESH) algorithm achieves highly competitive retrieval performance compared with state of the art at low complexity. |
2022 | Hyperdimensional Hashing A Robust And Efficient Dynamic Hash Table | Heddes Mike, Nunes Igor, Givargis Tony, Nicolau Alexandru, Veidenbaum Alex | Arxiv | Most cloud services and distributed applications rely on hashing algorithms that allow dynamic scaling of a robust and efficient hash table. Examples include AWS Google Cloud and BitTorrent. Consistent and rendezvous hashing are algorithms that minimize key remapping as the hash table resizes. While memory errors in large-scale cloud deployments are common neither algorithm offers both efficiency and robustness. Hyperdimensional Computing is an emerging computational model that has inherent efficiency robustness and is well suited for vector or hardware acceleration. We propose Hyperdimensional (HD) hashing and show that it has the efficiency to be deployed in large systems. Moreover a realistic level of memory errors causes more than 20% mismatches for consistent hashing while HD hashing remains unaffected. |
2022 | Accelerating Code Search With Deep Hashing And Code Classification | Gu Wenchao, Wang Yanlin, Du Lun, Zhang Hongyu, Han Shi, Zhang Dongmei, Lyu Michael R. | Arxiv | Code search is to search reusable code snippets from a source code corpus based on natural language queries. Deep learning-based methods of code search have shown promising results. However previous methods focus on retrieval accuracy but lack attention to the efficiency of the retrieval process. We propose a novel method CoSHC to accelerate code search with deep hashing and code classification aiming to perform an efficient code search without sacrificing too much accuracy. To evaluate the effectiveness of CoSHC we apply our method to five code search models. Extensive experimental results indicate that compared with previous code search baselines CoSHC can save more than 90% of retrieval time while preserving at least 99% of retrieval accuracy. |
2022 | Supervised Deep Hashing For High-dimensional And Heterogeneous Case-based Reasoning | Zhang Qi, Hu Liang, Shi Chongyang, Liu Ke, Cao Longbing | Arxiv | Case-based Reasoning (CBR) on high-dimensional and heterogeneous data is a trending yet challenging and computationally expensive task in the real world. A promising approach is to obtain low-dimensional hash codes representing cases and perform a similarity retrieval of cases in Hamming space. However previous methods based on data-independent hashing rely on random projections or manual construction inapplicable to address specific data issues (e.g. high-dimensionality and heterogeneity) due to their insensitivity to data characteristics. To address these issues this work introduces a novel deep hashing network to learn similarity-preserving compact hash codes for efficient case retrieval and proposes a deep-hashing-enabled CBR model HeCBR. Specifically we introduce position embedding to represent heterogeneous features and utilize a multilinear interaction layer to obtain case embeddings which effectively filters out zero-valued features to tackle high-dimensionality and sparsity and captures inter-feature couplings. Then we feed the case embeddings into fully-connected layers and subsequently a hash layer generates hash codes with a quantization regularizer to control the quantization loss during relaxation. To cater to incremental learning of CBR we further propose an adaptive learning strategy to update the hash function. Extensive experiments on public datasets show that HeCBR greatly reduces storage and significantly accelerates case retrieval. HeCBR achieves desirable performance compared with the state-of-the-art CBR methods and performs significantly better than hashing-based CBR methods in classification. |
2022 | Active Image Indexing | Fernandez Pierre, Douze Matthijs, Jégou Hervé, Furon Teddy | Arxiv | Image copy detection and retrieval from large databases leverage two components. First a neural network maps an image to a vector representation that is relatively robust to various transformations of the image. Second an efficient but approximate similarity search algorithm trades scalability (size and speed) against quality of the search thereby introducing a source of error. This paper improves the robustness of image copy detection with active indexing that optimizes the interplay of these two components. We reduce the quantization loss of a given image representation by making imperceptible changes to the image before its release. The loss is back-propagated through the deep neural network back to the image under perceptual constraints. These modifications make the image more retrievable. Our experiments show that the retrieval and copy detection of activated images is significantly improved. For instance activation improves the Recall@1 by +40% on various image transformations and for several popular indexing structures based on product quantization and locality-sensitive hashing. |
2022 | One Loss For Quantization Deep Hashing With Discrete Wasserstein Distributional Matching | Doan Khoa D., Yang Peng, Li Ping | Arxiv | Image hashing is a principled approximate nearest neighbor approach to find similar items to a query in a large collection of images. Hashing aims to learn a binary-output function that maps an image to a binary vector. For optimal retrieval performance producing balanced hash codes with low quantization error to bridge the gap between the learning stage's continuous relaxation and the inference stage's discrete quantization is important. However in the existing deep supervised hashing methods coding balance and low quantization error are difficult to achieve and involve several losses. We argue that this is because the existing quantization approaches in these methods are heuristically constructed and not effective to achieve these objectives. This paper considers an alternative approach to learning the quantization constraints. The task of learning balanced codes with low quantization error is re-formulated as matching the learned distribution of the continuous codes to a pre-defined discrete uniform distribution. This is equivalent to minimizing the distance between two distributions. We then propose a computationally efficient distributional distance by leveraging the discrete property of the hash functions. This distributional distance is a valid distance and enjoys lower time and sample complexities. The proposed single-loss quantization objective can be integrated into any existing supervised hashing method to improve code balance and quantization error. Experiments confirm that the proposed approach substantially improves the performance of several representative hashing methods. |
2022 | Coophash Cooperative Learning Of Multipurpose Descriptor And Contrastive Pair Generator Via Variational MCMC Teaching For Supervised Image Hashing | Doan Khoa D., Xie Jianwen, Zhu Yaxuan, Zhao Yang, Li Ping | Arxiv | Leveraging supervised information can lead to superior retrieval performance in the image hashing domain but the performance degrades significantly without enough labeled data. One effective solution to boost performance is to employ generative models such as Generative Adversarial Networks (GANs) to generate synthetic data in an image hashing model. However GAN-based methods are difficult to train which prevents the hashing approaches from jointly training the generative models and the hash functions. This limitation results in sub-optimal retrieval performance. To overcome this limitation we propose a novel framework the generative cooperative hashing network which is based on energy-based cooperative learning. This framework jointly learns a powerful generative representation of the data and a robust hash function via two components a top-down contrastive pair generator that synthesizes contrastive images and a bottom-up multipurpose descriptor that simultaneously represents the images from multiple perspectives including probability density hash code latent code and category. The two components are jointly learned via a novel likelihood-based cooperative learning scheme. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing supervised hashing methods achieving up to 10% relative improvement over the current state-of-the-art supervised hashing methods and exhibiting significantly better performance in out-of-distribution retrieval. |
2022 | Asymmetric Hashing For Fast Ranking Via Neural Network Measures | Doan Khoa, Tan Shulong, Zhao Weijie, Li Ping | Arxiv | Fast item ranking is an important task in recommender systems. In previous works graph-based Approximate Nearest Neighbor (ANN) approaches have demonstrated good performance on item ranking tasks with generic searching/matching measures (including complex measures such as neural network measures). However since these ANN approaches must go through the neural measures several times during ranking the computation is not practical if the neural measure is a large network. On the other hand fast item ranking using existing hashing-based approaches such as Locality Sensitive Hashing (LSH) only works with a limited set of measures. Previous learning-to-hash approaches are also not suitable to solve the fast item ranking problem since they can take a significant amount of time and computation to train the hash functions. Hashing approaches however are attractive because they provide a principled and efficient way to retrieve candidate items. In this paper we propose a simple and effective learning-to-hash approach for the fast item ranking problem that can be used for any type of measure including neural network measures. Specifically we solve this problem with an asymmetric hashing framework based on discrete inner product fitting. We learn a pair of related hash functions that map heterogeneous objects (e.g. users and items) into a common discrete space where the inner product of their binary codes reveals their true similarity defined via the original searching measure. The fast ranking problem is reduced to an ANN search via this asymmetric hashing scheme. Then we propose a sampling strategy to efficiently select relevant and contrastive samples to train the hashing model. We empirically validate the proposed method against the existing state-of-the-art fast item ranking methods in several combinations of non-linear searching functions and prominent datasets. |
2022 | Pattern Spotting And Image Retrieval In Historical Documents Using Deep Hashing | Dias Caio Da S., Britto Alceu De S. Jr., Barddal Jean P., Heutte Laurent, Koerich Alessandro L. | Arxiv | This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First a region proposal algorithm detects object candidates in the document page images. Next deep learning models are used for feature extraction considering two distinct variants which provide either real-valued or binary code representations. Finally candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. Besides the proposed approach also reduces the search time by up to 200x and the storage cost up to 6000x when compared to related works based on real-valued representations. |
2022 | Vit2hash Unsupervised Information-preserving Hashing | Gong Qinkang, Wang Liangdao, Lai Hanjiang, Pan Yan, Yin Jian | Arxiv | Unsupervised image hashing which maps images into binary codes without supervision is a compressor with a high compression rate. Hence how to preserving meaningful information of the original data is a critical problem. Inspired by the large-scale vision pre-training model known as ViT which has shown significant progress for learning visual representations in this paper we propose a simple information-preserving compressor to finetune the ViT model for the target unsupervised hashing task. Specifically from pixels to continuous features we first propose a feature-preserving module using the corrupted image as input to reconstruct the original feature from the pre-trained ViT model and the complete image so that the feature extractor can focus on preserving the meaningful information of original data. Secondly from continuous features to hash codes we propose a hashing-preserving module which aims to keep the semantic information from the pre-trained ViT model by using the proposed Kullback-Leibler divergence loss. Besides the quantization loss and the similarity loss are added to minimize the quantization error. Our method is very simple and achieves a significantly higher degree of MAP on three benchmark image datasets. |
2022 | Exploiting And Defending Against The Approximate Linearity Of Apples Neuralhash | Bhatia Jagdeep Singh, Meng Kevin | Arxiv | Perceptual hashes map images with identical semantic content to the same n-bit hash value while mapping semantically-different images to different hashes. These algorithms carry important applications in cybersecurity such as copyright infringement detection content fingerprinting and surveillance. Apples NeuralHash is one such system that aims to detect the presence of illegal content on users devices without compromising consumer privacy. We make the surprising discovery that NeuralHash is approximately linear which inspires the development of novel black-box attacks that can (i) evade detection of illegal images (ii) generate near-collisions and (iii) leak information about hashed images all without access to model parameters. These vulnerabilities pose serious threats to NeuralHashs security goals; to address them we propose a simple fix using classical cryptographic standards. |
2022 | Unsupervised Hashing With Semantic Concept Mining | Tu Rong-cheng, Mao Xian-ling, Lin Kevin Qinghong, Cai Chengfei, Qin Weize, Wang Hongfa, Wei Wei, Huang Heyan | Arxiv | Recently to improve the unsupervised image retrieval performance plenty of unsupervised hashing methods have been proposed by designing a semantic similarity matrix which is based on the similarities between image features extracted by a pre-trained CNN model. However most of these methods tend to ignore high-level abstract semantic concepts contained in images. Intuitively concepts play an important role in calculating the similarity among images. In real-world scenarios each image is associated with some concepts and the similarity between two images will be larger if they share more identical concepts. Inspired by the above intuition in this work we propose a novel Unsupervised Hashing with Semantic Concept Mining called UHSCM which leverages a VLP model to construct a high-quality similarity matrix. Specifically a set of randomly chosen concepts is first collected. Then by employing a vision-language pretraining (VLP) model with the prompt engineering which has shown strong power in visual representation learning the set of concepts is denoised according to the training images. Next the proposed method UHSCM applies the VLP model with prompting again to mine the concept distribution of each image and construct a high-quality semantic similarity matrix based on the mined concept distributions. Finally with the semantic similarity matrix as guiding information a novel hashing loss with a modified contrastive loss based regularization item is proposed to optimize the hashing network. Extensive experiments on three benchmark datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task. |
2022 | Deep Normalized Cross-Modal Hashing With Bi-Direction Relation Reasoning | Changchang Sun, Hugo Latapie, Gaowen Liu, Yan Yan | CVPR | Due to the continuous growth of large-scale multi-modal data and increasing requirements for retrieval speed, deep cross-modal hashing has gained increasing attention recently. Most of existing studies take a similarity matrix as supervision to optimize their models, and the inner product between continuous surrogates of hash codes is utilized to depict the similarity in the Hamming space. However, all of them merely consider the relevant information to build the similarity matrix, ignoring the contribution of the irrelevant one, i.e., the categories that samples do not belong to. Therefore, they cannot effectively alleviate the effect of dissimilar samples. Moreover, due to the modality distribution difference, directly utilizing continuous surrogates of hash codes to calculate similarity may induce suboptimal retrieval performance. To tackle these issues, in this paper, we propose a novel deep normalized cross-modal hashing scheme with bi-direction relation reasoning, named Bi_NCMH. Specifically, we build the multi-level semantic similarity matrix by considering bi-direction relation, i.e., consistent and inconsistent relation. It hence can holistically characterize relations among instances. Besides, we execute feature normalization on continuous surrogates of hash codes to eliminate the deviation caused by modality gap, which further reduces the negative impact of binarization on retrieval performance. Extensive experiments on two cross-modal benchmark datasets demonstrate the superiority of our model over several state-of-the-art baselines. |
2022 | Simultaneously Learning Robust Audio Embeddings And Balanced Hash Codes For Query-by-example | Singh Anup, Demuynck Kris, Arora Vipul | Arxiv | Audio fingerprinting systems must efficiently and robustly identify query snippets in an extensive database. To this end state-of-the-art systems use deep learning to generate compact audio fingerprints. These systems deploy indexing methods which quantize fingerprints to hash codes in an unsupervised manner to expedite the search. However these methods generate imbalanced hash codes leading to their suboptimal performance. Therefore we propose a self-supervised learning framework to compute fingerprints and balanced hash codes in an end-to-end manner to achieve both fast and accurate retrieval performance. We model hash codes as a balanced clustering process which we regard as an instance of the optimal transport problem. Experimental results indicate that the proposed approach improves retrieval efficiency while preserving high accuracy particularly at high distortion levels compared to the competing methods. Moreover our system is efficient and scalable in computational load and memory storage. (A generic sketch of balanced assignment via optimal transport appears after the table.) |
2022 | Asymmetric Hash Code Learning For Remote Sensing Image Retrieval | Song Weiwei, Gao Zhi, Dian Renwei, Ghamisi Pedram, Zhang Yongjun, Benediktsson Jón Atli | Arxiv | Remote sensing image retrieval (RSIR) aiming at searching for a set of similar items to a given query image is a very important task in remote sensing applications. Deep hashing learning as the current mainstream method has achieved satisfactory retrieval performance. On one hand various deep neural networks are used to extract semantic features of remote sensing images. On the other hand the hashing techniques are subsequently adopted to map the high-dimensional deep features to the low-dimensional binary codes. This kind of method attempts to learn one hash function for both the query and database samples in a symmetric way. However with the number of database samples increasing it is typically time-consuming to generate the hash codes of large-scale database images. In this paper we propose a novel deep hashing method named asymmetric hash code learning (AHCL) for RSIR. The proposed AHCL generates the hash codes of query and database images in an asymmetric way. In more detail the hash codes of query images are obtained by binarizing the output of the network while the hash codes of database images are directly learned by solving the designed objective function. In addition we combine the semantic information of each image and the similarity information of pairs of images as supervised information to train a deep hashing network which improves the representation ability of deep features and hash codes. The experimental results on three public datasets demonstrate that the proposed method outperforms symmetric methods in terms of retrieval accuracy and efficiency. The source code is available at https://github.com/weiweisong415/Demo AHCL for TGRS2022. |
2022 | Efficient Cross-modal Retrieval Via Deep Binary Hashing And Quantization | Shi Yang, Chung Young-joo | BMVC | Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. However cross-modal retrieval requires huge amounts of storage and retrieval time since it needs to process data in multiple modalities. Existing works focused on learning single-source compact features such as binary hash codes that preserve similarities between different modalities. In this work we propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. We simultaneously learn binary hash codes and quantization codes to preserve semantic information in multiple modalities by an end-to-end deep learning architecture. At the retrieval step binary hashing is used to retrieve a subset of items from the search space then quantization is used to re-rank the retrieved items. We theoretically and empirically show that this two-stage retrieval approach provides faster retrieval results while preserving accuracy. Experimental results on the NUS-WIDE, MIR-Flickr and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision compared to supervised neural network-based compact coding models. (A sketch of the two-stage hash-then-rerank pipeline appears after the table.) |
2022 | Deep Manifold Hashing A Divide-and-conquer Approach For Semi-paired Unsupervised Cross-modal Retrieval | Shi Yufeng, You Xinge, Xu Jiamiao, Zheng Feng, Peng Qinmu, Ou Weihua | Arxiv | Hashing that projects data into binary codes has shown extraordinary talents in cross-modal retrieval due to its low storage usage and high query speed. Despite their empirical success in some scenarios existing cross-modal hashing methods usually fail to cross the modality gap when fully-paired data with plenty of labeled information is nonexistent. To circumvent this drawback motivated by the Divide-and-Conquer strategy we propose Deep Manifold Hashing (DMH) a novel method of dividing the problem of semi-paired unsupervised cross-modal retrieval into three sub-problems and building one simple yet efficient model for each sub-problem. Specifically the first model is constructed for obtaining modality-invariant features by complementing semi-paired data based on manifold learning whereas the second model and the third model aim to learn hash codes and hash functions respectively. Extensive experiments on three benchmarks demonstrate the superiority of our DMH compared with the state-of-the-art fully-paired and semi-paired unsupervised cross-modal hashing methods. |
2022 | SEMICON A Learning-to-hash Solution For Large-scale Fine-grained Image Retrieval | Shen Yang, Sun Xuhao, Wei Xiu-shen, Jiang Qing-yuan, Yang Jian | Arxiv | In this paper we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for dealing with large-scale fine-grained image retrieval tasks. In SEMICON we first develop a suppression-enhancing mask (SEM) based attention to dynamically localize discriminative image regions. More importantly different from existing attention mechanisms that simply erase previous discriminative regions our SEM is developed to restrain such regions and then discover other complementary regions by considering the relation between activated regions in a stage-by-stage fashion. In each stage the interactive channel transformation (ICON) module is afterwards designed to exploit correlations across channels of attended activation tensors. Since channels could generally correspond to the parts of fine-grained objects the part correlation can be also modeled accordingly which further improves fine-grained retrieval accuracy. Moreover to be computationally economical ICON is realized by an efficient two-step process. Finally the hash learning of our SEMICON consists of both global- and local-level branches for better representing fine-grained objects and then generating binary hash codes explicitly corresponding to multiple levels. Experiments on five benchmark fine-grained datasets show our superiority over competing methods. |
2022 | Johnson-Lindenstrauss Embeddings For Noisy Vectors -- Taking Advantage Of The Noise | Shao Zhen | Arxiv | This paper investigates theoretical properties of subsampling and hashing as tools for approximate Euclidean norm-preserving embeddings for vectors with (unknown) additive Gaussian noises. Such embeddings are sometimes called Johnson-Lindenstrauss embeddings due to their celebrated lemma. Previous work shows that as sparse embeddings the success of subsampling and hashing closely depends on the ℓ∞ to ℓ2 ratios of the vector to be mapped. This paper shows that the presence of noise removes such a constraint in high dimensions; in other words sparse embeddings such as subsampling and hashing with embedding dimensions comparable to dense embeddings have similar approximate norm-preserving dimensionality-reduction properties. The key is that the noise should be treated as information to be exploited not simply something to be removed. Theoretical bounds for subsampling and hashing to recover the approximate norm of a high-dimensional vector in the presence of noise are derived with numerical illustrations showing that better performance is achieved in the presence of noise. (A sketch of a hashing-based sparse embedding appears after the table.) |
2022 | Falconn++ A Locality-sensitive Filtering Approach For Approximate Nearest Neighbor Search | Ninh Pham, Tao Liu | Neural Information Processing Systems | We present Falconn++ a novel locality-sensitive filtering (LSF) approach for approximate nearest neighbor search on angular distance. Falconn++ can filter out potential far away points in any hash bucket before querying which results in higher quality candidates compared to other hashing-based solutions. Theoretically Falconn++ asymptotically achieves lower query time complexity than Falconn an optimal locality-sensitive hashing scheme on angular distance. Empirically Falconn++ achieves a higher recall-speed tradeoff than Falconn on many real-world data sets. Falconn++ is also competitive with HNSW an efficient representative of graph-based solutions in high search recall regimes. (A simplified sketch of locality-sensitive filtering appears after the table.) |
2022 | Hyp^2 Loss Beyond Hypersphere Metric Space For Multi-label Image Retrieval | Xu Chengyin, Chai Zenghao, Xu Zhengzhuo, Yuan Chun, Fan Yanbo, Wang Jue | Arxiv | Image retrieval has become an increasingly appealing technique with broad multimedia application prospects where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper we carried out in-depth investigations on metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios where the pair loss suffers from high computational overhead and convergence difficulty while the proxy loss is theoretically incapable of expressing the profound label dependencies and exhibits conflicts in the constructed hypersphere space. To address the problems we propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP^2 Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP^2 Loss focuses on optimizing the hypersphere space by learnable proxies and excavating data-to-data correlations of irrelevant pairs which integrates sufficient data correspondence of pair-based methods and high-efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks justify that the proposed method outperforms the state-of-the-art, is robust among different hash bits and achieves significant performance gains with a faster, more stable convergence speed. Our code is available at https://github.com/JerryXu0129/HyP2-Loss. |
2022 | Parameterizing Kterm Hashing | Wurzer Dominik, Qin Yumeng | SIGIR | Kterm Hashing provides an innovative approach to novelty detection on massive data streams. Previous research focused on maximizing the efficiency of Kterm Hashing and succeeded in scaling First Story Detection to Twitter-size data streams without sacrificing detection accuracy. In this paper we focus on improving the effectiveness of Kterm Hashing. Traditionally all kterms are considered equally important when calculating a document's degree of novelty with respect to the past. We believe that certain kterms are more important than others and hypothesize that uniform kterm weights are sub-optimal for determining novelty in data streams. To validate our hypothesis we parameterize Kterm Hashing by assigning weights to kterms based on their characteristics. Our experiments apply Kterm Hashing in a First Story Detection setting and reveal that parameterized Kterm Hashing can surpass state-of-the-art detection accuracy and significantly outperform the uniformly weighted approach. |
2022 | Cross-scale Context Extracted Hashing For Fine-grained Image Binary Encoding | Xue Xuetong, Shi Jiaying, He Xinxue, Xu Shenghui, Pan Zhaoming | Arxiv | Deep hashing has been widely applied to large-scale image retrieval tasks owing to efficient computation and low storage cost by encoding high-dimensional image data into binary codes. Since binary codes do not contain as much information as float features the essence of binary encoding is preserving the main context to guarantee retrieval quality. However the existing hashing methods have great limitations on suppressing redundant background information and accurately encoding from Euclidean space to Hamming space by a simple sign function. In order to solve these problems a Cross-Scale Context Extracted Hashing Network (CSCE-Net) is proposed in this paper. Firstly we design a two-branch framework to capture fine-grained local information while maintaining high-level global semantic information. Besides an Attention-guided Information Extraction module (AIE) is introduced between the two branches which suppresses areas of low context information in cooperation with global sliding windows. Unlike previous methods our CSCE-Net learns a content-related Dynamic Sign Function (DSF) to replace the original simple sign function. Therefore the proposed CSCE-Net is context-sensitive and able to perform well on accurate image binary encoding. We further demonstrate that our CSCE-Net is superior to the existing hashing methods which improves retrieval performance on standard benchmarks. |
2022 | Learning To Hash Naturally Sorts | Yu Jiaguo, Shen Yuming, Wang Menghan, Zhang Haofeng, Torr Philip H. S. | Arxiv | Learning to hash pictures a list-wise sorting problem. Its testing metrics e.g. mean-average precision count on a sorted candidate list ordered by pair-wise code similarity. However scarcely does one train a deep hashing model with the sorted results end-to-end because of the non-differentiable nature of the sorting operation. This inconsistency in the objectives of training and test may lead to sub-optimal performance since the training loss often fails to reflect the actual retrieval metric. In this paper we tackle this problem by introducing Naturally-Sorted Hashing (NSH). We sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training. Thanks to the recent advances in differentiable sorting approximations the hash head receives gradients from the sorter so that the hash encoder can be optimized along with the training procedure. Additionally we describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning which allows NSH to mine data semantic relations during training in an unsupervised manner. Our extensive experiments show the proposed NSH model significantly outperforms the existing unsupervised hashing methods on three benchmarked datasets. |
2022 | Hyperbolic Hierarchical Contrastive Hashing | Wei Rukai, Liu Yu, Song Jingkuan, Xie Yanzhao, Zhou Ke | Transaction on Image Processing | Hierarchical semantic structures naturally existing in real-world datasets can assist in capturing the latent distribution of data to learn robust hash codes for retrieval systems. Although hierarchical semantic structures can be simply expressed by integrating semantically relevant data into a high-level taxon with coarser-grained semantics the construction embedding and exploitation of the structures remain tricky for unsupervised hash learning. To tackle these problems we propose a novel unsupervised hashing method named Hyperbolic Hierarchical Contrastive Hashing (HHCH). We propose to embed continuous hash codes into hyperbolic space for accurate semantic expression since embedding hierarchies in hyperbolic space generates less distortion than in hyper-sphere space and Euclidean space. In addition we extend the K-Means algorithm to hyperbolic space and perform the proposed hierarchical hyperbolic K-Means algorithm to construct hierarchical semantic structures adaptively. To exploit the hierarchical semantic structures in hyperbolic space we designed the hierarchical contrastive learning algorithm including hierarchical instance-wise and hierarchical prototype-wise contrastive learning. Extensive experiments on four benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art unsupervised hashing methods. Codes will be released. |
2022 | Hashencoding Autoencoding With Multiscale Coordinate Hashing | Zhornyak Lukas, Xu Zhengjie, Tang Haoran, Shi Jianbo | Arxiv | We present HashEncoding a novel autoencoding architecture that leverages a non-parametric multiscale coordinate hash function to facilitate a per-pixel decoder without convolutions. By leveraging the space-folding behaviour of hashing functions HashEncoding allows for an inherently multiscale embedding space that remains much smaller than the original image. As a result the decoder requires very few parameters compared with decoders in traditional autoencoders approaching a non-parametric reconstruction of the original image and allowing for greater generalizability. Finally by allowing backpropagation directly to the coordinate space we show that HashEncoding can be exploited for geometric tasks such as optical flow. |
2022 | Binary Representation Via Jointly Personalized Sparse Hashing | Wang Xiaoqin, Chen Chen, Lan Rushi, Liu Licheng, Liu Zhenbing, Zhou Huiyu, Luo Xiaonan | Arxiv | Unsupervised hashing has attracted much attention for binary representation learning due to the requirement of economical storage and efficiency of binary codes. It aims to encode high-dimensional features in the Hamming space with similarity preservation between instances. However most existing methods learn hash functions in manifold-based approaches. Those methods capture the local geometric structures (i.e. pairwise relationships) of data and lack satisfactory performance in dealing with real-world scenarios that produce similar features (e.g. color and shape) with different semantic information. To address this challenge in this work we propose an effective unsupervised method namely Jointly Personalized Sparse Hashing (JPSH) for binary representation learning. To be specific firstly we propose a novel personalized hashing module i.e. Personalized Sparse Hashing (PSH). Different personalized subspaces are constructed to reflect category-specific attributes for different clusters adaptively mapping instances within the same cluster to the same Hamming space. In addition we deploy sparse constraints for different personalized subspaces to select important features. We also collect the strengths of the other clusters to build the PSH module while avoiding over-fitting. Then to simultaneously preserve semantic and pairwise similarities in our JPSH we incorporate the PSH and manifold-based hash learning into the seamless formulation. As such JPSH not only distinguishes the instances from different clusters but also preserves local neighborhood structures within the cluster. Finally an alternating optimization algorithm is adopted to iteratively capture analytical solutions of the JPSH model. Extensive experiments on four benchmark datasets verify that the JPSH outperforms several hashing algorithms on the similarity search task. |
2022 | Hcfrec Hash Collaborative Filtering Via Normalized Flow With Structural Consensus For Efficient Recommendation | Wang Fan, Liu Weiming, Chen Chaochao, Zhu Mengying, Zheng Xiaolin | Arxiv | The ever-increasing data scale of user-item interactions makes it challenging for an effective and efficient recommender system. Recently hash-based collaborative filtering (Hash-CF) approaches employ efficient Hamming distance of learned binary representations of users and items to accelerate recommendations. However Hash-CF often faces two challenging problems i.e. optimization on discrete representations and preserving semantic information in learned representations. To address the above two challenges we propose HCFRec a novel Hash-CF approach for effective and efficient recommendations. Specifically HCFRec not only innovatively introduces normalized flow to learn the optimal hash code by efficiently fitting a proposed approximate mixture multivariate normal distribution, a continuous but approximately discrete distribution, but also deploys a cluster consistency preserving mechanism to preserve the semantic structure in representations for more accurate recommendations. Extensive experiments conducted on six real-world datasets demonstrate the superiority of our HCFRec compared to the state-of-the-art methods in terms of effectiveness and efficiency. |
2022 | Hashing Learning With Hyper-class Representation | Zhang Shichao, Li Jiaye | Arxiv | Existing unsupervised hash learning is a kind of attribute-centered calculation. It may not accurately preserve the similarity between data. This degrades the performance of hash function learning. In this paper a hash algorithm is proposed with a hyper-class representation. It is a two-step approach. The first step finds potential decision features and establishes hyper-classes. The second step constructs hash learning based on the hyper-class information in the first step so that the hash codes of the data within a hyper-class are as similar as possible while the hash codes of data from different hyper-classes are as different as possible. To evaluate the efficiency a series of experiments are conducted on four public datasets. The experimental results show that the proposed hash algorithm is more efficient than the compared algorithms in terms of mean average precision (MAP) average precision (AP) and Hamming radius 2 (HAM2). |
2022 | Fedhap Federated Hashing With Global Prototypes For Cross-silo Retrieval | Yang Meilin, Xu Jian, Liu Yang, Ding Wenbo | Arxiv | Deep hashing has been widely applied in large-scale data retrieval due to its superior retrieval efficiency and low storage cost. However data are often scattered in data silos with privacy concerns so performing centralized data storage and retrieval is not always possible. Leveraging the concept of federated learning (FL) to perform deep hashing is a recent research trend. However existing frameworks mostly rely on the aggregation of the local deep hashing models which are trained by performing similarity learning with local skewed data only. Therefore they cannot work well for non-IID clients in a real federated environment. To overcome these challenges we propose a novel federated hashing framework that enables participating clients to jointly train the shared deep hashing model by leveraging the prototypical hash codes for each class. Globally the transmission of global prototypes with only one prototypical hash code per class will minimize the impact of communication cost and privacy risk. Locally the use of global prototypes is maximized by jointly training a discriminator network and the local hashing network. Extensive experiments on benchmark datasets are conducted to demonstrate that our method can significantly improve the performance of the deep hashing model in the federated environments with non-IID data distributions. |
2022 | Weighted Contrastive Hashing | Yu Jiaguo, Qiu Huming, Chen Dubing, Zhang Haofeng | Arxiv | The development of unsupervised hashing is advanced by the recent popular contrastive learning paradigm. However previous contrastive learning-based works have been hampered by (1) insufficient data similarity mining based on global-only image representations and (2) the hash code semantic loss caused by the data augmentation. In this paper we propose a novel method namely Weighted Contrastive Hashing (WCH) to take a step towards solving these two problems. We introduce a novel mutual attention module to alleviate the problem of information asymmetry in network features caused by the missing image structure during contrastive augmentation. Furthermore we explore the fine-grained semantic relations between images i.e. we divide the images into multiple patches and calculate similarities between patches. The aggregated weighted similarities which reflect the deep image relations are distilled to facilitate the hash code learning with a distillation loss so as to obtain better retrieval performance. Extensive experiments show that the proposed WCH significantly outperforms existing unsupervised hashing methods on three benchmark datasets. |
2021 | Microsoft Turing-ANNS-1B | Herve Jegou | NeurIPS | Microsoft Turing-ANNS-1B is a new dataset being released by the Microsoft Turing team for this competition. It consists of Bing queries encoded by Turing AGI v5 that trains Transformers to capture similarity of intent in web search queries. An early version of the RNN-based AGI Encoder is described in a SIGIR’19 paper and a blogpost. |
2021 | Microsoft SPACEV-1B | Microsoft | NeurIPS | Microsoft SPACEV-1B is a new web search related dataset released by Microsoft Bing for this competition. It consists of document and query vectors encoded by Microsoft SpaceV Superior model to capture generic intent representation. |
2021 | Meta Cross-modal Hashing On Long-tailed Data | Wang Runmin, Yu Guoxian, Domeniconi Carlotta, Zhang Xiangliang | Arxiv | Due to the advantage of reducing storage while speeding up query time on big heterogeneous data cross-modal hashing has been extensively studied for approximate nearest neighbor search of multi-modal data. Most hashing methods assume that training data is class-balanced. However in practice real-world data often have a long-tailed distribution. In this paper we introduce a meta-learning based cross-modal hashing method (MetaCMH) to handle long-tailed data. Due to the lack of training samples in the tail classes MetaCMH first learns direct features from data in different modalities and then introduces an associative memory module to learn the memory features of samples of the tail classes. It then combines the direct and memory features to obtain meta features for each sample. For samples of the head classes of the long-tailed distribution the weight of the direct features is larger because there are enough training data to learn them well; while for rare classes the weight of the memory features is larger. Finally MetaCMH uses a likelihood loss function to preserve the similarity in different modalities and learns hash functions in an end-to-end fashion. Experiments on long-tailed datasets show that MetaCMH performs significantly better than state-of-the-art methods especially on the tail classes. |
2021 | Rank-consistency Deep Hashing For Scalable Multi-label Image Search | Ma Cheng, Lu Jiwen, Zhou Jie | IEEE Transactions on Multimedia | As hashing becomes an increasingly appealing technique for large-scale image retrieval multi-label hashing is also attracting more attention for the ability to exploit multi-level semantic contents. In this paper we propose a novel deep hashing method for scalable multi-label image search. Unlike existing approaches with conventional objectives such as contrast and triplet losses we employ a rank list rather than pairs or triplets to provide sufficient global supervision information for all the samples. Specifically a new rank-consistency objective is applied to align the similarity orders from two spaces the original space and the Hamming space. A powerful loss function is designed to penalize the samples whose semantic similarity and Hamming distance are mismatched in two spaces. Besides a multi-label softmax cross-entropy loss is presented to enhance the discriminative power with a concise formulation of the derivative function. In order to manipulate the neighborhood structure of the samples with different labels we design a multi-label clustering loss to cluster the hashing vectors of the samples with the same labels by reducing the distances between the samples and their multiple corresponding class centers. The state-of-the-art experimental results achieved on three public multi-label datasets MIRFLICKR-25K, IAPRTC12 and NUS-WIDE demonstrate the effectiveness of the proposed method. |
2021 | Deep Unsupervised Hashing By Distilled Smooth Guidance | Luo Xiao, Ma Zeyu, Wu Daqing, Zhong Huasong, Chen Chong, Ma Jinwen, Deng Minghua | ICME | Hashing has been widely used in approximate nearest neighbor search for its storage and computational efficiency. Deep supervised hashing methods are not widely used because of the lack of labeled data especially when the domain is transferred. Meanwhile unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable similarity signals. To tackle this problem we propose a novel deep unsupervised hashing method namely Distilled Smooth Guidance (DSG) which can learn a distilled dataset consisting of similarity signals as well as smooth confidence signals. To be specific we obtain the similarity confidence weights based on the initial noisy similarity signals learned from local structures and construct a priority loss function for smooth similarity-preserving learning. Besides global information based on clustering is utilized to distill the image pairs by removing contradictory similarity signals. Extensive experiments on three widely used benchmark datasets show that the proposed DSG consistently outperforms the state-of-the-art search methods. |
2021 | Online Enhanced Semantic Hashing Towards Effective And Efficient Retrieval For Streaming Multi-modal Data | Wu Xiao-ming, Luo Xin, Zhan Yu-wei, Ding Chen-lu, Chen Zhen-duo, Xu Xin-shun | Arxiv | With the vigorous development of multimedia equipment and applications efficient retrieval of large-scale multi-modal data has become a trendy research topic. Thereinto hashing has become a prevalent choice due to its retrieval efficiency and low storage cost. Although multi-modal hashing has drawn lots of attention in recent years there still remain some problems. The first point is that existing methods are mainly designed in batch mode and not able to efficiently handle streaming multi-modal data. The second point is that all existing online multi-modal hashing methods fail to effectively handle unseen new classes which come continuously with streaming data chunks. In this paper we propose a new model termed Online enhAnced SemantIc haShing (OASIS). We design novel semantic-enhanced representation for data which could help handle the new coming classes and thereby construct the enhanced semantic objective function. An efficient and effective discrete online optimization algorithm is further proposed for OASIS. Extensive experiments show that our method can exceed the state-of-the-art models. For good reproducibility and benefiting the community our code and data are already available in supplementary material and will be made publicly available. |
2021 | SLOSH Set Locality Sensitive Hashing Via Sliced-wasserstein Embeddings | Lu Yuzhe, Liu Xinran, Soltoggio Andrea, Kolouri Soheil | Arxiv | Learning from set-structured data is an essential problem with many applications in machine learning and computer vision. This paper focuses on non-parametric and data-independent learning from set-structured data using approximate nearest neighbor (ANN) solutions particularly locality-sensitive hashing. We consider the problem of set retrieval from an input set query. Such a retrieval problem requires 1) an efficient mechanism to calculate the distances/dissimilarities between sets and 2) an appropriate data structure for fast nearest neighbor search. To that end we propose Sliced-Wasserstein set embedding as a computationally efficient set-2-vector mechanism that enables downstream ANN with theoretical guarantees. The set elements are treated as samples from an unknown underlying distribution and the Sliced-Wasserstein distance is used to compare sets. We demonstrate the effectiveness of our algorithm denoted as Set-LOcality Sensitive Hashing (SLOSH) on various set retrieval datasets and compare our proposed embedding with standard set embedding approaches including Generalized Mean (GeM) embedding/pooling Featurewise Sort Pooling (FSPool) and Covariance Pooling and show consistent improvement in retrieval results. The code for replicating our results is available at https://github.com/mint-vu/SLOSH. (A sketch of the Sliced-Wasserstein set embedding appears after the table.) |
2021 | Deep Asymmetric Hashing With Dual Semantic Regression And Class Structure Quantization | Lu Jianglin, Wang Hailing, Zhou Jie, Yan Mengfan, Wen Jiajun | Arxiv | Recently deep hashing methods have been widely used in image retrieval tasks. Most existing deep hashing approaches adopt one-to-one quantization to reduce information loss. However such class-unrelated quantization cannot give discriminative feedback for network training. In addition these methods only utilize a single label to integrate supervision information of data for hashing function learning which may result in inferior network generalization performance and relatively low-quality hash codes since the inter-class information of data is totally ignored. In this paper we propose a dual semantic asymmetric hashing (DSAH) method which generates discriminative hash codes under three-fold constraints. Firstly DSAH utilizes class prior to conduct class structure quantization so as to transmit class information during the quantization process. Secondly a simple yet effective label mechanism is designed to characterize both the intra-class compactness and inter-class separability of data thereby achieving semantic-sensitive binary code learning. Finally a meaningful pairwise similarity preserving loss is devised to minimize the distances between class-related network outputs based on an affinity graph. With these three main components high-quality hash codes can be generated through the network. Extensive experiments conducted on various datasets demonstrate the superiority of DSAH in comparison with state-of-the-art deep hashing methods. |
2021 | Ternary Hashing | Liu Chang, Fan Lixin, Ng Kam Woh, Jin Yilun, Ju Ce, Zhang Tianyu, Chan Chee Seng, Yang Qiang | Arxiv | This paper proposes a novel ternary hash encoding for learning to hash methods which provides a principled more efficient coding scheme with performances better than those of the state-of-the-art binary hashing counterparts. Two kinds of axiomatic ternary logic, Kleene logic and Łukasiewicz logic, are adopted to calculate the Ternary Hamming Distance (THD) for both the learning/encoding and testing/querying phases. Our work demonstrates that with an efficient implementation of ternary logic on standard binary machines the proposed ternary hashing compares favorably to the binary hashing methods with consistent improvements of retrieval mean average precision (mAP) ranging from 1% to 5.9% on the CIFAR10, NUS-WIDE and ImageNet100 datasets. (A sketch of a ternary Hamming distance appears after the table.) |
2021 | FDDH Fast Discriminative Discrete Hashing For Large-scale Cross-modal Retrieval | Liu Xin, Wang Xingzhi, Cheung Yiu-ming | IEEE Transactions on Neural Networks and Learning Systems | Cross-modal hashing favored for its effectiveness and efficiency has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes while often involving time-consuming training procedure for handling the large-scale dataset. To tackle these issues we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data so as to minimize the quantization loss of mapping such data to Hamming space and propose an efficient Fast Discriminative Discrete Hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels and utilizes the ε-dragging technique to provide provable large semantic margins. Accordingly the discriminative power of semantic information can be explicitly captured and maximized. Moreover an orthogonal transformation scheme is further proposed to map the nonlinear embedding data into the semantic subspace which can well guarantee the semantic consistency between the data feature and its semantic representation. Consequently an efficient closed form solution is derived for discriminative hash code learning which is very computationally efficient. In addition an effective and stable online learning strategy is presented for optimizing modality-specific projection functions featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates the bi-Lipschitz continuity runs sufficiently fast and also significantly improves the retrieval performance over the state-of-the-art methods. The source code is released at https://github.com/starxliu/FDDH. |
2021 | Deep Self-adaptive Hashing For Image Retrieval | Lin Qinghong, Chen Xiaojun, Zhang Qin, Tian Shangxuan, Chen Yudong | Arxiv | Hashing technology has been widely used in image retrieval due to its computational and storage efficiency. Recently deep unsupervised hashing methods have attracted increasing attention due to the high cost of human annotations in the real world and the superiority of deep learning technology. However most deep unsupervised hashing methods usually pre-compute a similarity matrix to model the pairwise relationship in the pre-trained feature space. Then this similarity matrix would be used to guide hash learning in which most of the data pairs are treated equivalently. The above process is confronted with the following defects 1) The pre-computed similarity matrix is inalterable and disconnected from the hash learning process which cannot explore the underlying semantic information. 2) The informative data pairs may be buried by the large number of less-informative data pairs. To solve the aforementioned problems we propose a Deep Self-Adaptive Hashing (DSAH) model to adaptively capture the semantic information with two special designs Adaptive Neighbor Discovery (AND) and Pairwise Information Content (PIC). Firstly we adopt the AND to initially construct a neighborhood-based similarity matrix and then refine this initial similarity matrix with a novel update strategy to further investigate the semantic structure behind the learned representation. Secondly we measure the priorities of data pairs with PIC and assign adaptive weights to them which relies on the assumption that more dissimilar data pairs contain more discriminative information for hash learning. Extensive experiments on several datasets demonstrate that the above two technologies facilitate the deep hashing model to achieve superior performance. |
2021 | Deep Center-Based Dual-Constrained Hashing for Discriminative Face Image Retrieval | Ming Zhang, Xuefei Zhe, Shifeng Chen and Hong Yan | Pattern Recognition | With the advantages of low storage cost and extremely fast retrieval speed, deep hashing methods have attracted much attention for image retrieval recently. However, large-scale face image retrieval with significant intra-class variations is still challenging. Neither existing pairwise/triplet labels-based nor softmax classification loss-based deep hashing works can generate compact and discriminative binary codes. Considering these issues, we propose a center-based framework integrating end-to-end hashing learning and class centers learning simultaneously. The framework minimizes the intra-class variance by clustering intra-class samples into a learnable class center. To strengthen inter-class separability, it additionally imposes a novel regularization term to enlarge the Hamming distance between pairwise class centers. Moreover, a simple yet effective regression matrix is introduced to encourage intra-class samples to generate the same binary codes, which further enhances the hashing codes compactness. Experiments on four large-scale datasets show the proposed method outperforms state-of-the-art baselines under various code lengths and commonly-used evaluation metrics. |
2021 | C-minhash Practically Reducing Two Permutations To Just One | Li Xiaoyun, Li Ping | Arxiv | Traditional minwise hashing (MinHash) requires applying K independent permutations to estimate the Jaccard similarity in massive binary (0/1) data where K can be (e.g.) 1024 or even larger depending on applications. The recent work on C-MinHash (Li and Li 2021) has shown with rigorous proofs that only two permutations are needed. An initial permutation is applied to break whatever structures might exist in the data and a second permutation is re-used K times to produce K hashes via a circulant shifting fashion. (Li and Li 2021) has proved that perhaps surprisingly even though the K hashes are correlated the estimation variance is strictly smaller than the variance of the traditional MinHash. It has been demonstrated in (Li and Li 2021) that the initial permutation in C-MinHash is indeed necessary. For the ease of theoretical analysis they have used two independent permutations. In this paper we show that one can actually simply use one permutation. That is one single permutation is used for both the initial pre-processing step to break the structures in the data and the circulant hashing step to generate K hashes. Although the theoretical analysis becomes very complicated we are able to explicitly write down the expression for the expectation of the estimator. The new estimator is no longer unbiased but the bias is extremely small and has essentially no impact on the estimation accuracy (mean square errors). An extensive set of experiments is provided to verify our claim for using just one permutation. (A sketch of circulant one-permutation MinHash appears after the table.) |
2021 | Deep Unsupervised Image Hashing by Maximizing Bit Entropy | Yunqiang Li, Jan van Gemert | AAAI | Unsupervised hashing is important for indexing huge image or video collections without having expensive annotations available. Hashing aims to learn short binary codes for compact storage and efficient semantic retrieval. We propose an unsupervised deep hashing layer called Bi-half Net that maximizes entropy of the binary codes. Entropy is maximal when both possible values of the bit are uniformly (half-half) distributed. To maximize bit entropy, we do not add a term to the loss function as this is difficult to optimize and tune. Instead, we design a new parameter-free network layer to explicitly force continuous image features to approximate the optimal half-half bit distribution. This layer is shown to minimize a penalized term of the Wasserstein distance between the learned continuous image features and the optimal half-half bit distribution. Experimental results on the image datasets Flickr25k, Nus-wide, Cifar-10, Mscoco, Mnist and the video datasets Ucf-101 and Hmdb-51 show that our approach leads to compact codes and compares favorably to the current state-of-the-art. (A sketch of the half-half binarization appears after the table.) |
2021 | Self-Supervised Video Hashing via Bidirectional Transformers | Shuyan Li, Xiu Li, Jiwen Lu, Jie Zhou | CVPR | Most existing unsupervised video hashing methods are built on unidirectional models with less reliable training objectives, which underuse the correlations among frames and the similarity structure between videos. To enable efficient scalable video retrieval, we propose a self-supervised video Hashing method based on Bidirectional Transformers (BTH). Based on the encoder-decoder structure of transformers, we design a visual cloze task to fully exploit the bidirectional correlations between frames. To unveil the similarity structure between unlabeled video data, we further develop a similarity reconstruction task by establishing reliable and effective similarity connections in the video space. Furthermore, we develop a cluster assignment task to exploit the structural statistics of the whole dataset such that more discriminative binary codes can be learned. Extensive experiments implemented on three public benchmark datasets, FCVID, ActivityNet and YFCC, demonstrate the superiority of our proposed approach. |
2021 | LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes | Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi | NeurIPS | Learning binary representations of instances and classes is a classical problem with several high potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a challenging task and often requires large bit-codes to be accurate. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for instances as well as classes. Our method does not require any side-information, like annotated attributes or label meta-data, and learns extremely low-dimensional binary codes (~20 bits for ImageNet-1K). The learnt codes are super-efficient while still ensuring nearly optimal classification accuracy for ResNet50 on ImageNet-1K. We demonstrate that the learnt codes capture intrinsically important features in the data, by discovering an intuitive taxonomy over classes. We further quantitatively measure the quality of our codes by applying it to the efficient image retrieval as well as out-of-distribution (OOD) detection problems. For the ImageNet-100 retrieval problem, our learnt binary codes outperform 16 bit HashNet using only 10 bits and also are as accurate as 10 dimensional real representations. Finally, our learnt binary codes can perform OOD detection, out-of-the-box, as accurately as a baseline that needs ~3000 samples to tune its threshold, while we require none. |
2021 | High-order nonlocal Hashing for unsupervised cross-modal retrieval | Peng-Fei Zhang, Yadan Luo, Zi Huang, Xin-Shun Xu, Jingkuan Song | WWW | In light of the ability to enable efficient storage and fast query for big data, hashing techniques for cross-modal search have aroused extensive attention. Despite the great success achieved, unsupervised cross-modal hashing still suffers from lacking reliable similarity supervision and struggles with handling the heterogeneity issue between different modalities. To cope with these, in this paper, we devise a new deep hashing model, termed as High-order Nonlocal Hashing (HNH) to facilitate cross-modal retrieval with the following advantages. First, different from existing methods that mainly leverage low-level local-view similarity as the guidance for hashing learning, we propose a high-order affinity measure that considers the multi-modal neighbourhood structures from a nonlocal perspective, thereby comprehensively capturing the similarity relationships between data items. Second, a common representation is introduced to correlate different modalities. By enforcing the modal-specific descriptors and the common representation to be aligned with each other, the proposed HNH significantly bridges the modality gap and maintains the intra-consistency. Third, an effective affinity preserving objective function is delicately designed to generate high-quality binary codes. Extensive experiments evidence the superiority of the proposed HNH in unsupervised cross-modal retrieval tasks over the state-of-the-art baselines. |
2021 | HHF Hashing-guided Hinge Function For Deep Hashing Retrieval | Xu Chengyin, Chai Zenghao, Xu Zhengzhuo, Li Hongjia, Zuo Qiruyi, Yang Lingyu, Yuan Chun | Arxiv | Deep hashing has shown promising performance in large-scale image retrieval. However latent codes extracted by Deep Neural Networks (DNNs) will inevitably lose semantic information during the binarization process which damages the retrieval accuracy and makes it challenging. Although many existing approaches perform regularization to alleviate quantization errors we identify an incompatible conflict between metric learning and quantization learning. The metric loss penalizes inter-class distances to push different classes far apart without constraint. Worse still it tends to make the latent codes deviate from the ideal binarization points and generates severe ambiguity in the binarization process. Based on the minimum distance of the binary linear code we creatively propose Hashing-guided Hinge Function (HHF) to avoid such conflict. In detail the carefully-designed inflection point which relies on the hash bit length and category numbers is explicitly adopted to balance the metric term and quantization term. Such a modification prevents the network from falling into local metric optimal minima in deep hashing. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet and MS-COCO show that HHF consistently outperforms existing techniques and is robust and flexible to transplant into other methods. Code is available at https://github.com/JerryXu0129/HHF. |
2021 | Unsupervised Discrete Hashing with Affinity Similarity | Sheng Jin, Hongxun Yao, Qin Zhou, Yao Liu, Jianqiang Huang, Xiansheng Hua | TIP | In recent years, supervised hashing has been validated to greatly boost the performance of image retrieval. However, the label-hungry property requires massive label collection, making it intractable in practical scenarios. To liberate the model training procedure from laborious manual annotations, some unsupervised methods are proposed. However, the following two factors make unsupervised algorithms inferior to their supervised counterparts: (1) Without manually-defined labels, it is difficult to capture the semantic information across data, which is of crucial importance to guide robust binary code learning. (2) The widely adopted relaxation on binary constraints results in quantization error accumulation in the optimization procedure. To address the above-mentioned problems, in this paper, we propose a novel Unsupervised Discrete Hashing method (UDH). Specifically, to capture the semantic information, we propose a balanced graph-based semantic loss which explores the affinity priors in the original feature space. Then, we propose a novel self-supervised loss, termed orthogonal consistent loss, which can leverage semantic loss of instance and impose independence of codes. Moreover, by integrating the discrete optimization into the proposed unsupervised framework, the binary constraints are consistently preserved, alleviating the influence of quantization errors. Extensive experiments demonstrate that UDH outperforms state-of-the-art unsupervised methods for image retrieval. |
2021 | Self-supervised Product Quantization For Deep Unsupervised Image Retrieval | Jang Young Kyun, Cho Nam Ik | Arxiv | Supervised deep learning-based hash and vector quantization are enabling fast and large-scale image retrieval systems. By fully exploiting label annotations they are achieving outstanding retrieval performances compared to the conventional methods. However it is painstaking to assign labels precisely for a vast amount of training data and also the annotation process is error-prone. To tackle these issues we propose the first deep unsupervised image retrieval method dubbed Self-supervised Product Quantization (SPQ) network which is label-free and trained in a self-supervised manner. We design a Cross Quantized Contrastive learning strategy that jointly learns codewords and deep visual descriptors by comparing individually transformed images (views). Our method analyzes the image contents to extract descriptive features allowing us to understand image representations for accurate retrieval. By conducting extensive experiments on benchmarks we demonstrate that the proposed method yields state-of-the-art results even without supervised pretraining. |
2021 | Deep Hash Distillation For Image Retrieval | Jang Young Kyun, Gu Geonmo, Ko Byungsoo, Kang Isaac, Cho Nam Ik | Arxiv | In hash-based image retrieval systems degraded or transformed inputs usually generate different codes from the original deteriorating the retrieval accuracy. To mitigate this issue data augmentation can be applied during training. However even if augmented samples of an image are similar in real feature space the quantization can scatter them far away in Hamming space. This results in representation discrepancies that can impede training and degrade performance. In this work we propose a novel self-distilled hashing scheme to minimize the discrepancy while exploiting the potential of augmented data. By transferring the hash knowledge of the weakly-transformed samples to the strong ones we make the hash code insensitive to various transformations. We also introduce hash proxy-based similarity learning and binary cross entropy-based quantization loss to provide fine quality hash codes. Ultimately we construct a deep hashing framework that not only improves the existing deep hashing approaches but also achieves the state-of-the-art retrieval results. Extensive experiments are conducted and confirm the effectiveness of our work. |
2021 | Similarity Guided Deep Face Image Retrieval | Jang Young Kyun, Cho Nam Ik | Arxiv | Face image retrieval which searches for images of the same identity from the query input face image is drawing more attention as the size of the image database increases rapidly. In order to conduct fast and accurate retrieval compact hash code-based methods have been proposed and recently deep face image hashing methods with supervised classification training have shown outstanding performance. However the classification-based scheme has a disadvantage in that it cannot incorporate complex similarities between face images into the hash code learning. In this paper we attempt to improve the face image retrieval quality by proposing a Similarity Guided Hashing (SGH) method which gently considers self and pairwise-similarity simultaneously. SGH employs various data augmentations designed to explore elaborate similarities between face images solving both intra and inter identity-wise difficulties. Extensive experimental results on the protocols with existing benchmarks and an additionally proposed large scale higher resolution face image dataset demonstrate that our SGH delivers state-of-the-art retrieval performance. |
2021 | MOON Multi-hash Codes Joint Learning For Cross-media Retrieval | Zhang Donglin, Wu Xiao-jun, Yin He-feng, Kittler Josef | Arxiv | In recent years cross-media hashing technique has attracted increasing attention for its high computation efficiency and low storage cost. However the existing approaches still have some limitations which need to be explored. 1) A fixed hash length (e.g. 16 bits or 32 bits) is predefined before learning the binary codes. Therefore these models need to be retrained when the hash length changes which consumes additional computation power and reduces scalability in practical applications. 2) Existing cross-modal approaches only explore the information in the original multimedia data to perform the hash learning without exploiting the semantic information contained in the learned hash codes. To this end we develop a novel Multiple hash cOdes jOint learNing method (MOON) for cross-media retrieval. Specifically the developed MOON synchronously learns the hash codes with multiple lengths in a unified framework. Besides to enhance the underlying discrimination we combine the clues from the multimodal data semantic labels and learned hash codes for hash learning. As far as we know the proposed MOON is the first work to simultaneously learn hash codes of different lengths without retraining in cross-media retrieval. Experiments on several databases show that our MOON can achieve promising performance outperforming some recent competitive shallow and deep methods. |
2021 | One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective | Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang,Chee Seng Chan,Yi-Zhe Song,Tao Xiang | NeurIPS | A deep hashing model typically has two main learning objectives: to make the learned binary hash codes discriminative and to minimize a quantization error. With further constraints such as bit balance and code orthogonality, it is not uncommon for existing models to employ a large number (>4) of losses. This leads to difficulties in model training and subsequently impedes their effectiveness. In this work, we propose a novel deep hashing model with only a single learning objective. Specifically, we show that maximizing the cosine similarity between the continuous codes and their corresponding binary orthogonal codes can ensure both hash code discriminativeness and quantization error minimization. Further, with this learning objective, code balancing can be achieved by simply using a Batch Normalization (BN) layer and multi-label classification is also straightforward with label smoothing. The result is an one-loss deep hashing model that removes all the hassles of tuning the weights of various losses. Importantly, extensive experiments show that our model is highly effective, outperforming the state-of-the-art multi-loss hashing models on three large-scale instance retrieval benchmarks, often by significant margins. |
2021 | Joint Learning Of Deep Retrieval Model And Product Quantization Based Embedding Index | Zhang Han, Shen Hongwei, Qiu Yiming, Jiang Yunjiang, Wang Songlin, Xu Sulong, Xiao Yun, Long Bo, Yang Wen-yun | Arxiv | Embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component for state-of-the-art deep retrieval systems. Traditional approaches often separate the two steps of embedding learning and index building which incurs additional indexing time and decayed retrieval accuracy. In this paper we propose a novel method called Poeem which stands for product quantization based embedding index jointly trained with deep retrieval model to unify the two separate steps within an end-to-end training by utilizing a few techniques including the gradient straight-through estimator warm start strategy optimal space decomposition and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open sourced our approach for the sake of comparison and reproducibility. |
2021 | Beyond Neighbourhood-Preserving Transformations for Quantization-Based Unsupervised Hashing | Sobhan Hemati, H.R. Tizhoosh | Pattern Recognition Letters | An effective unsupervised hashing algorithm leads to compact binary codes preserving the neighborhood structure of data as much as possible. One of the most established schemes for unsupervised hashing is to reduce the dimensionality of data and then find a rigid (neighbourhood-preserving) transformation that reduces the quantization error. Although employing rigid transformations is effective, we may not reduce quantization loss to the ultimate limits. As well, reducing dimensionality and quantization loss in two separate steps seems to be sub-optimal. Motivated by these shortcomings, we propose to employ both rigid and non-rigid transformations to reduce quantization error and dimensionality simultaneously. We relax the orthogonality constraint on the projection in a PCA-formulation and regularize this by a quantization term. We show that both the non-rigid projection matrix and rotation matrix contribute towards minimizing quantization loss but in different ways. A scalable nested coordinate descent approach is proposed to optimize this mixed-integer optimization problem. We evaluate the proposed method on five public benchmark datasets providing almost half a million images. Comparative results indicate that the proposed method mostly outperforms state-of-art linear methods and competes with end-to-end deep solutions. |
2021 | Representation Learning For Efficient And Effective Similarity Search And Recommendation | Hansen Casper | Arxiv | How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors denoted hash codes which require little storage and enable efficient similarity search through direct indexing into a hash table or through similarity computations in an appropriate space. Due to the limited expressibility of hash codes compared to real-valued representations a core open challenge is how to generate hash codes that well capture semantic content or latent properties using a small number of bits while ensuring that the hash codes are distributed in a way that does not reduce their search efficiency. State of the art methods use representation learning for generating such hash codes focusing on neural autoencoder architectures where semantics are encoded into the hash codes by learning to reconstruct the original inputs of the hash codes. This thesis addresses the above challenge and makes a number of contributions to representation learning that (i) improve effectiveness of hash codes through more expressive representations and a more effective similarity measure than the current state of the art namely the Hamming distance and (ii) improve efficiency of hash codes by learning representations that are especially suited to the choice of search method. The contributions are empirically validated on several tasks related to similarity search and recommendation. |
2021 | Unsupervised Multi-index Semantic Hashing | Hansen Christian, Hansen Casper, Simonsen Jakob Grue, Alstrup Stephen, Lioma Christina | Arxiv | Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness while assuming a brute-force linear scan strategy for searching over all the hash codes even though much faster alternatives exist. One such alternative is multi-index hashing an approach that constructs a smaller candidate set to search over which depending on the distribution of the hash codes can lead to sub-linear search time. In this work we propose Multi-Index Semantic Hashing (MISH) an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives which enable learning hash codes that reduce the candidate sets produced by multi-index hashing while being end-to-end trainable. In fact our proposed training objectives are model agnostic i.e. not tied to how the hash codes are generated specifically in MISH and are straightforward to include in existing and future semantic hashing models. We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search. We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan they are still upwards of 33% slower than MISH while MISH is still able to obtain state-of-the-art effectiveness. |
2021 | Multi-modal Mutual Information Maximization A Novel Approach For Unsupervised Deep Cross-modal Hashing | Hoang Tuan, Do Thanh-toan, Nguyen Tam V., Cheung Ngai-man | Arxiv | In this paper we adopt the maximizing mutual information (MI) approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method dubbed Cross-Modal Info-Max Hashing (CMIMH). First to learn informative representations that can preserve both intra- and inter-modal similarities we leverage the recent advances in estimating variational lower-bound of MI to maximize the MI between the binary representations and input features and between binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations are modelled by multivariate Bernoulli distributions we can learn binary representations which can preserve both intra- and inter-modal similarities effectively in a mini-batch manner with gradient descent. Furthermore we find that trying to minimize the modality gap by learning similar binary representations for the same instance from different modalities could result in less informative representations. Hence balancing between reducing the modality gap and losing modality-private information is important for the cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods. |
2021 | Backdoor Attack On Hash-based Image Retrieval Via Clean-label Data Poisoning | Gao Kuofeng, Bai Jiawang, Chen Bin, Wu Dongxian, Xia Shu-tao | Arxiv | A backdoored deep hashing model is expected to behave normally on original query images and return the images with the target label when a specific trigger pattern presents. To this end we propose the confusing perturbations-induced backdoor attack (CIBA). It injects a small number of poisoned images with the correct label into the training data which makes the attack hard to be detected. To craft the poisoned images we first propose the confusing perturbations to disturb the hashing code learning. As such the hashing model can learn more about the trigger. The confusing perturbations are imperceptible and generated by optimizing the intra-class dispersion and inter-class shift in the Hamming space. We then employ the targeted adversarial patch as the backdoor trigger to improve the attack performance. We have conducted extensive experiments to verify the effectiveness of our proposed CIBA. Our code is available at https://github.com/KuofengGao/CIBA. |
2021 | Deep Triplet Hashing Network For Case-based Medical Image Retrieval | Fang Jiansheng, Fu Huazhu, Liu Jiang | Arxiv | Deep hashing methods have been shown to be the most efficient approximate nearest neighbor search techniques for large-scale image retrieval. However existing deep hashing methods have a poor small-sample ranking performance for case-based medical image retrieval. The top-ranked images in the returned query results may be of a different class than the query image. This ranking problem is caused by the loss of classification regions of interest (ROI) and small-sample information in the hashing space. To address the ranking problem we propose an end-to-end framework called Attention-based Triplet Hashing (ATH) network to learn low-dimensional hash codes that preserve the classification ROI and small-sample information. We embed a spatial-attention module into the network structure of our ATH to focus on ROI information. The spatial-attention module aggregates the spatial information of feature maps by utilizing max-pooling element-wise maximum and element-wise mean operations jointly along the channel axis. The triplet cross-entropy loss can help to map the classification information of images and similarity between images into the hash codes. Extensive experiments on two case-based medical datasets demonstrate that our proposed ATH can further improve the retrieval performance compared to the state-of-the-art deep hashing methods and boost the ranking performance for small samples. Compared to the other loss methods the triplet cross-entropy loss can enhance the classification performance and hash code discriminability. |
2021 | Rescuing Deep Hashing From Dead Bits Problem | Zhao Shu, Wu Dayan, Zhou Yucan, Li Bo, Wang Weiping | Arxiv | Deep hashing methods have shown great retrieval accuracy and efficiency in large-scale image retrieval. How to optimize discrete hash bits is always the focus in deep hashing methods. A common strategy in these methods is to adopt an activation function e.g. sigmoid(·) or tanh(·) and minimize a quantization loss to approximate discrete values. However this paradigm may make more and more hash bits stuck into the wrong saturated area of the activation functions and never escaped. We call this problem the Dead Bits Problem (DBP). Besides the existing quantization loss will aggravate DBP as well. In this paper we propose a simple but effective gradient amplifier which acts before activation functions to alleviate DBP. Moreover we devise an error-aware quantization loss to further alleviate DBP. It avoids the negative effect of quantization loss based on the similarity between two images. The proposed gradient amplifier and error-aware quantization loss are compatible with a variety of deep hashing methods. Experimental results on three datasets demonstrate the efficiency of the proposed gradient amplifier and the error-aware quantization loss. |
2021 | Vision Transformer Hashing For Image Retrieval | Dubey Shiv Ram, Singh Satish Kumar, Chu Wei-ta | Arxiv | Deep learning has shown a tremendous growth in hashing techniques for image retrieval. Recently Transformer has emerged as a new architecture by utilizing self-attention without convolution. Transformer is also extended to Vision Transformer (ViT) for the visual recognition with a promising performance on ImageNet. In this paper we propose a Vision Transformer based Hashing (VTS) for image retrieval. We utilize the pre-trained ViT on ImageNet as the backbone network and add the hashing head. The proposed VTS model is fine-tuned for hashing under six different image retrieval frameworks including Deep Supervised Hashing (DSH) HashNet GreedyHash Improved Deep Hashing Network (IDHN) Deep Polarized Network (DPN) and Central Similarity Quantization (CSQ) with their objective functions. We perform extensive experiments on CIFAR10 ImageNet NUS-Wide and COCO datasets. The proposed VTS based image retrieval outperforms the recent state-of-the-art hashing techniques by a large margin. We also find that the proposed VTS backbone is better than existing networks such as AlexNet and ResNet. The code is released at https://github.com/shivram1987/VisionTransformerHashing. |
2021 | Practical Near Neighbor Search Via Group Testing | Joshua Engels, Benjamin Coleman, Anshumali Shrivastava | Neural Information Processing Systems | We present a new algorithm for the approximate near neighbor problem that combines classical ideas from group testing with locality-sensitive hashing (LSH). We reduce the near neighbor search problem to a group testing problem by designating neighbors as positives non-neighbors as negatives and approximate membership queries as group tests. We instantiate this framework using distance-sensitive Bloom Filters to Identify Near-Neighbor Groups (FLINNG). We prove that FLINNG has sub-linear query time and show that our algorithm comes with a variety of practical advantages. For example FLINNG can be constructed in a single pass through the data, consists entirely of efficient integer operations and does not require any distance computations. We conduct large-scale experiments on high-dimensional search tasks such as genome search URL similarity search and embedding search over the massive YFCC100M dataset. In our comparison with leading algorithms such as HNSW and FAISS we find that FLINNG can provide up to a 10x query speedup with substantially smaller indexing time and memory. |
2021 | Facebook SimSearchNet++ | Facebook/Meta | NeurIPS | Facebook SimSearchNet++ is a new dataset released by Facebook for this competition. It consists of features used for image copy detection for integrity purposes. The features are generated by Facebook SimSearchNet++ model. |
2021 | Binary Code Based Hash Embedding For Web-scale Applications | Yan Bencheng, Wang Pengjie, Liu Jinquan, Lin Wei, Lee Kuang-chih, Xu Jian, Zheng Bo | Arxiv | Nowadays deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising. In these applications embedding learning of categorical features is crucial to the success of deep learning models. In these models a standard method is that each categorical feature value is assigned a unique embedding vector which can be learned and optimized. Although this method can well capture the characteristics of the categorical features and promise good performance it can incur a huge memory cost to store the embedding table especially for those web-scale applications. Such a huge memory cost significantly holds back the effectiveness and usability of EDRMs. In this paper we propose a binary code based hash embedding method which allows the size of the embedding table to be reduced in arbitrary scale without compromising too much performance. Experimental evaluation results show that one can still achieve 99% performance even if the embedding table is reduced to be 1000× smaller than the original one with our proposed method. |
2021 | Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval | Miaomiao Cheng, Liping Jing, Michael K. Ng | TOIS | With the quick development of social websites, there are more opportunities to have different media types (such as text, image, video, etc.) describing the same topic from large-scale heterogeneous data sources. To efficiently identify the inter-media correlations for multimedia retrieval, unsupervised cross-modal hashing (UCMH) has gained increased interest due to the significant reduction in computation and storage. However, most UCMH methods assume that the data from different modalities are well paired. As a result, existing UCMH methods may not achieve satisfactory performance when partially paired data are given only. In this article, we propose a new-type of UCMH method called robust unsupervised cross-modal hashing (RUCMH). The major contribution lies in jointly learning modal-specific hash function, exploring the correlations among modalities with partial or even without any pairwise correspondence, and preserving the information of original features as much as possible. The learning process can be modeled via a joint minimization problem, and the corresponding optimization algorithm is presented. A series of experiments is conducted on four real-world datasets (Wiki, MIRFlickr, NUS-WIDE, and MS-COCO). The results demonstrate that RUCMH can significantly outperform the state-of-the-art unsupervised cross-modal hashing methods, especially for the partially paired case, which validates the effectiveness of RUCMH. |
2021 | Long-Tail Hashing | Yong Chen, Yuqing Hou, Shu Leng, Ping Hu, Zhouchen Lin, and Dell Zhang | SIGIR | Hashing, which represents data items as compact binary codes, has been becoming a more and more popular technique, e.g., for large-scale image retrieval, owing to its super fast search speed as well as its extremely economical memory consumption. However, existing hashing methods all try to learn binary codes from artificially balanced datasets which are not commonly available in real-world scenarios. In this paper, we propose Long-Tail Hashing Network (LTHNet), a novel two-stage deep hashing approach that addresses the problem of learning to hash for more realistic datasets where the data labels roughly exhibit a long-tail distribution. Specifically, the first stage is to learn relaxed embeddings of the given dataset with its long-tail characteristic taken into account via an end-to-end deep neural network; the second stage is to binarize those obtained embeddings. A critical part of LTHNet is its extended dynamic meta-embedding module which can adaptively realize visual knowledge transfer between head and tail classes, and thus enrich image representations for hashing. Our experiments have shown that LTHNet achieves dramatic performance improvements over all state-of-the-art competitors on long-tail datasets, with no or little sacrifice on balanced datasets. Further analyses reveal that while to our surprise directly manipulating class weights in the loss function has little effect, the extended dynamic meta-embedding module, the usage of cross-entropy loss instead of square loss, and the relatively small batch-size for training all contribute to LTHNet’s success. |
2021 | DVHN A Deep Hashing Framework For Large-scale Vehicle Re-identification | Chen Yongbiao, Zhang Sheng, Liu Fangxin, Wu Chenggang, Guo Kaicheng, Qi Zhengwei | Arxiv | In this paper we make the very first attempt to investigate the integration of deep hash learning with vehicle re-identification. We propose a deep hash-based vehicle re-identification framework dubbed DVHN which substantially reduces memory usage and promotes retrieval efficiency while reserving nearest neighbor search accuracy. Concretely DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module. Specifically we directly constrain the output from the convolutional neural network to be discrete binary codes and ensure the learned binary codes are optimal for classification. To optimize the deep discrete hashing framework we further propose an alternating minimization method for learning binary similarity-preserved hashing codes. Extensive experiments on two widely-studied vehicle re-identification datasets VehicleID and VeRi have demonstrated the superiority of our method against the state-of-the-art deep hash methods. DVHN of 2048 bits can achieve 13.94% and 10.21% accuracy improvement in terms of mAP and Rank@1 for the VehicleID (800) dataset. For VeRi we achieve 35.45% and 32.72% performance gains for Rank@1 and mAP respectively. |
2021 | Transhash Transformer-based Hamming Hashing For Efficient Image Retrieval | Chen Yongbiao, Zhang Sheng, Liu Fangxin, Chang Zhigang, Ye Mang, Qi Zhengwei | Arxiv | Deep hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now the deep hashing for the image retrieval community has been dominated by convolutional neural network architectures e.g. ResNet. In this paper inspired by the recent advancements of vision transformers we present TransHash a pure transformer-based framework for deep hashing learning. Concretely our framework is composed of two major modules (1) Based on Vision Transformer (ViT) we design a siamese vision transformer backbone for image feature extraction. To learn fine-grained features we innovate a dual-stream feature learning on top of the transformer to learn discriminative global and local features. (2) Besides we adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner. To the best of our knowledge this is the first work to tackle deep hashing learning problems without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely-studied datasets CIFAR-10 NUS-WIDE and ImageNet. The experiments have evidenced our superiority against the existing state-of-the-art deep hashing methods. Specifically we achieve 8.2% 2.6% and 12.7% performance gains in terms of average mAP for different hash bit lengths on three public datasets respectively. |
2021 | State Of The Art Image Hashing | Biswas Rubel, Blanco-medina Pablo | Arxiv | Perceptual image hashing methods are often applied in various objectives such as image retrieval finding duplicate or near-duplicate images and finding similar images from large-scale image content. The main challenge in image hashing techniques is robust feature extraction which generates the same or similar hashes in images that are visually identical. In this article we present a short review of the state-of-the-art traditional perceptual hashing and deep learning-based perceptual hashing methods identifying the best approaches. |
2021 | Halftimehash Modern Hashing Without 64-bit Multipliers Or Finite Fields | Apple Jim | Arxiv | HalftimeHash is a new algorithm for hashing long strings. The goals are few collisions (different inputs that produce identical output hash values) and high performance. Compared to the fastest universal hash functions on long strings (clhash and UMASH) HalftimeHash decreases collision probability while also increasing performance by over 50% exceeding 16 bytes per cycle. In addition HalftimeHash does not use any widening 64-bit multiplications or any finite field arithmetic that could limit its portability. |
2021 | Learning To Hash Robustly Guaranteed | Andoni Alexandr, Beaglehole Daniel | Arxiv | The indexing algorithms for the high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on the randomized Locality Sensitive Hashing (LSH) and its derivatives. In practice many heuristic approaches exist to learn the best indexing method in order to speed-up NNS crucially adapting to the structure of the given dataset. Oftentimes these heuristics outperform the LSH-based algorithms on real datasets but almost always come at the cost of losing the guarantees of either correctness or robust performance on adversarial queries or apply to datasets with an assumed extra structure/model. In this paper we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. On the theoretical side we exhibit a natural setting (dataset model) where our algorithm is much better than the standard theoretical one. On the practical side we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets. |
2021 | From Average Embeddings To Nearest Neighbor Search | Andoni Alexandr, Cheikhi David | Arxiv | In this note we show that one can use average embeddings introduced recently by Naor (arXiv:1905.01280) to obtain efficient algorithms for approximate nearest neighbor search. In particular a metric X embeds into ℓ_2 on average with distortion D if for any distribution μ on X the embedding is D Lipschitz and the (square of) distance does not decrease on average (w.r.t. μ). In particular existence of such an embedding (assuming it is efficient) implies an O(D^3) approximate nearest neighbor search under X. This can be seen as a strengthening of the classic (bi-Lipschitz) embedding approach to nearest neighbor search and is another application of the data-dependent hashing paradigm. |
2021 | Additive Feature Hashing | Andrecut M. | Arxiv | The hashing trick is a machine learning technique used to encode categorical features into a numerical vector representation of pre-defined fixed length. It works by using the categorical hash values as vector indices and updating the vector values at those indices. Here we discuss a different approach based on additive-hashing and the almost orthogonal property of high-dimensional random vectors. That is we show that additive feature hashing can be performed directly by adding the hash values and converting them into high-dimensional numerical vectors. We show that the performance of additive feature hashing is similar to the hashing trick and we illustrate the results numerically using synthetic language recognition and SMS spam detection data. |
2021 | Nearest Neighbor Search With Compact Codes A Decoder Perspective | Amara Kenza, Douze Matthijs, Sablayrolles Alexandre, Jégou Hervé | Arxiv | Modern approaches for fast retrieval of similar vectors on billion-scaled datasets rely on compressed-domain approaches such as binary sketches or product quantization. These methods minimize a certain loss typically the mean squared error or other objective functions tailored to the retrieval problem. In this paper we re-interpret popular methods such as binary hashing or product quantizers as auto-encoders and point out that they implicitly make suboptimal assumptions on the form of the decoder. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes which translates to a better performance in nearest neighbor search. Our method significantly improves over binary hashing methods or product quantization on popular benchmarks. |
2021 | Improved Deep Classwise Hashing With Centers Similarity Learning For Image Retrieval | Zhang Ming, Yan Hong | Arxiv | Deep supervised hashing for image retrieval has attracted researchers' attention due to its high efficiency and superior retrieval performance. Most existing deep supervised hashing works which are based on pairwise/triplet labels suffer from the expensive computational cost and insufficient utilization of the semantics information. Recently deep classwise hashing introduced a classwise loss supervised by class label information as an alternative; however we find it still has drawbacks. In this paper we propose an improved deep classwise hashing which enables hashing learning and class centers learning simultaneously. Specifically we design a two-step strategy on center similarity learning. It interacts with the classwise loss to attract the class center to concentrate on the intra-class samples while pushing other class centers as far as possible. The center similarity learning contributes to generating more compact and discriminative hashing codes. We conduct experiments on three benchmark datasets. It shows that the proposed method effectively surpasses the original method and outperforms state-of-the-art baselines under various commonly-used evaluation metrics for image retrieval. |
2021 | Yandex DEEP-1B | Yandex | NeurIPS | Yandex DEEP-1B image descriptor dataset consisting of the projected and normalized outputs from the last fully-connected layer of the GoogLeNet model, which was pretrained on the Imagenet classification task. |
2021 | Partial 3D Object Retrieval Using Local Binary QUICCI Descriptors And Dissimilarity Tree Indexing | Van Blokland Bart Iver, Theoharis Theoharis | Arxiv | A complete pipeline is presented for accurate and efficient partial 3D object retrieval based on Quick Intersection Count Change Image (QUICCI) binary local descriptors and a novel indexing tree. It is shown how a modification to the QUICCI query descriptor makes it ideal for partial retrieval. An indexing structure called Dissimilarity Tree is proposed which can significantly accelerate searching the large space of local descriptors; this is applicable to QUICCI and other binary descriptors. The index exploits the distribution of bits within descriptors for efficient retrieval. The retrieval pipeline is tested on the artificial part of SHREC16 dataset with near-ideal retrieval results. |
2021 | When Similarity Digest Meets Vector Management System A Survey On Similarity Hash Function | Tang Zhushou, Tang Lingyi, Tang Keying, Tang Ruoying | Arxiv | Booming vector management systems call for feasible similarity hash functions as a front-end to perform similarity analysis. In this paper we make a systematic survey of the existing well-known similarity hash functions to identify satisfactory ones. We conclude that the similarity hash functions MinHash and Nilsimsa can be directly marshaled into the pipeline of similarity analysis using a vector management system. After that we make a brief and empirical discussion on the performance drawbacks of these functions and highlight that MinHash the variant of SimHash and feature hashing are the best for vector management systems for large-scale similarity analysis. |
2021 | Fake-image Detection With Robust Hashing | Tanaka Miki, Kiya Hitoshi | Arxiv | In this paper we investigate for the first time whether robust hashing can robustly detect fake images even when multiple manipulation techniques such as JPEG compression are applied to images. In an experiment the proposed fake detection with robust hashing is demonstrated to outperform a state-of-the-art method under the use of various datasets including fake images generated with GANs. |
2021 | BCD A Cross-architecture Binary Comparison Database Experiment Using Locality Sensitive Hashing Algorithms | Tan Haoxi | Arxiv | Given a binary executable without source code it is difficult to determine what each function in the binary does by reverse engineering it and even harder without prior experience and context. In this paper we perform a comparison of different hash functions' effectiveness at detecting similar lifted snippets of LLVM IR code and present the design and implementation of a framework for a cross-architecture binary code similarity search database using MinHash as the chosen hashing algorithm over SimHash SSDEEP and TLSH. The motivation is to help reverse engineers to quickly gain context of functions in an unknown binary by comparing it against a database of known functions. The code for this project is open source and can be found at https://github.com/h4sh5/bcddb |
2021 | Learning To Break Deep Perceptual Hashing The Use Case Neuralhash | Struppek Lukas, Hintersdorf Dominik, Neider Daniel, Kersting Kristian | Arxiv | Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes in images either induced by gradient-based approaches or simply by performing standard image transformations forcing or preventing hash collisions. Such attacks permit malicious actors to easily exploit the detection system: from hiding abusive material to framing innocent users everything is possible. Moreover using the hash values inferences can still be made about the data stored on user devices. In our view based on our results deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective. |
2021 | Hard Example Guided Hashing For Image Retrieval | Su Hai, Han Meiyin, Liang Junle, Liang Jun, Yu Songsen | Arxiv | Compared with the traditional hashing methods deep hashing methods generate hash codes with rich semantic information and greatly improve the performances in the image retrieval field. However current deep hashing methods remain unsatisfactory at predicting the similarity of hard examples. There are two main factors affecting the ability to learn hard examples: weak key-feature extraction and the shortage of hard examples. In this paper we give a novel end-to-end model to extract the key features from hard examples and obtain hash codes with accurate semantic information. In addition we redesign a hard pair-wise loss function to assess the hard degree and update penalty weights of examples. It effectively alleviates the shortage of hard examples. Experimental results on CIFAR-10 and NUS-WIDE demonstrate that our model outperforms the mainstream hashing-based image retrieval methods. |
2021 | DeSkew-LSH based Code-to-Code Recommendation Engine | Fran Silavong, Sean Moran, Antonios Georgiadis, Rohan Saphal, Robert Otter | MSR | Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code snippets that usefully extend the code currently being written by a developer in their development environment (IDE). Code-to-code recommendation engines hold the promise of increasing developer productivity by reducing context switching from the IDE and increasing code-reuse. Existing code-to-code recommendation engines do not scale gracefully to large codebases, exhibiting a linear growth in query time as the code repository increases in size. In addition, existing code-to-code recommendation engines fail to account for the global statistics of code repositories in the ranking function, such as the distribution of code snippet lengths, leading to sub-optimal retrieval results. We address both of these weaknesses with Senatus, a new code-to-code recommendation engine. At the core of Senatus is De-Skew LSH, a new locality sensitive hashing (LSH) algorithm that indexes the data for fast (sub-linear time) retrieval while also counteracting the skewness in the snippet length distribution using novel abstract syntax tree-based feature scoring and selection algorithms. We evaluate Senatus via automatic evaluation and with an expert developer user study and find the recommendations to be of higher quality than competing baselines, while achieving faster search. For example, on the CodeSearchNet dataset we show that Senatus improves performance by 6.7% F1 and is 16x faster in query time compared to Facebook Aroma on the task of code-to-code recommendation. |
2021 | Re-ranking For Image Retrieval And Transductive Few-shot Classification | Xi Shen, Yang Xiao, Shell Hu, Othman Sbai, Mathieu Aubry | Neural Information Processing Systems | In the problems of image retrieval and few-shot classification the mainstream approaches focus on learning a better feature representation. However directly tackling the distance or similarity measure between images could also be efficient. To this end we revisit the idea of re-ranking the top-k retrieved images in the context of image retrieval (e.g. the k-reciprocal nearest neighbors) and generalize this idea to transductive few-shot learning. We propose to meta-learn the re-ranking updates such that the similarity graph converges towards the target similarity graph induced by the image labels. Specifically the re-ranking module takes as input an initial similarity graph between the query image and the contextual images using a pre-trained feature extractor and predicts an improved similarity graph by leveraging the structure among the involved images. We show that our re-ranking approach can be applied to unseen images and can further boost existing approaches for both image retrieval and few-shot learning problems. Our approach operates either independently or in conjunction with classical re-ranking approaches yielding clear and consistent improvements on image retrieval (CUB Cars SOP rOxford5K and rParis6K) and transductive few-shot classification (Mini-ImageNet tiered-ImageNet and CIFAR-FS) benchmarks. Our code is available at https://imagine.enpc.fr/~shenx/SSR/. |
2021 | Yandex Text-to-Image-1B | Yandex | NeurIPS | Yandex Text-to-Image-1B is a new cross-modal dataset (text and visual), where database and query vectors have different distributions in a shared representation space. The base set consists of image embeddings produced by the Se-ResNext-101 model, and queries are textual embeddings produced by a variant of the DSSM model. Since the distributions are different, a 50M sample of the query distribution is provided. |
2021 | Unsupervised Hashing With Contrastive Information Bottleneck | Qiu Zexuan, Su Qinliang, Ou Zijing, Yu Jianxing, Chen Changyou | Arxiv | Many unsupervised hashing methods are implicitly established on the idea of reconstructing the input data which basically encourages the hashing codes to retain as much information of original data as possible. However this requirement may force the models to spend much of their effort on reconstructing useless background information while failing to preserve the discriminative semantic information that is more important for the hashing task. To tackle this problem inspired by the recent success of contrastive learning in learning continuous representations we propose to adapt this framework to learn binary hashing codes. Specifically we first propose to modify the objective function to meet the specific requirement of hashing and then introduce a probabilistic binary representation layer into the model to facilitate end-to-end training of the entire model. We further prove the strong connection between the proposed contrastive-learning-based hashing method and the mutual information and show that the proposed model can be considered under the broader framework of the information bottleneck (IB). Under this perspective a more general hashing model is naturally obtained. Extensive experimental results on three benchmark image datasets demonstrate that the proposed hashing method significantly outperforms existing baselines. |
2021 | Contextual Similarity Aggregation With Self-attention For Visual Re-ranking | Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li | Neural Information Processing Systems | In content-based image retrieval the first-round retrieval result by simple visual feature comparison may be unsatisfactory which can be refined by visual re-ranking techniques. In image retrieval it is observed that the contextual similarity among the top-ranked images is an important clue to distinguish the semantic relevance. Inspired by this observation in this paper we propose a visual re-ranking method by contextual similarity aggregation with self-attention. In our approach for each image in the top-K ranking list we represent it into an affinity feature vector by comparing it with a set of anchor images. Then the affinity features of the top-K images are refined by aggregating the contextual information with a transformer encoder. Finally the affinity features are used to recalculate the similarity scores between the query and the top-K images for re-ranking of the latter. To further improve the robustness of our re-ranking model and enhance the performance of our method a new data augmentation scheme is designed. Since our re-ranking model is not directly involved with the visual feature used in the initial retrieval it is ready to be applied to retrieval result lists obtained from various retrieval algorithms. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method. |
2021 | Refining BERT Embeddings For Document Hashing Via Mutual Information Maximization | Ou Zijing, Su Qinliang, Yu Jianxing, Zhao Ruihui, Zheng Yefeng, Liu Bang | Arxiv | Existing unsupervised document hashing methods are mostly established on generative models. Due to the difficulties of capturing long dependency structures these methods rarely model the raw documents directly but instead model the features extracted from them (e.g. bag-of-words (BOW) TFIDF). In this paper we propose to learn hash codes from BERT embeddings after observing their tremendous successes on downstream tasks. As a first try we modify existing generative hashing models to accommodate the BERT embeddings. However little improvement is observed over the codes learned from the old BOW or TFIDF features. We attribute this to the reconstruction requirement in generative hashing which forces irrelevant information that is abundant in the BERT embeddings to also be compressed into the codes. To remedy this issue a new unsupervised hashing paradigm is further proposed based on the mutual information (MI) maximization principle. Specifically the method first constructs appropriate global and local codes from the documents and then seeks to maximize their mutual information. Experimental results on three benchmark datasets demonstrate that the proposed method is able to generate hash codes that outperform existing ones learned from BOW features by a substantial margin. |
2021 | PHPQ Pyramid Hybrid Pooling Quantization For Efficient Fine-grained Image Retrieval | Zeng Ziyun, Wang Jinpeng, Chen Bin, Dai Tao, Xia Shu-tao, Wang Zhi | Pattern Recognition Letters | Deep hashing approaches including deep quantization and deep binary hashing have become a common solution to large-scale image retrieval due to their high computation and storage efficiency. Most existing hashing methods cannot produce satisfactory results for fine-grained retrieval because they usually adopt the outputs of the last CNN layer to generate binary codes. Since deeper layers tend to summarize visual clues e.g. texture into abstract semantics e.g. dogs and cats the feature produced by the last CNN layer is less effective in capturing subtle but discriminative visual details that mostly exist in shallow layers. To improve fine-grained image hashing we propose Pyramid Hybrid Pooling Quantization (PHPQ). Specifically we propose a Pyramid Hybrid Pooling (PHP) module to capture and preserve fine-grained semantic information from multi-level features which emphasizes the subtle discrimination of different sub-categories. Besides we propose a learnable quantization module with a partial codebook attention mechanism which helps to optimize the most relevant codewords and improves the quantization. Comprehensive experiments on two widely-used public benchmarks i.e. CUB-200-2011 and Stanford Dogs demonstrate that PHPQ outperforms state-of-the-art methods. |
2021 | Online Hashing With Similarity Learning | Weng Zhenyu, Zhu Yuesheng | Arxiv | Online hashing methods usually learn the hash functions online aiming to efficiently adapt to the data variations in the streaming environment. However when the hash functions are updated the binary codes for the whole database have to be updated to be consistent with the hash functions resulting in inefficiency in the online image retrieval process. In this paper we propose a novel online hashing framework without updating binary codes. In the proposed framework the hash functions are fixed and a parametric similarity function for the binary codes is learnt online to adapt to the streaming data. Specifically a parametric similarity function that has a bilinear form is adopted and a metric learning algorithm is proposed to learn the similarity function online based on the characteristics of the hashing methods. The experiments on two multi-label image datasets show that our method is competitive with or outperforms the state-of-the-art online hashing methods in terms of both accuracy and efficiency for multi-label image retrieval. |
2021 | A^2-net Learning Attribute-aware Hash Codes For Large-scale Fine-grained Image Retrieval | Xiu-shen Wei, Yang Shen, Xuhao Sun, Han-jia Ye, Jian Yang | Neural Information Processing Systems | Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images depicting the concept of interests (i.e. the same sub-category labels) highest based on the fine-grained details in the query. It is desirable to alleviate the challenges of both fine-grained nature of small inter-class variations with large intra-class variations and explosive growth of fine-grained data for such a practical task. In this paper we propose an Attribute-Aware hashing Network (A^2-Net) for generating attribute-aware hash codes to not only make the retrieval process efficient but also establish explicit correspondences between hash codes and visual attributes. Specifically based on the captured visual representations by attention we develop an encoder-decoder structure network of a reconstruction task to unsupervisedly distill high-level attribute-specific vectors from the appearance-specific visual representations without attribute annotations. A^2-Net is also equipped with a feature decorrelation constraint upon these attribute vectors to enhance their representation abilities. Finally the required hash codes are generated by the attribute vectors driven by preserving original similarities. Qualitative experiments on five benchmark fine-grained datasets show our superiority over competing methods. More importantly quantitative results demonstrate the obtained hash codes can strongly correspond to certain kinds of crucial properties of fine-grained objects. |
2021 | Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing | Jun Yu, Hao Zhou, Yibing Zhan, Dacheng Tao | AAAI | Unsupervised cross-modal hashing (UCMH) has become a hot topic recently. Current UCMH focuses on exploring data similarities. However, current UCMH methods calculate the similarity between two data points, mainly relying on their cross-modal features. These methods suffer from inaccurate similarity problems that result in a suboptimal retrieval Hamming space, because the cross-modal features between the data are not sufficient to describe the complex data relationships, such as situations where two data points have different feature representations but share the same inherent concepts. In this paper, we devise a deep graph-neighbor coherence preserving network (DGCPN). Specifically, DGCPN stems from graph models and explores graph-neighbor coherence by consolidating the information between data and their neighbors. DGCPN regulates comprehensive similarity preserving losses by exploiting three types of data similarities (i.e., the graph-neighbor coherence, the coexistent similarity, and the intra- and inter-modality consistency) and designs a half-real and half-binary optimization strategy to reduce the quantization errors during hashing. Essentially, DGCPN addresses the inaccurate similarity problem by exploring and exploiting the data's intrinsic relationships in a graph. We conduct extensive experiments on three public UCMH datasets. The experimental results demonstrate the superiority of DGCPN, e.g., by improving the mean average precision from 0.722 to 0.751 on MIRFlickr-25K using 64-bit hashing codes to retrieval texts from images. We will release the source code package and the trained model on https://github.com/Atmegal/DGCPN. |
2021 | Instance-weighted Central Similarity For Multi-label Image Retrieval | Zhang Zhiwei, Peng Hanyu | Arxiv | Deep hashing has been widely applied to large-scale image retrieval by encoding high-dimensional data points into binary codes for efficient retrieval. Compared with pairwise/triplet similarity based hash learning central similarity based hashing can more efficiently capture the global data distribution. For multi-label image retrieval however previous methods only use multiple hash centers with equal weights to generate one centroid as the learning target which ignores the relationship between the weights of hash centers and the proportion of instance regions in the image. To address the above issue we propose a two-step alternative optimization approach Instance-weighted Central Similarity (ICS) to automatically learn the center weight corresponding to a hash code. Firstly we apply the maximum entropy regularizer to prevent one hash center from dominating the loss function and compute the center weights via projection gradient descent. Secondly we update neural network parameters by standard back-propagation with fixed center weights. More importantly the learned center weights can well reflect the proportion of foreground instances in the image. Our method achieves the state-of-the-art performance on the image retrieval benchmarks and especially improves the mAP by 1.6%-6.4% on the MS COCO dataset. |
2021 | Prototype-supervised Adversarial Network For Targeted Attack Of Deep Hashing | Wang Xunguang, Zhang Zheng, Wu Baoyuan, Shen Fumin, Lu Guangming | Arxiv | Due to its powerful capability of representation learning and high-efficiency computation deep hashing has made significant progress in large-scale image retrieval. However deep hashing networks are vulnerable to adversarial examples which is a practical security problem that is seldom studied in the hashing-based retrieval field. In this paper we propose a novel prototype-supervised adversarial network (ProS-GAN) which formulates a flexible generative architecture for efficient and effective targeted hashing attack. To the best of our knowledge this is the first generation-based method to attack deep hashing networks. Generally our proposed framework consists of three parts i.e. a PrototypeNet a generator and a discriminator. Specifically the designed PrototypeNet embeds the target label into the semantic representation and learns the prototype code as the category-level representative of the target label. Moreover the semantic representation and the original image are jointly fed into the generator for a flexible targeted attack. Particularly the prototype code is adopted to supervise the generator to construct the targeted adversarial example by minimizing the Hamming distance between the hash code of the adversarial example and the prototype code. Furthermore the generator is trained against the discriminator to simultaneously encourage the adversarial examples to be visually realistic and the semantic representation to be informative. Extensive experiments verify that the proposed framework can efficiently produce adversarial examples with better targeted attack performance and transferability over state-of-the-art targeted attack methods of deep hashing. The related code is available at https://github.com/xunguangwang/ProS-GAN. |
2021 | Cross-modal Zero-shot Hashing By Label Attributes Embedding | Wang Runmin, Yu Guoxian, Liu Lei, Cui Lizhen, Domeniconi Carlotta, Zhang Xiangliang | Arxiv | Cross-modal hashing (CMH) is one of the most promising methods in cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume the labels of the training and testing sets are identical. However the assumption is often violated causing a zero-shot CMH problem. Recent efforts to address this issue focus on transferring knowledge from the seen classes to the unseen ones using label attributes. However the attributes are isolated from the features of multi-modal data. To reduce the information gap we introduce an approach called LAEH (Label Attributes Embedding for zero-shot cross-modal Hashing). LAEH first obtains the initial semantic attribute vectors of labels with a word2vec model and then uses a transformation network to transform them into a common subspace. Next it leverages the hash vectors and the feature similarity matrix to guide the feature extraction network of different modalities. At the same time LAEH uses the attribute similarity as a supplement to label similarity to rectify the label embedding and common subspace. Experiments show that LAEH outperforms related representative zero-shot and cross-modal hashing methods. |
2021 | Contrastive Quantization With Code Memory For Unsupervised Image Retrieval | Wang Jinpeng, Zeng Ziyun, Chen Bin, Dai Tao, Xia Shu-tao | Arxiv | The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems. To alleviate the reliance on expensive annotations unsupervised deep hashing becomes an important research problem. This paper provides a novel solution to unsupervised deep quantization namely Contrastive Quantization with Code Memory (MeCoQ). Different from existing reconstruction-based strategies we learn unsupervised binary descriptors by contrastive learning which can better capture discriminative visual semantics. Besides we uncover that codeword diversity regularization is critical to prevent contrastive learning-based quantization from model degeneration. Moreover we introduce a novel quantization code memory module that boosts contrastive learning with lower feature drift than conventional feature memories. Extensive experiments on benchmark datasets show that MeCoQ outperforms state-of-the-art methods. Code and configurations are publicly available at https://github.com/gimpong/AAAI22-MeCoQ. |
2021 | Mathematical Models For Local Sensing Hashes | Wang Li, Wangner Lilon | Arxiv | As data volumes continue to grow searches in data are becoming increasingly time-consuming. Classical index structures for neighbor search are no longer sustainable due to the curse of dimensionality. Instead approximated index structures offer a good opportunity to significantly accelerate the neighbor search for clustering and outlier detection and to have the lowest possible error rate in the results of the algorithms. Local sensing hashes are one such structure. We indicate directions for mathematically modeling their properties. |
2020 | A Novel Incremental Cross-modal Hashing Approach | Mandal Devraj, Biswas Soma | Arxiv | Cross-modal retrieval deals with retrieving relevant items from one modality when provided with a search query from another modality. Hashing techniques where the data is represented as binary bits have specifically gained importance due to the ease of storage fast computations and high accuracy. In real world the number of data categories is continuously increasing which requires algorithms capable of handling this dynamic scenario. In this work we propose a novel incremental cross-modal hashing algorithm termed iCMH which can adapt itself to handle incoming data of new categories. The proposed approach consists of two sequential stages namely learning the hash codes and training the hash functions. At every stage a small amount of old category data termed exemplars is used so as not to forget the old data while trying to learn for the new incoming data i.e. to avoid catastrophic forgetting. In the first stage the hash codes for the exemplars are used and simultaneously hash codes for the new data are computed such that they maintain the semantic relations with the existing data. For the second stage we propose both non-deep and deep architectures to learn the hash functions effectively. Extensive experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms show the usefulness of our approach. |
2020 | A Survey On Deep Hashing Methods | Luo Xiao, Wang Haixin, Wu Daqing, Chen Chong, Deng Minghua, Huang Jianqiang, Hua Xian-sheng | Arxiv | Nearest neighbor search aims to obtain the samples in the database with the smallest distances to the queries which is a basic task in a range of fields including computer vision and data mining. Hashing is one of the most widely used methods for its computational and storage efficiency. With the development of deep learning deep hashing methods show more advantages than traditional methods. In this survey we investigate current deep hashing algorithms in detail including deep supervised hashing and deep unsupervised hashing. Specifically we categorize deep supervised hashing methods into pairwise methods ranking-based methods pointwise methods as well as quantization according to how the similarities of the learned hash codes are measured. Moreover deep unsupervised hashing is categorized into similarity reconstruction-based methods pseudo-label-based methods and prediction-free self-supervised learning-based methods based on their semantic learning manners. We also introduce three related important topics including semi-supervised deep hashing domain adaptation deep hashing and multi-modal deep hashing. Meanwhile we present some commonly used public datasets and the scheme to measure the performance of deep hashing algorithms. Finally we discuss some potential research directions in conclusion. |
2020 | Asymmetric Correlation Quantization Hashing For Cross-modal Retrieval | Wang Lu, Yang Jie | Arxiv | Due to the superiority in similarity computation and database storage for large-scale multi-modal data cross-modal hashing methods have attracted extensive attention in similarity retrieval across the heterogeneous modalities. However there are still some limitations to be further taken into account (1) most current CMH methods transform real-valued data points into discrete compact binary codes under the binary constraints limiting the capability of representation for original data on account of abundant loss of information and producing suboptimal hash codes; (2) the discrete binary constraint learning model is hard to solve where the retrieval performance may greatly reduce by relaxing the binary constraints for large quantization error; (3) handling the learning problem of CMH in a symmetric framework leading to difficult and complex optimization objective. To address the above challenges in this paper a novel Asymmetric Correlation Quantization Hashing (ACQH) method is proposed. Specifically ACQH learns the projection matrices of heterogeneous modalities for transforming a query into a low-dimensional real-valued vector in latent semantic space and constructs the stacked compositional quantization embedding in a coarse-to-fine manner for indicating database points by a series of learnt real-valued codewords in the codebook with the help of pointwise label information regression simultaneously. Besides the unified hash codes across modalities can be directly obtained by the discrete iterative optimization framework devised in the paper. Comprehensive experiments on three diverse benchmark datasets have shown the effectiveness and rationality of ACQH. |
2020 | Deep Cross-modal Hashing Via Margin-dynamic-softmax Loss | Tu Rong-cheng, Mao Xian-ling, Tu Rongxin, Bian Binbin, Wei Wei, Huang Heyan | Arxiv | Due to their high retrieval efficiency and low storage cost for cross-modal search task cross-modal hashing methods have attracted considerable attention. For supervised cross-modal hashing methods how to make the learned hash codes sufficiently preserve the semantic information contained in the labels of datapoints is the key to further enhancing the retrieval performance. Hence almost all supervised cross-modal hashing methods usually depend on defining a similarity between datapoints with the label information to guide the hashing model learning fully or partly. However the defined similarity between datapoints can only capture the label information of datapoints partially and misses abundant semantic information which then hinders the further improvement of retrieval performance. Thus in this paper different from previous works we propose a novel cross-modal hashing method without defining the similarity between datapoints called Deep Cross-modal Hashing via Margin-dynamic-softmax Loss (DCHML). Specifically DCHML first trains a proxy hashing network to transform the information of each category of a dataset into a semantic discriminative hash code called a proxy hash code. Each proxy hash code can preserve the semantic information of its corresponding category well. Next without defining the similarity between datapoints to supervise the training process of the modality-specific hashing networks we propose a novel margin-dynamic-softmax loss to directly utilize the proxy hashing codes as supervised information. Finally by minimizing the novel margin-dynamic-softmax loss the modality-specific hashing networks can be trained to generate hash codes which can simultaneously preserve the cross-modal similarity and abundant semantic information well. |
2020 | Deep Hashing With Hash-consistent Large Margin Proxy Embeddings | Morgado Pedro, Li Yunsheng, Pereira Jose Costa, Saberian Mohammad, Vasconcelos Nuno | Arxiv | Image hash codes are produced by binarizing the embeddings of convolutional neural networks (CNN) trained for either classification or retrieval. While proxy embeddings achieve good performance on both tasks they are non-trivial to binarize due to a rotational ambiguity that encourages non-binary embeddings. The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced. The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units thus guaranteeing a small binarization error while producing highly discriminative hash-codes. A semantic extension (sHCLM) aimed to improve hashing performance in a transfer scenario is also proposed. Extensive experiments show that sHCLM embeddings achieve significant improvements over state-of-the-art hashing procedures on several small and large datasets both within and beyond the set of training classes. |
2020 | Label Self-Adaption Hashing for Image Retrieval | Jianglin Lu, Zhihui Lai, Hailing Wang, Jie Zhou | ICPR | Hashing has attracted widespread attention in image retrieval because of its fast retrieval speed and low storage cost. Compared with supervised methods, unsupervised hashing methods are more reasonable and suitable for large-scale image retrieval since it is always difficult and expensive to collect true labels of the massive data. Without label information, however, unsupervised hashing methods cannot guarantee the quality of learned binary codes. To resolve this dilemma, this paper proposes a novel unsupervised hashing method called Label Self-Adaption Hashing (LSAH), which contains an effective hashing function learning part and a self-adaption label generation part. In the first part, we utilize anchor graph to keep the local structure of the data and introduce joint sparsity into the model to extract effective features for high-quality binary code learning. In the second part, a self-adaptive cluster label matrix is learned from the data under the assumption that the nearest neighbor points should have a large probability to be in the same cluster. Therefore, the proposed LSAH can make full use of the potential discriminative information of the data to guide the learning of binary code. It is worth noting that LSAH can learn effective binary codes, hashing function and cluster labels simultaneously in a unified optimization framework. To solve the resulting optimization problem, an Augmented Lagrange Multiplier based iterative algorithm is elaborately designed. Extensive experiments on three large-scale data sets indicate the promising performance of the proposed LSAH. |
2020 | Deep Multi-view Enhancement Hashing For Image Retrieval | Yan Chenggang, Gong Biao, Wei Yuxuan, Gao Yue | Arxiv | Hashing is an efficient method for nearest neighbor search in large-scale data space by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. However large-scale high-speed retrieval through binary codes incurs a certain reduction in retrieval accuracy compared to traditional retrieval methods. We have noticed that multi-view methods can well preserve the diverse characteristics of data. Therefore we try to introduce the multi-view deep neural network into the hash learning field and design an efficient and innovative retrieval model which has achieved a significant improvement in retrieval performance. In this paper we propose a supervised multi-view hash model which can enhance the multi-view information through neural networks. This is a completely new hash learning method that combines multi-view and deep learning methods. The proposed method utilizes an effective view stability evaluation method to actively explore the relationship among views which will affect the optimization direction of the entire network. We have also designed a variety of multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multi-view. In order to avoid spending excessive computing resources on the enhancement procedure during retrieval we set up a separate structure called the memory network which participates in training jointly. The proposed method is systematically evaluated on the CIFAR-10 NUS-WIDE and MS-COCO datasets and the results show that our method significantly outperforms the state-of-the-art single-view and multi-view hashing methods. |
2020 | Weakly-supervised Online Hashing | Zhan Yu-wei, Luo Xin, Sun Yu, Wang Yongxin, Chen Zhen-duo, Xu Xin-shun | Arxiv | With the rapid development of social websites recent years have witnessed an explosive growth of social images with user-provided tags which continuously arrive in a streaming fashion. Due to the fast query speed and low storage cost hashing-based methods for image search have attracted increasing attention. However existing hashing methods for social image retrieval are based on batch mode which violates the nature of social images i.e. social images are usually generated periodically or collected in a stream fashion. Although there exist many online image hashing methods they either adopt unsupervised learning which ignores the relevant tags or are designed in a supervised manner which needs high-quality labels. In this paper to overcome the above limitations we propose a new method named Weakly-supervised Online Hashing (WOH). In order to learn high-quality hash codes WOH exploits the weak supervision by considering the semantics of tags and removing the noise. Besides we develop a discrete online optimization algorithm for WOH which is efficient and scalable. Extensive experiments conducted on two real-world datasets demonstrate the superiority of WOH compared with several state-of-the-art hashing baselines. |
2020 | Dual-level Semantic Transfer Deep Hashing For Efficient Social Image Retrieval | Zhu Lei, Cui Hui, Cheng Zhiyong, Li Jingjing, Zhang Zheng | Arxiv | Social networks store and disseminate a tremendous amount of user-shared images. Deep hashing is an efficient indexing technique to support large-scale social image retrieval due to its deep representation capability fast retrieval speed and low storage cost. Particularly unsupervised deep hashing has good scalability as it does not require any manually labelled data for training. However owing to the lack of label guidance existing methods suffer from severe semantic shortage when optimizing a large amount of deep neural network parameters. Differently in this paper we propose a Dual-level Semantic Transfer Deep Hashing (DSTDH) method to alleviate this problem with a unified deep hash learning framework. Our model targets at learning the semantically enhanced deep hash codes by specially exploiting the user-generated tags associated with the social images. Specifically we design a complementary dual-level semantic transfer mechanism to efficiently discover the potential semantics of tags and seamlessly transfer them into binary hash codes. On the one hand instance-level semantics are directly preserved into hash codes from the associated tags with adverse noise removing. Besides an image-concept hypergraph is constructed for indirectly transferring the latent high-order semantic correlations of images and tags into hash codes. Moreover the hash codes are obtained simultaneously with the deep representation learning by the discrete hash optimization strategy. Extensive experiments on two public social image retrieval datasets validate the superior performance of our method compared with state-of-the-art hashing methods. The source codes of our method can be obtained at https://github.com/research2020-1/DSTDH. |
2020 | CIMON Towards High-quality Hash Codes | Luo Xiao, Wu Daqing, Ma Zeyu, Chen Chong, Deng Minghua, Ma Jinwen, Jin Zhongming, Huang Jianqiang, Hua Xian-sheng | Arxiv | Recently hashing has been widely used in approximate nearest neighbor search for its storage and computational efficiency. Most of the unsupervised hashing methods learn to map images into semantic similarity-preserving hash codes by constructing local semantic similarity structure from the pre-trained model as the guiding information i.e. treating each point pair as similar if their distance is small in feature space. However due to the inefficient representation ability of the pre-trained model many false positives and negatives in local semantic similarity will be introduced and lead to error propagation during the hash code learning. Moreover few of the methods consider the robustness of models which will cause instability of hash codes to disturbance. In this paper we propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods in both retrieval performance and robustness. |
2020 | Model Optimization Boosting Framework for Linear Model Hash Learning | Xingbo Liu, Xiushan Nie, Quan Zhou, Liqiang Nie, Yilong Yin | TIP | Efficient hashing techniques have attracted extensive research interests in both storage and retrieval of high dimensional data, such as images and videos. In existing hashing methods, a linear model is commonly utilized owing to its efficiency. To obtain better accuracy, linear-based hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider the inherent characteristics and neighborhood information of samples. Differing from existing hashing methods, in this study, we propose a self-improvement framework called Model Boost (MoBoost) to improve model parameter optimization for linear-based hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, for a linear-based hashing method, we first repeatedly execute the hashing method to obtain several hash codes to training samples. Then, utilizing two novel fusion strategies, these codes are fused into a single set. We also propose two new criteria to evaluate the goodness of hash bits during the fusion process. Based on the fused set of hash codes, we learn new parameters for the linear hash function that can significantly improve the accuracy. In general, the proposed MoBoost can be adopted by existing linear-based hashing methods, achieving more precise and stable performance compared to the original methods, and adopting the proposed MoBoost will incur negligible time and space costs. To evaluate the proposed MoBoost, we performed extensive experiments on four benchmark datasets, and the results demonstrate superior performance. |
2020 | Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval | Song Liu, Shengsheng Qian, Yang Guan, Jiawei Zhan, Long Ying | SIGIR | Hashing-based cross-modal search which aims to map multiple modality features into binary codes has attracted increasing attention due to its storage and search efficiency especially in large-scale database retrieval. Recent unsupervised deep cross-modal hashing methods have shown promising results. However, existing approaches typically suffer from two limitations: (1) They usually learn cross-modal similarity information separately or in a redundant fusion manner, which may fail to capture semantic correlations among instances from different modalities sufficiently and effectively. (2) They seldom consider the sampling and weighting schemes for unsupervised cross-modal hashing, resulting in the lack of satisfactory discriminative ability in hash codes. To overcome these limitations, we propose a novel unsupervised deep cross-modal hashing method called Joint-modal Distribution-based Similarity Hashing (JDSH) for large-scale cross-modal retrieval. Firstly, we propose a novel cross-modal joint-training method by constructing a joint-modal similarity matrix to fully preserve the cross-modal semantic correlations among instances. Secondly, we propose a sampling and weighting scheme termed the Distribution-based Similarity Decision and Weighting (DSDW) method for unsupervised cross-modal hashing, which is able to generate more discriminative hash codes by pushing semantic similar instance pairs closer and pulling semantic dissimilar instance pairs apart. The experimental results demonstrate the superiority of JDSH compared with several unsupervised cross-modal hashing methods on two public datasets NUS-WIDE and MIRFlickr. |
2020 | Reinforcing Short-length Hashing | Liu Xingbo, Nie Xiushan, Dai Qi, Huang Yupan, Yin Yilong | Arxiv | Due to the compelling efficiency in retrieval and storage similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval. However existing methods have poor performance in retrieval using an extremely short-length hash code due to the weak classification ability and poor distribution of hash bits. To address this issue in this study we propose a novel reinforcing short-length hashing (RSLH) method. In the proposed RSLH mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information. Furthermore to enhance the accuracy of hash representation a pairwise similarity matrix is designed to strike a balance between accuracy and memory expenditure during training. In addition a parameter boosting strategy is integrated to reinforce the precision via hash bit fusion. Extensive experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios. |
2020 | Self-supervised Asymmetric Deep Hashing With Margin-scalable Constraint | Yu Zhengyang, Wu Song, Dou Zhihao, Bakker Erwin M. | Arxiv | Due to their effectiveness and efficiency deep hashing approaches are widely used for large-scale visual search. However it is still challenging to produce compact and discriminative hash codes for images associated with multiple semantics for two main reasons 1) similarity constraints designed in most of the existing methods are based upon an oversimplified similarity assignment (i.e. 0 for instance pairs sharing no label and 1 for instance pairs sharing at least 1 label) 2) the exploration of multi-semantic relevance is insufficient or even neglected in many of the existing methods. These problems significantly limit the discrimination of generated hash codes. In this paper we propose a novel self-supervised asymmetric deep hashing method with a margin-scalable constraint (SADH) to cope with these problems. SADH implements a self-supervised network to sufficiently preserve semantic information in a semantic feature dictionary and a semantic code dictionary for the semantics of the given dataset which efficiently and precisely guides a feature learning network to preserve multilabel semantic information using an asymmetric learning strategy. By further exploiting semantic dictionaries a new margin-scalable constraint is employed for both precise similarity searching and robust hash code generation. Extensive empirical research on four popular benchmarks validates the proposed method and shows it outperforms several state-of-the-art approaches. |
2020 | Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval | Zhan Yang, Jun Long, Lei Zhu, Wenti Huang | SIGIR | Hashing techniques have recently been successfully applied to solve similarity search problems in the information retrieval field because of their significantly reduced storage and high-speed search capabilities. However, the hash codes learned from most recent cross-modal hashing methods lack the ability to comprehensively preserve adequate information, resulting in a less than desirable performance. To solve this limitation, we propose a novel method termed Nonlinear Robust Discrete Hashing (NRDH), for cross-modal retrieval. The main idea behind NRDH is motivated by the success of neural networks, i.e., nonlinear descriptors, in the field of representation learning, and the use of nonlinear descriptors instead of simple linear transformations is more in line with the complex relationships that exist between common latent representation and heterogeneous multimedia data in the real world. In NRDH, we first learn a common latent representation through nonlinear descriptors to encode complementary and consistent information from the features of the heterogeneous multimedia data. Moreover, an asymmetric learning scheme is proposed to correlate the learned hash codes with the common latent representation. Empirically, we demonstrate that NRDH is able to successfully generate a comprehensive common latent representation that significantly improves the quality of the learned hash codes. Then, NRDH adopts a linear learning strategy to fast learn the hash function with the learned hash codes. Extensive experiments performed on two benchmark datasets highlight the superiority of NRDH over several state-of-the-art methods. |
2020 | Deep Variational and Structural Hashing | Venice Erin Liong, Jiwen Lu, Ling-Yu Duan, Yap-Peng Tan | TPAMI | In this paper, we propose a deep variational and structural hashing (DVStH) method to learn compact binary codes for multimedia retrieval. Unlike most existing deep hashing methods which use a series of convolution and fully-connected layers to learn binary features, we develop a probabilistic framework to infer latent feature representation inside the network. Then, we design a struct layer rather than a bottleneck hash layer, to obtain binary codes through a simple encoding procedure. By doing these, we are able to obtain binary codes discriminatively and generatively. To make it applicable to cross-modal scalable multimedia retrieval, we extend our method to a cross-modal deep variational and structural hashing (CM-DVStH). We design a deep fusion network with a struct layer to maximize the correlation between image-text input pairs during the training stage so that a unified binary vector can be obtained. We then design modality-specific hashing networks to handle the out-of-sample extension scenario. Specifically, we train a network for each modality which outputs a latent representation that is as close as possible to the binary codes which are inferred from the fusion network. Experimental results on five benchmark datasets are presented to show the efficacy of the proposed approach. |
2020 | Fast Class-wise Updating For Online Hashing | Lin Mingbao, Ji Rongrong, Sun Xiaoshuai, Zhang Baochang, Huang Feiyue, Tian Yonghong, Tao Dacheng | Arxiv | Online image hashing has received increasing research attention recently which processes large-scale data in a streaming fashion to update the hash functions on-the-fly. To this end most existing works exploit this problem under a supervised setting i.e. using class labels to boost the hashing performance which suffers from defects in both adaptivity and efficiency. First large amounts of training batches are required to learn up-to-date hash functions which leads to poor online adaptivity. Second the training is time-consuming which contradicts the core need of online learning. In this paper a novel supervised online hashing scheme termed Fast Class-wise Updating for Online Hashing (FCOH) is proposed to address the above two challenges by introducing a novel and efficient inner product operation. To achieve fast online adaptivity a class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion which well addresses the burden on large amounts of training batches. Quantitatively such a decomposition further leads to at least 75% storage saving. To further achieve online efficiency we propose a semi-relaxation optimization which accelerates the online training by treating different binary constraints independently. Without additional constraints and variables the time complexity is significantly reduced. Such a scheme is also quantitatively shown to well preserve past information during updating hashing functions. We have quantitatively demonstrated that the collective effort of class-wise updating and semi-relaxation optimization provides superior performance compared to various state-of-the-art methods which is verified through extensive experiments on three widely-used datasets. |
2020 | Task-adaptive Asymmetric Deep Cross-modal Hashing | Li Fengling, Wang Tong, Zhu Lei, Zhang Zheng, Wang Xinhua | Arxiv | Supervised cross-modal hashing aims to embed the semantic correlations of heterogeneous modality data into the binary hash codes with discriminative semantic labels. Because of its advantages in retrieval and storage efficiency it is widely used for solving efficient cross-modal retrieval. However existing works handle the different tasks of cross-modal retrieval equally and simply learn the same pair of hash functions in a symmetric way for them. Under such circumstances the uniqueness of different cross-modal retrieval tasks is ignored and sub-optimal performance may result. Motivated by this we present a Task-adaptive Asymmetric Deep Cross-modal Hashing (TA-ADCMH) method in this paper. It can learn task-adaptive hash functions for two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning. Unlike previous cross-modal hashing approaches our learning framework jointly optimizes semantic preserving that transforms deep features of multimedia data into binary hash codes and the semantic regression which directly regresses query modality representation to explicit label. With our model the binary codes can effectively preserve semantic correlations across different modalities meanwhile adaptively capture the query semantics. The superiority of TA-ADCMH is demonstrated on two standard datasets from many aspects. |
2020 | Multiple Code Hashing For Efficient Image Retrieval | Li Ming-wei, Jiang Qing-yuan, Li Wu-jun | Arxiv | Due to its low storage cost and fast query speed hashing has been widely used in large-scale image retrieval tasks. Hash bucket search returns data points within a given Hamming radius to each query which can enable search at a constant or sub-linear time cost. However existing hashing methods cannot achieve satisfactory retrieval performance for hash bucket search in complex scenarios since they learn only one hash code for each image. More specifically by using one hash code to represent one image existing methods might fail to put similar image pairs into buckets with a small Hamming distance to the query when the semantic information of images is complex. As a result a large number of hash buckets need to be visited for retrieving similar images based on the learned codes. This will deteriorate the efficiency of hash bucket search. In this paper we propose a novel hashing framework called multiple code hashing (MCH) to improve the performance of hash bucket search. The main idea of MCH is to learn multiple hash codes for each image with each code representing a different region of the image. Furthermore we propose a deep reinforcement learning algorithm to learn the parameters in MCH. To the best of our knowledge this is the first work that proposes to learn multiple hash codes for each image in image retrieval. Experiments demonstrate that MCH can achieve a significant improvement in hash bucket search compared with existing methods that learn only one hash code for each image. |
2020 | Deep Unsupervised Image Hashing By Maximizing Bit Entropy | Li Yunqiang, Van Gemert Jan | Arxiv | Unsupervised hashing is important for indexing huge image or video collections without having expensive annotations available. Hashing aims to learn short binary codes for compact storage and efficient semantic retrieval. We propose an unsupervised deep hashing layer called Bi-half Net that maximizes entropy of the binary codes. Entropy is maximal when both possible values of the bit are uniformly (half-half) distributed. To maximize bit entropy we do not add a term to the loss function as this is difficult to optimize and tune. Instead we design a new parameter-free network layer to explicitly force continuous image features to approximate the optimal half-half bit distribution. This layer is shown to minimize a penalized term of the Wasserstein distance between the learned continuous image features and the optimal half-half bit distribution. Experimental results on the image datasets Flickr25k Nus-wide Cifar-10 Mscoco Mnist and the video datasets Ucf-101 and Hmdb-51 show that our approach leads to compact codes and compares favorably to the current state-of-the-art. |
2020 | Locality-sensitive Hashing Scheme Based On Longest Circular Co-substring | Lei Yifan, Huang Qiang, Kankanhalli Mohan, Tung Anthony K. H. | Arxiv | Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee. We introduce a novel concept of LCCS and a new data structure named Circular Shift Array (CSA) for k-LCCS search. The insight of LCCS search framework is that close data objects will have a longer LCCS than the far-apart ones with high probability. LCCS-LSH is LSH-family-independent and it supports c-ANNS with different kinds of distance metrics. We also introduce a multi-probe version of LCCS-LSH and conduct extensive experiments over five real-life datasets. The experimental results demonstrate that LCCS-LSH outperforms state-of-the-art LSH schemes. |
2020 | Hierarchical Deep Hashing for Fast Large Scale Image Retrieval | Yongfei Zhang, Cheng Peng, Zhang Jingtao, Xianglong Liu, Shiliang Pu, Changhuai Chen | | Fast image retrieval is of great importance in many computer vision tasks and especially practical applications. Deep hashing, the state-of-the-art fast image retrieval scheme, introduces deep learning to learn the hash functions and generate binary hash codes, and outperforms the other image retrieval methods in terms of accuracy. However, all the existing deep hashing methods could only generate one-level hash codes and require a linear traversal of all the hash codes to figure out the closest one when a new query arrives, which is very time-consuming and even intractable for large scale applications. In this work, we propose a Hierarchical Deep Hashing (HDHash) scheme to speed up the state-of-the-art deep hashing methods. More specifically, hierarchical deep hash codes of multiple levels can be generated and indexed with tree structures rather than linear ones, and pruning irrelevant branches can sharply decrease the retrieval time. To the best of our knowledge, this is the first work to introduce hierarchical indexed deep hashing for fast large scale image retrieval. Extensive experimental results on three benchmark datasets demonstrate that the proposed HDHash scheme achieves better or comparable accuracy with significantly improved efficiency and reduced memory as compared to state-of-the-art fast image retrieval schemes. |
2020 | A Survey On Deep Hashing For Image Retrieval | Zhang Xiaopeng | Arxiv | Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval for its computation and storage efficiency. Deep hashing which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images has received increasing attention recently. In this survey several deep supervised hashing methods for image retrieval are evaluated and I identify three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover to break through the bottleneck of the existing hashing methods I propose a Shadow Recurrent Hashing (SRH) method as a first attempt. Specifically I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images to be projected close together. To this end I propose the concept of the shadow of the CNN output. During the optimization process the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on the CIFAR-10 dataset show the satisfying performance of SRH. |
2020 | Fast Discrete Cross-Modal Hashing Based on Label Relaxation and Matrix Factorization | Donglin Zhang, Xiaojun Wu, Zhen Liu, Jun Yu, Josef Kittler | ICPR | In recent years, cross-media retrieval has drawn considerable attention due to the exponential growth of multimedia data. Many hashing approaches have been proposed for the cross-media search task. However, there are still open problems that warrant investigation. For example, most existing supervised hashing approaches employ a binary label matrix, which achieves small margins between wrong labels (0) and true labels (1). This may affect the retrieval performance by generating many false negatives and false positives. In addition, some methods adopt a relaxation scheme to solve the binary constraints, which may cause large quantization errors. There are also some discrete hashing methods that have been presented, but most of them are time-consuming. To conquer these problems, we present a label relaxation and discrete matrix factorization method (LRMF) for cross-modal retrieval. It offers a number of innovations. First of all, the proposed approach employs a novel label relaxation scheme to control the margins adaptively, which has the benefit of reducing the quantization error. Second, by virtue of the proposed discrete matrix factorization method designed to learn the binary codes, large quantization errors caused by relaxation can be avoided. The experimental results obtained on two widely-used databases demonstrate that LRMF outperforms state-of-the-art cross-media methods. |
2020 | SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation | Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xiansheng Hua | AAAI | Deep hashing methods have been proved to be effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practical cases. The current solutions to this issue utilize Generative Adversarial Network (GAN) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generations and hashing learning as two isolated processes, leading to generation ineffectiveness. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generative images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistent loss. The experimental results show that our method can significantly improve state-of-the-art models on both the widely-used hashing datasets and fine-grained datasets. |
2020 | Locality Sensitive Hashing For Set-queries Motivated By Group Recommendations | Kaplan Haim, Tenenbaum Jay | Arxiv | Locality Sensitive Hashing (LSH) is an effective method to index a set of points such that we can efficiently find the nearest neighbors of a query point. We extend this method to our novel Set-query LSH (SLSH) such that it can find the nearest neighbors of a set of points given as a query. Let s(x,y) be the similarity between two points x and y. We define a similarity between a set Q and a point x by aggregating the similarities s(p,x) for all p ∈ Q. For example we can take s(p,x) to be the angular similarity between p and x (i.e. 1 - ∠(x,p)/π) and aggregate by arithmetic or geometric averaging or taking the lowest similarity. We develop locality sensitive hash families and data structures for a large set of such arithmetic and geometric averaging similarities and analyze their collision probabilities. We also establish an analogous framework and hash families for distance functions. Specifically we give a structure for the Euclidean distance aggregated by either averaging or taking the maximum. We leverage SLSH to solve a geometric extension of the approximate near neighbors problem. In this version we consider a metric for which the unit ball is an ellipsoid and its orientation is specified with the query. An important application that motivates our work is group recommendation systems. Such a system embeds movies and users in the same feature space and the task of recommending a movie for a group to watch together translates to a set-query Q using an appropriate similarity. |
2020 | Generalized Product Quantization Network For Semi-supervised Image Retrieval | Jang Young Kyun, Cho Nam Ik | Arxiv | Image retrieval methods that employ hashing or vector quantization have achieved great success by taking advantage of deep learning. However these approaches do not meet expectations unless expensive label information is sufficient. To resolve this issue we propose the first quantization-based semi-supervised image retrieval scheme Generalized Product Quantization (GPQ) network. We design a novel metric learning strategy that preserves semantic similarity between labeled data and employ entropy regularization term to fully exploit inherent potentials of unlabeled data. Our solution increases the generalization capacity of the quantization network which allows overcoming previous limitations in the retrieval community. Extensive experimental results demonstrate that GPQ yields state-of-the-art performance on large-scale real image benchmark datasets. |
2020 | Deep Reinforcement Learning With Label Embedding Reward For Supervised Image Hashing | Wang Zhenzhen, Hong Weixiang, Yuan Junsong | Arxiv | Deep hashing has shown promising results in image retrieval and recognition. Despite its success most existing deep hashing approaches are rather similar: either a multi-layer perceptron or a CNN is applied to extract image features followed by different binarization activation functions such as sigmoid tanh or an autoencoder to generate binary codes. In this work we introduce a novel decision-making approach for deep supervised hashing. We formulate the hashing problem as travelling across the vertices in the binary code space and learn a deep Q-network with a novel label embedding reward defined by Bose-Chaudhuri-Hocquenghem (BCH) codes to explore the best path. Extensive experiments and analysis on the CIFAR-10 and NUS-WIDE datasets show that our approach outperforms state-of-the-art supervised hashing methods under various code lengths. |
2020 | Online Collective Matrix Factorization Hashing for Large-Scale Cross-Media Retrieval | Di Wang, Quan Wang, Yaqiang An, Xinbo Gao, Yumin Tian | SIGIR | Cross-modal hashing has been widely investigated recently for its efficiency in large-scale cross-media retrieval. However, most existing cross-modal hashing methods learn hash functions in a batch-based learning mode. Such a mode is not suitable for large-scale data sets due to its large memory consumption and loses efficiency when training on streaming data. Online cross-modal hashing can deal with the above problems by learning hash model in an online learning process. However, existing online cross-modal hashing methods cannot update hash codes of old data by the newly learned model. In this paper, we propose Online Collective Matrix Factorization Hashing (OCMFH) based on collective matrix factorization hashing (CMFH), which can adaptively update hash codes of old data according to dynamic changes of hash model without accessing to old data. Specifically, it learns discriminative hash codes for streaming data by collective matrix factorization in an online optimization scheme. Unlike conventional CMFH which needs to load the entire data points into memory, the proposed OCMFH retrains hash functions only by newly arriving data points. Meanwhile, it generates hash codes of new data and updates hash codes of old data by the latest updated hash model. In this way, hash codes of new data and old data are well-matched. Furthermore, a zero mean strategy is developed to solve the mean-varying problem in the online hash learning process. Extensive experiments on three benchmark data sets demonstrate the effectiveness and efficiency of OCMFH on online cross-media retrieval. |
2020 | Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing | Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian | CVPR | In recent years, cross-modal hashing (CMH) has attracted increasing attention, mainly because of its potential ability to map contents from different modalities, especially in vision and language, into the same space, so that it becomes efficient in cross-modal data retrieval. There are two main frameworks for CMH, differing from each other in whether semantic supervision is required. Compared to the unsupervised methods, the supervised methods often enjoy more accurate results, but require much heavier labor in data annotation. In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method. Specifically, we make use of teacher-student optimization for propagating knowledge. Experiments are performed on two popular CMH benchmarks, i.e., the MIRFlickr and NUS-WIDE datasets. Our approach outperforms all existing unsupervised methods by a large margin. |
2020 | Compact Deep Aggregation For Set Retrieval | Zhong Yujie, Arandjelović Relja, Zisserman Andrew | Arxiv | The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem – that of retrieving images containing multiple faces from a large scale dataset of images. Here the set consists of the face descriptors in each image and given a query for multiple identities the goal is then to retrieve in order images which contain all the identities then all but one (etc.). To this end we make the following contributions first we propose a CNN architecture – SetNet – to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed length descriptor designed for set retrieval and the score of an image is a count of the number of identities that match the query; second we show that this compact descriptor has minimal loss of discriminability up to two faces per image and degrades slowly after that – far exceeding a number of baselines; third we explore the speed vs. retrieval quality trade-off for set retrieval using this compact descriptor; and finally we collect and annotate a large dataset of images containing various numbers of celebrities which we use for evaluation and is publicly released. |
2020 | Directed Graph Hashing | Helbling Caleb | Arxiv | This paper presents several algorithms for hashing directed graphs. The algorithms given are capable of hashing entire graphs as well as assigning hash values to specific nodes in a given graph. The notion of node symmetry is made precise via computation of vertex orbits and the graph automorphism group and nodes that are symmetrically identical are assigned equal hashes. We also present a novel Merkle-style hashing algorithm that seeks to fulfill the recursive principle that a hash of a node should depend only on the hash of its neighbors. This algorithm works even in the presence of cycles which would not be possible with a naive approach. Structurally hashing trees has seen widespread use in blockchain source code version control and web applications. Despite the popularity of tree hashing directed graph hashing remains unstudied in the literature. Our algorithms open new possibilities to hashing both directed graphs and more complex data structures that can be reduced to directed graphs such as hypergraphs. |
2020 | A Non-alternating Graph Hashing Algorithm For Large Scale Image Search | Hemati Sobhan, Mehdizavareh Mohammad Hadi, Chenouri Shojaeddin, Tizhoosh Hamid R | Arxiv | In the era of big data, methods for improving memory and computational efficiency have become crucial for the successful deployment of technologies. Hashing is one of the most effective approaches for dealing with the computational limitations that come with big data. One natural way of formulating this problem is spectral hashing, which directly incorporates affinity to learn binary codes. However, due to the binary constraints, the optimization becomes intractable. To mitigate this challenge, different relaxation approaches have been proposed to reduce the computational load of obtaining binary codes while still attaining a good solution. The problem with all existing relaxation methods is that they resort to one or more additional auxiliary variables to attain high-quality binary codes while relaxing the problem. The existence of auxiliary variables leads to coordinate-descent approaches, which increase the computational complexity. We argue that introducing these variables is unnecessary. To this end, we propose a novel relaxed formulation for spectral hashing that adds no additional variables to the problem. Furthermore, instead of solving the problem in the original space, where the number of variables equals the number of data points, we solve the problem in a much smaller space and retrieve the binary codes from this solution. This trick reduces both the memory and the computational complexity at the same time. We apply two optimization techniques, namely projected gradient and optimization on manifolds, to obtain the solution. Using comprehensive experiments on four public datasets, we show that the proposed efficient spectral hashing (ESH) algorithm achieves highly competitive retrieval performance compared with the state of the art at low complexity.
2020 | Unsupervised Deep Cross-modality Spectral Hashing | Hoang Tuan, Do Thanh-toan, Nguyen Tam V., Cheung Ngai-man | Arxiv | This paper presents a novel framework namely Deep Cross-modality Spectral Hashing (DCSH) to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach which decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality the latter reveals the hidden patterns from all modalities. In the second step to learn mapping functions from informative data inputs (images and word embeddings) to binary codes obtained from the first step we leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods. |
2020 | Unsupervised Semantic Hashing With Pairwise Reconstruction | Hansen Casper, Hansen Christian, Simonsen Jakob Grue, Alstrup Stephen, Lioma Christina | Arxiv | Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing documents are encoded as short binary vectors (i.e. hash codes) such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this we present Semantic Hashing with Pairwise Reconstruction (PairRec) which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes and then learns to reconstruct the same query document from both of these hash codes (i.e. pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches and obtain significant performance improvements in the task of document similarity search. |
2020 | Content-aware Neural Hashing For Cold-start Recommendation | Hansen Casper, Hansen Christian, Simonsen Jakob Grue, Alstrup Stephen, Lioma Christina | Arxiv | Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e. cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF) which generates binary hash codes for users and items such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture consisting of two joint hashing components for generating user and item hash codes. Inspired by semantic hashing the item hashing component generates a hash code directly from an item's content information (i.e. it generates cold-start and seen item hash codes in the same manner). This contrasts with existing state-of-the-art models which treat the two item cases separately. The user hash codes are generated directly from the user id through learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12% NDCG and 13% MRR in cold-start recommendation settings and up to 4% in both NDCG and MRR in standard settings where all items are present while training. Our approach uses 2-4x shorter hash codes while obtaining the same or better performance compared to the state of the art, thus also enabling a notable storage reduction.
2020 | Deep Kernel Supervised Hashing For Node Classification In Structural Networks | Guo Jia-nan, Mao Xian-ling, Lin Shu-yang, Wei Wei, Huang Heyan | Arxiv | Node classification in structural networks has been proven to be useful in many real world applications. With the development of network embedding, the performance of node classification has been greatly improved. However, nearly all the existing network embedding based methods struggle to capture the actual category features of a node because of the linear inseparability problem in low-dimensional space; meanwhile, they cannot simultaneously incorporate network structure information and node label information into the network embedding. To address these problems, in this paper we propose a novel Deep Kernel Supervised Hashing (DKSH) method to learn hashing representations of nodes for node classification. Specifically, deep multiple kernel learning is first proposed to map nodes into a suitable Hilbert space to deal with the linear inseparability problem. Then, instead of only considering structural similarity between two nodes, a novel similarity matrix is designed to merge both network structure information and node label information. Supervised by the similarity matrix, the learned hashing representations of nodes simultaneously preserve both kinds of information well in the learned Hilbert space. Extensive experiments show that the proposed method significantly outperforms state-of-the-art baselines over three real world benchmark datasets.
2020 | Image Retrieval For Structure-from-motion Via Graph Convolutional Network | Yan Shen, Pen Yang, Lai Shiming, Liu Yu, Zhang Maojun | Arxiv | Conventional image retrieval techniques for Structure-from-Motion (SfM) have limited ability to recognize repetitive patterns and cannot guarantee creating just enough match pairs with high precision and high recall. In this paper we present a novel retrieval method based on Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy. We formulate the image retrieval task as a binary node classification problem on graph data: a node is marked as positive if it shares scene overlap with the query image. The key idea is that the local context in feature space around a query image contains rich information about the matchable relations between this image and its neighbors. By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to determine whether nodes in the subgraph have overlapping regions with the query photograph. Experiments demonstrate that our method performs remarkably well on the challenging dataset of highly ambiguous and duplicated scenes. Besides, compared with state-of-the-art matchable retrieval methods, the proposed approach significantly reduces useless attempted matches without sacrificing the accuracy and completeness of reconstruction.
2020 | Deep Polarized Network for Supervised Learning of Accurate Binary Hashing Codes | Lixin Fan, Kam Woh Ng, Ce Ju, Tianyu Zhang, Chee Seng Chan | IJCAI | This paper proposes a novel deep polarized network (DPN) for learning to hash, in which each channel in the network outputs is pushed far away from zero by employing a differentiable bit-wise hinge-like loss, dubbed polarization loss. Reformulated within a generic Hamming Distance Metric Learning framework [Norouzi et al., 2012], the proposed polarization loss bypasses the requirement to prepare pairwise labels for (dis-)similar items and yet strictly bounds from above the pairwise Hamming Distance based losses. The intrinsic connection between pairwise and pointwise label information, as disclosed in this paper, brings about the following methodological improvements: (a) we may directly employ the proposed differentiable polarization loss with no large deviations incurred from the target Hamming distance based loss; and (b) the subtask of assigning binary codes becomes extremely simple — even random codes assigned to each class suffice to yield state-of-the-art performance, as demonstrated on the CIFAR10, NUS-WIDE and ImageNet100 datasets.
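A rough numpy rendering of a bit-wise hinge-like polarization penalty of the kind described above; the margin value, the ±1 target-code convention, and the mean reduction are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def polarization_loss(outputs, target_bits, margin=1.0):
    """Hinge-like bit-wise loss: push each output channel away from zero,
    toward the sign given by its target bit (in {-1, +1}).
    Sketch only; margin and reduction are assumed, not taken from the paper."""
    return np.maximum(margin - outputs * target_bits, 0.0).mean()

rng = np.random.default_rng(0)
outputs = rng.normal(size=(4, 16))           # network outputs for 16-bit codes
targets = rng.choice([-1.0, 1.0], size=16)   # e.g. a random per-class target code
print(polarization_loss(outputs, targets))
```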
2020 | Central Similarity Quantization for Efficient Image and Video Retrieval | Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng | CVPR | Existing data-dependent hashing methods usually learn hash functions from pairwise or triplet data relationships, which only capture the data similarity locally, and often suffer from low learning efficiency and low collision rate. In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy. We principally formulate the computation of the proposed central similarity metric by introducing a new concept, the hash center, which refers to a set of data points scattered in the Hamming space with sufficient mutual distance between one another. We then provide an efficient method to construct well separated hash centers by leveraging the Hadamard matrix and Bernoulli distributions. Finally, we propose Central Similarity Quantization (CSQ) which optimizes the central similarity between data points w.r.t. their hash centers instead of optimizing the local similarity. CSQ is generic and applicable to both image and video hashing scenarios. Extensive experiments on large-scale image and video retrieval tasks demonstrate that CSQ can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs, achieving a noticeable boost in retrieval performance, i.e. 3%-20% in mAP over the previous state of the art.
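One way to materialize well-separated ±1 hash centers from a Hadamard matrix, as the abstract suggests; a sketch assuming the Sylvester construction and that rows of H and -H are used directly as centers.

```python
import numpy as np

def hadamard(k):
    """Sylvester construction; k must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H

def hash_centers(num_classes, k):
    """Rows of H and -H give up to 2k mutually distant ±1 centers:
    distinct rows are orthogonal, so they differ in exactly k/2 bits."""
    H = hadamard(k)
    centers = np.vstack([H, -H])
    assert num_classes <= centers.shape[0]
    return centers[:num_classes]

C = hash_centers(10, 16)   # ten 16-bit centers, e.g. one per CIFAR-10 class
# verify separation: pairwise Hamming distance between distinct centers
d = (C[:, None, :] != C[None, :, :]).sum(-1)
print(d[np.triu_indices(10, 1)].min())   # >= k/2 = 8
```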
2020 | Learning Space Partitions for Nearest Neighbor Search | Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner | ICLR | Space partitions of ℝ^d underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces (Andoni et al. 2018b,c), we develop a new framework for building space partitions, reducing the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner (Sanders and Schulz 2013) and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS (Aumuller et al. 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH.
2020 | Adversarial Collision Attacks On Image Hashing Functions | Dolhansky Brian, Ferrer Cristian Canton | Arxiv | Hashing images with a perceptual algorithm is a common approach to solving duplicate image detection problems. However, perceptual image hashing algorithms are differentiable and are thus vulnerable to gradient-based adversarial attacks. We demonstrate that not only is it possible to modify an image to produce an unrelated hash, but that an exact image hash collision between a source and target image can be produced via minuscule adversarial perturbations. In a white box setting these collisions can be replicated across nearly every image pair and hash type (including both deep and non-learned hashes). Furthermore, by attacking points other than the output of a hashing function, an attacker can avoid having to know the details of a particular algorithm, resulting in collisions that transfer across different hash sizes or model architectures. Using these techniques an adversary can poison the image lookup table of a duplicate image detection service, resulting in undefined or unwanted behavior. Finally, we offer several potential mitigations to gradient-based image hash attacks.
2020 | Image Hashing By Minimizing Discrete Component-wise Wasserstein Distance | Doan Khoa D., Manchanda Saurav, Badirli Sarkhan, Reddy Chandan K. | Arxiv | Image hashing is one of the fundamental problems that demand both efficient and effective solutions for various practical scenarios. Adversarial autoencoders are shown to be able to implicitly learn a robust locality-preserving hash function that generates balanced and high-quality hash codes. However the existing adversarial hashing methods are inefficient to be employed for large-scale image retrieval applications. Specifically they require an exponential number of samples to be able to generate optimal hash codes and a significantly high computational cost to train. In this paper we show that the high sample-complexity requirement often results in sub-optimal retrieval performance of the adversarial hashing methods. To address this challenge we propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost. Specifically by exploiting the desired properties of the hash function in the low-dimensional discrete space our method efficiently estimates a better variant of Wasserstein distance by averaging a set of easy-to-compute one-dimensional Wasserstein distances. The resulting hashing approach has an order-of-magnitude better sample complexity thus better generalization property compared to the other adversarial hashing methods. In addition the computational cost is significantly reduced using our approach. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing hashing methods achieving up to 10% improvement over the current state-of-the-art image hashing methods. The code accompanying this paper is available on Github (https://github.com/khoadoan/adversarial-hashing).
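The key trick above, averaging easy one-dimensional Wasserstein distances, is closely related to the sliced Wasserstein distance; a generic numpy sketch of that estimator (not the authors' exact discrete component-wise variant) follows.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=64, seed=0):
    """Average 1D Wasserstein-1 distances over random projections.
    X, Y: (n, d) samples with equal n. In 1D, W1 between empirical
    distributions reduces to comparing sorted projected values."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        x1, y1 = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.abs(x1 - y1).mean()         # 1D W1 via sorted samples
    return total / n_projections

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, size=(256, 8))
B = rng.normal(0.5, 1.0, size=(256, 8))
print(sliced_wasserstein(A, B))
```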
2020 | A Genetic Algorithm For Obtaining Memory Constrained Near-perfect Hashing | Domnita Dan, Oprisa Ciprian | Arxiv | The problem of fast item retrieval from a fixed collection is encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on both minimizing the number of comparisons performed during the search and minimizing the total collection size. The standard open-addressing double-hashing approach is improved with a non-linear transformation that can be parametrized in order to ensure a uniform distribution of the data in the hash table. The optimal parameter is determined using a genetic algorithm. Our results show that near-perfect hashing is faster than binary search yet uses less memory than perfect hashing, making it a good choice for memory-constrained applications where search time is also critical.
2020 | Two-Stream Deep Hashing With Class-Specific Centers for Supervised Image Search | Cheng Deng, Erkun Yang, Tongliang Liu, Dacheng Tao | TNNLS | Hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. Recent supervised hashing research has shown that deep learning-based methods can significantly outperform nondeep methods. Most existing supervised deep hashing methods exploit supervisory signals to generate similar and dissimilar image pairs for training. However, natural images can have large intraclass and small interclass variations, which may degrade the accuracy of hash codes. To address this problem, we propose a novel two-stream ConvNet architecture, which learns hash codes with class-specific representation centers. Our basic idea is that if we can learn a unified binary representation for each class as a center and encourage hash codes of images to be close to the corresponding centers, the intraclass variation will be greatly reduced. Accordingly, we design a neural network that leverages label information and outputs a unified binary representation for each class. Moreover, we also design an image network to learn hash codes from images and force these hash codes to be close to the corresponding class-specific centers. These two neural networks are then seamlessly incorporated to create a unified, end-to-end trainable framework. Extensive experiments on three popular benchmarks corroborate that our proposed method outperforms current state-of-the-art methods. |
2020 | Camera-based Piano Sheet Music Identification | Yang Daniel, Tsai Tj | Arxiv | This paper presents a method for large-scale retrieval of piano sheet music images. Our work differs from previous studies on sheet music retrieval in two ways. First we investigate the problem at a much larger scale than previous studies using all solo piano sheet music images in the entire IMSLP dataset as a searchable database. Second we use cell phone images of sheet music as our input queries which lends itself to a practical user-facing application. We show that a previously proposed fingerprinting method for sheet music retrieval is far too slow for a real-time application and we diagnose its shortcomings. We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime while simultaneously boosting retrieval accuracy. In experiments on IMSLP data our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query. |
2020 | Central Similarity Hashing for Efficient Image and Video Retrieval | Li Yuan, Tao Wang, Xiaopeng Zhang, Zequn Jie, Francis EH Tay, Jiashi Feng | CVPR | Existing data-dependent hashing methods usually learn hash functions from the pairwise or triplet data relationships, which only capture the data similarity locally, and often suffer from low learning efficiency and low collision rate. In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes for similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy. We principally formulate the computation of the proposed central similarity metric by introducing a new concept, the hash center, which refers to a set of data points scattered in the Hamming space with sufficient mutual distance between one another. We then provide an efficient method to construct well separated hash centers by leveraging the Hadamard matrix and Bernoulli distributions. Finally, we propose Central Similarity Hashing (CSH) which optimizes the central similarity between data points w.r.t. their hash centers instead of optimizing the local similarity. CSH is generic and applicable to both image and video hashing. Extensive experiments on large-scale image and video retrieval demonstrate that CSH can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs, and achieves a noticeable boost in retrieval performance, i.e. 3%-20% in mAP over the previous state of the art. Code: https://github.com/yuanli2333/Hadamard-Matrix-for-hashing
2020 | Pairwise Supervised Hashing With Bernoulli Variational Auto-encoder And Self-control Gradient Estimator | Dadaneh Siamak Zamani, Boluki Shahin, Yin Mingzhang, Zhou Mingyuan, Qian Xiaoning | Uncertainty in Artificial Intelligence Conference | Semantic hashing has become a crucial component of fast similarity search in many large-scale information retrieval systems, in particular for text data. Variational auto-encoders (VAEs) with binary latent variables as hashing codes provide state-of-the-art performance in terms of precision for document retrieval. We propose a pairwise loss function with a discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing. Instead of solving the optimization with existing biased gradient estimators, an unbiased low-variance gradient estimator is adopted to optimize the hashing function by evaluating the non-differentiable loss function over two correlated sets of binary hashing codes, controlling the variance of the gradient estimates. This new semantic hashing framework achieves superior performance compared to the state of the art, as demonstrated by our comprehensive experiments.
2020 | Exchnet A Unified Hashing Network For Large-scale Fine-grained Image Retrieval | Cui Quan, Jiang Qing-yuan, Wei Xiu-shen, Li Wu-jun, Yoshie Osamu | Arxiv | Retrieving content-relevant images from a large-scale fine-grained dataset can suffer from intolerably slow query speed and highly redundant storage cost, due to the high-dimensional real-valued embeddings used to distinguish subtle visual differences of fine-grained objects. In this paper we study the novel fine-grained hashing topic to generate compact binary codes for fine-grained images, leveraging the search and storage efficiency of hash learning to alleviate the aforementioned problems. Specifically we propose a unified end-to-end trainable network, termed ExchNet. Based on attention mechanisms and proposed attention constraints, it first obtains both local and global features to represent object parts and whole fine-grained objects respectively. Furthermore, to ensure the discriminative ability and semantic consistency of these part-level features across images, we design a local feature alignment approach that performs a feature exchanging operation. Later, an alternating learning algorithm is employed to optimize the whole ExchNet and generate the final binary hash codes. Validated by extensive experiments, our proposal consistently outperforms state-of-the-art generic hashing methods on five fine-grained datasets, demonstrating its effectiveness. Moreover, compared with other approximate nearest neighbor methods, ExchNet achieves the best speed-up and storage reduction, revealing its efficiency and practicality.
2020 | Dartminhash Fast Sketching For Weighted Sets | Christiani Tobias | Arxiv | Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set x ∈ ℝ^d_{≥0} and computes k independent minhashes in expected time O(k log k + ‖x‖₀ log(‖x‖₁ + 1/‖x‖₁)), improving upon the state-of-the-art BagMinHash algorithm (KDD 18) and representing the fastest weighted minhash algorithm for sparse data. Our experiments show running times that scale better with k and ‖x‖₀ compared to ICWS (ICDM 10) and BagMinHash, obtaining 10x speedups in common use cases. Our approach also gives rise to a technique for computing fully independent locality-sensitive hash values for (L,K)-parameterized approximate near neighbor search under weighted Jaccard similarity in optimal expected time O(LK + ‖x‖₀), improving on prior work even in the case of unweighted sets.
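For intuition only, a minimal consistent-sampling sketch for weighted sets: the simple exponential-race scheme, in which identical weighted sets always collide and similar sets collide with probability growing with their overlap. This is not the paper's DartMinHash algorithm, and the per-element seeding is an illustrative choice.

```python
import numpy as np

def weighted_minhash(weights, k, seed=0):
    """Exponential-race sketch: element i wins repetition r if
    Exp(r, i) / w_i is minimal, with the exponential draws shared
    across sets through a common seed so collisions are consistent.
    (P-MinHash-style illustration, NOT DartMinHash.)"""
    idx = np.array(sorted(weights))
    w = np.array([weights[i] for i in idx])
    sig = np.empty(k, dtype=np.int64)
    for r in range(k):
        # per-(repetition, element) exponential draws from seeded streams
        e = np.array([np.random.default_rng((seed, r, int(i))).exponential()
                      for i in idx])
        sig[r] = idx[np.argmin(e / w)]
    return sig

x = {0: 2.0, 3: 1.0, 7: 0.5}
y = {0: 2.0, 3: 0.8, 9: 1.0}
sx, sy = weighted_minhash(x, 128), weighted_minhash(y, 128)
print((sx == sy).mean())   # crude similarity estimate in [0, 1]
```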
2020 | Strongly Constrained Discrete Hashing | Yong Chen, Zhibao Tian, Hui Zhang, Jun Wang, Dell Zhang | TIP | Learning to hash is a fundamental technique widely used in large-scale image retrieval. Most existing methods for learning to hash address the involved discrete optimization problem by continuous relaxation of the binary constraint, which usually leads to large quantization errors and consequently suboptimal binary codes. A few discrete hashing methods have emerged recently. However, they either completely ignore some useful constraints (specifically the balance and decorrelation of hash bits) or just turn those constraints into regularizers that make the optimization easier but less accurate. In this paper, we propose a novel supervised hashing method named Strongly Constrained Discrete Hashing (SCDH) which overcomes such limitations. It can learn the binary codes for all examples in the training set, and meanwhile obtain a hash function for unseen samples with the above mentioned constraints preserved. Although the model of SCDH is fairly sophisticated, we are able to find closed-form solutions to all of its optimization subproblems and thus design an efficient algorithm that converges quickly. In addition, we extend SCDH to a kernelized version, SCDH-K. Our experiments on three large benchmark datasets have demonstrated that not only can SCDH and SCDH-K achieve substantially higher MAP scores than state-of-the-art baselines, but they also train much faster than other supervised methods.
2020 | Making Online Sketching Hashing Even Faster | Chen Xixian, Yang Haiqin, Zhao Shenglin, Lyu Michael R., King Irwin | IEEE Transactions on Knowledge and Data Engineering | Data-dependent hashing methods have demonstrated good performance in various machine learning applications to learn a low-dimensional representation from the original data. However, they still suffer from several obstacles. First, most existing hashing methods are trained in a batch mode, yielding inefficiency for training streaming data. Second, the computational cost and the memory consumption increase extraordinarily in the big data setting, which perplexes the training procedure. Third, the lack of labeled data hinders the improvement of the model performance. To address these difficulties, we utilize online sketching hashing (OSH) and present a FasteR Online Sketching Hashing (FROSH) algorithm to sketch the data in a more compact form via an independent transformation. We provide theoretical justification to guarantee that our proposed FROSH consumes less time and achieves comparable sketching precision under the same memory cost as OSH. We also extend FROSH to its distributed implementation, namely DFROSH, to further reduce the training time cost of FROSH while deriving a theoretical bound on the sketching precision. Finally, we conduct extensive experiments on both synthetic and real datasets to demonstrate the attractive merits of FROSH and DFROSH.
2020 | Online Hashing with Efficient Updating of Binary Codes | Zhenyu Weng, Yuesheng Zhu | AAAI | Online hashing methods are efficient in learning the hash functions from the streaming data. However, when the hash functions change, the binary codes for the database have to be recomputed to guarantee the retrieval accuracy. Recomputing the binary codes by accumulating the whole database brings a timeliness challenge to the online retrieval process. In this paper, we propose a novel online hashing framework to update the binary codes efficiently without accumulating the whole database. In our framework, the hash functions are fixed and the projection functions are introduced to learn online from the streaming data. Therefore, inefficient updating of the binary codes by accumulating the whole database can be transformed to efficient updating of the binary codes by projecting the binary codes into another binary space. The queries and the binary code database are projected asymmetrically to further improve the retrieval accuracy. The experiments on two multi-label image databases demonstrate the effectiveness and the efficiency of our method for multi-label image retrieval. |
2020 | Efficient Image Retrieval Using Multi Neural Hash Codes And Bloom Filters | Chakrabarti Sourin | Arxiv | This paper delivers an efficient and modified approach for image retrieval using multiple neural hash codes, limiting the number of queries through bloom filters by identifying false positives beforehand. Traditional approaches involving neural networks for image retrieval tasks tend to use higher layers for feature extraction, but the activations of lower layers have proven to be more effective in a number of scenarios. In our approach we leverage local deep convolutional neural networks, which combine the features of both lower and higher layers to create feature maps; these are compressed using PCA and fed to a bloom filter after binary sequencing using a modified multi k-means approach. The feature maps obtained are further used in the image retrieval process in a hierarchical coarse-to-fine manner, first comparing images in the higher layers for semantic similarity and then gradually moving towards the lower layers searching for structural similarities. During search, the neural hashes for the query image are calculated and queried in the bloom filter, which tells us whether the query image is absent from the set or possibly present. If the bloom filter does not rule out the query, it proceeds into the image retrieval process. This approach can be particularly helpful in cases where the image store is distributed, since the approach supports parallel querying.
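A toy version of the gating step described above: insert the binary codes of indexed images into a Bloom filter and skip the expensive retrieval whenever the filter reports a definite miss. The table size, hash count, and double-hashing scheme are standard choices, not taken from the paper.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(self.m // 8)

    def _positions(self, item: bytes):
        # Standard double hashing: pos_i = (h1 + i * h2) mod m.
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add(b"\x1a\x2b")                  # binary hash code of an indexed image
if bf.maybe_contains(b"\x1a\x2b"):   # only then run the full retrieval
    print("candidate: run hierarchical coarse-to-fine search")
if not bf.maybe_contains(b"\xff\x00"):
    print("definite miss: skip retrieval")
```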
2020 | Enhanced Discrete Multi-modal Hashing: More Constraints yet Less Time to Learn | Yong Chen, Hui Zhang, Zhibao Tian, Jun Wang, Dell Zhang, Xuelong Li | TKDE | Due to the exponential growth of multimedia data, multi-modal hashing as a promising technique to make cross-view retrieval scalable is attracting more and more attention. However, most of the existing multi-modal hashing methods either divide the learning process unnaturally into two separate stages or treat the discrete optimization problem simplistically as a continuous one, which leads to suboptimal results. Recently, a few discrete multi-modal hashing methods that try to address such issues have emerged, but they still ignore several important discrete constraints (such as the balance and decorrelation of hash bits). In this paper, we overcome those limitations by proposing a novel method named “Enhanced Discrete Multi-modal Hashing (EDMH)” which learns binary codes and hashing functions simultaneously from the pairwise similarity matrix of data, under the aforementioned discrete constraints. Although the model of EDMH looks a lot more complex than the other models for multi-modal hashing, we are actually able to develop a fast iterative learning algorithm for it, since the subproblems of its optimization all have closed-form solutions after introducing two auxiliary variables. Our experimental results on three real-world datasets have demonstrated that EDMH not only performs much better than state-of-the-art competitors but also runs much faster than them. |
2020 | Learning to Hash with a Dimension Analysis-based Quantizer for Image Retrieval | Yuan Cao, Heng Qi, Jie Gui, Keqiu Li, Yuan Yan Tang, James Tin-Yau Kwok | TOM | The last few years have witnessed the rise of the big data era in which approximate nearest neighbor search is a fundamental problem in many applications, such as large-scale image retrieval. Recently, many research results have demonstrated that hashing can achieve promising performance due to its appealing storage and search efficiency. Since complex optimization problems for loss functions are difficult to solve, most hashing methods decompose the hash code learning problem into two steps: projection and quantization. In the quantization step, binary codes are widely used because ranking them by the Hamming distance is very efficient. However, the massive information loss produced by the quantization step should be reduced in applications where high search accuracy is required, such as in image retrieval. Since many two-step hashing methods produce uneven projected dimensions in the projection step, in this paper, we propose a novel dimension analysis-based quantization (DAQ) on two-step hashing methods for image retrieval. We first perform an importance analysis of the projected dimensions and select a subset of them that are more informative than others, and then we divide the selected projected dimensions into several regions with our quantizer. Every region is quantized with its corresponding codebook. Finally, the similarity between two hash codes is estimated by the Manhattan distance between their corresponding codebooks, which is also efficient. We conduct experiments on three public benchmarks containing up to one million descriptors and show that the proposed DAQ method consistently leads to significant accuracy improvements over state-of-the-art quantization methods.
2020 | Random VLAD Based Deep Hashing For Efficient Image Retrieval | Weng Li, Ye Lingzhi, Tian Jiangmin, Cao Jiuwen, Wang Jianzhong | Arxiv | Image hash algorithms generate compact binary representations that can be quickly matched by Hamming distance, thus becoming an efficient solution for large-scale image retrieval. This paper proposes RV-SSDH, a deep image hash algorithm that incorporates the classical VLAD (vector of locally aggregated descriptors) architecture into neural networks. Specifically, a novel neural network component is formed by coupling a random VLAD layer with a latent hash layer through a transform layer. This component can be combined with convolutional layers to realize a hash algorithm. We implement RV-SSDH as a point-wise algorithm that can be efficiently trained by minimizing classification error and quantization loss. Comprehensive experiments show this new architecture significantly outperforms baselines such as NetVLAD and SSDH, offering a cost-effective trade-off among state-of-the-art methods. In addition, the proposed random VLAD layer achieves satisfactory accuracy at low complexity, showing promising potential as an alternative to NetVLAD.
2020 | Targeted Attack for Deep Hashing based Retrieval | Jiawang Bai, Bin Chen, Yiming Li, Dongxian Wu, Weiwei Guo, Shu-tao Xia, En-hui Yang | Arxiv | The deep hashing based retrieval method is widely adopted in large-scale image and video retrieval. However, there is little investigation on its security. In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval. Specifically, we first formulate the targeted attack as a point-to-set optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label. Then we design a novel component-voting scheme to obtain an anchor code as the representative of the set of hash codes of objects with the target label, whose optimality guarantee is also theoretically derived. To balance the performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under the ℓ∞ restriction on the perturbation. Extensive experiments verify that DHTA is effective in attacking both deep hashing based image retrieval and video retrieval. |
2020 | Learning To Hash With Semantic Similarity Metrics And Empirical KL Divergence | Arponen Heikki, Bishop Tom E. | Arxiv | Learning to hash is an efficient paradigm for exact and approximate nearest neighbor search from massive databases. Binary hash codes are typically extracted from an image by rounding output features from a CNN which is trained on a supervised binary similar/dissimilar task. Drawbacks of this approach are: (i) resulting codes do not necessarily capture semantic similarity of the input data; (ii) rounding results in information loss, manifesting as decreased retrieval performance; and (iii) using only class-wise similarity as a target can lead to trivial solutions that simply encode classifier outputs rather than learning more intricate relations, which is not detected by most performance metrics. We overcome (i) via a novel loss function encouraging the relative hash code distances of learned features to match those derived from their targets. We address (ii) via a differentiable estimate of the KL divergence between network outputs and a binary target distribution, resulting in minimal information loss when the features are rounded to binary. Finally, we resolve (iii) by focusing on a hierarchical precision metric. Efficiency of the methods is demonstrated with semantic image retrieval on the CIFAR-100, ImageNet and Conceptual Captions datasets, using similarities inferred from the WordNet label hierarchy or sentence embeddings.
2020 | Fast Search On Binary Codes By Weighted Hamming Distance | Weng Zhenyu, Zhu Yuesheng, Liu Ruixin | Arxiv | Weighted Hamming distance, as a similarity measure between binary codes and binary queries, provides superior accuracy in search tasks compared to Hamming distance. However, how to efficiently and accurately find the K binary codes that have the smallest weighted Hamming distance to the query remains an open issue. In this paper, a fast search algorithm is proposed to perform a non-exhaustive search for the K nearest binary codes by weighted Hamming distance. Using binary codes as direct bucket indices in a hash table, the search algorithm generates a sequence to probe the buckets based on the independence characteristic of the weights for each bit. Furthermore, a fast search framework based on the proposed search algorithm is designed to handle long binary codes. Specifically, long binary codes are split into substrings and multiple hash tables are built on them. The search algorithm then probes the buckets to obtain candidates according to the generated substring indices, and a merging algorithm is proposed to find the nearest binary codes from the candidates. Theoretical analysis and experimental results demonstrate that the search algorithm improves the search accuracy compared to other non-exhaustive algorithms and provides orders-of-magnitude faster search than the linear scan baseline.
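A compact sketch of the substring-table idea: split long codes into substrings, index each substring in its own hash table, take the union of exact bucket matches as candidates, then rerank candidates exactly by weighted Hamming distance. The paper's probing sequence over nearby buckets is more elaborate than the exact-match probing used here.

```python
import numpy as np
from collections import defaultdict

def build_tables(codes, n_sub):
    """codes: (n, b) 0/1 uint8 array; one hash table per substring."""
    chunks = np.array_split(np.arange(codes.shape[1]), n_sub)
    tables = [defaultdict(list) for _ in chunks]
    for i, c in enumerate(codes):
        for t, cols in zip(tables, chunks):
            t[c[cols].tobytes()].append(i)
    return tables, chunks

def search(query, codes, weights, tables, chunks, topk=5):
    cand = set()
    for t, cols in zip(tables, chunks):   # candidates: any exact substring match
        cand.update(t.get(query[cols].tobytes(), []))
    cand = np.fromiter(cand, dtype=np.int64)
    # exact rerank of candidates by weighted Hamming distance
    d = ((codes[cand] != query) * weights).sum(1)
    return cand[np.argsort(d)[:topk]]

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)
w = rng.uniform(0.5, 1.5, size=64)        # per-bit weights
q = db[42].copy(); q[:3] ^= 1             # a near-duplicate query
tables, chunks = build_tables(db, n_sub=8)
print(search(q, db, w, tables, chunks))   # item 42 should rank first
```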
2020 | Auto-Encoding Twin-Bottleneck Hashing | Yuming Shen, Jie Qin, Jiaxin Chen, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao | CVPR | Conventional unsupervised hashing methods usually take advantage of similarity graphs, which are either pre-computed in the high-dimensional space or obtained from random anchor points. On the one hand, existing methods uncouple the procedures of hash function learning and graph construction. On the other hand, graphs empirically built upon original data could introduce biased prior knowledge of data relevance, leading to sub-optimal retrieval performance. In this paper, we tackle the above problems by proposing an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder. Specifically, we introduce into our framework twin bottlenecks (i.e., latent variables) that exchange crucial information collaboratively. One bottleneck (i.e., binary codes) conveys the high-level intrinsic data structure captured by the code-driven graph to the other (i.e., continuous variables for low-level detail information), which in turn propagates the updated network feedback for the encoder to learn more discriminative binary codes. The auto-encoding learning objective literally rewards the code-driven graph to learn an optimal encoder. Moreover, the proposed model can be simply optimized by gradient descent without violating the binary constraints. Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods. |
2020 | Locality-sensitive Hashing In Function Spaces | Shand Will, Becker Stephen | Arxiv | We discuss the problem of performing similarity search over function spaces. To perform search over such spaces in a reasonable amount of time, we use locality-sensitive hashing (LSH). We present two methods that allow LSH functions on ℝ^N to be extended to L^p spaces: one using function approximation in an orthonormal basis, and another using (quasi-)Monte Carlo-style techniques. We use the presented hashing schemes to construct an LSH family for Wasserstein distance over one-dimensional continuous probability distributions.
2020 | Bio-inspired Hashing For Unsupervised Similarity Search | Ryali Chaitanya K., Hopfield John J., Grinberg Leopold, Krotov Dmitry | Proceedings of the International Conference on Machine Learning | The fruit fly Drosophila's olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm, BioHash, that produces sparse high dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant, BioConvHash, that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
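A tiny version of the random (non-learned) FlyHash baseline that BioHash builds on: sparse binary random expansion followed by winner-take-all binarization. BioHash itself replaces the random projection with synapses learned by a local plasticity rule, which this sketch does not reproduce; all parameter values are illustrative.

```python
import numpy as np

def flyhash(x, m=2048, k=32, density=0.1, seed=0):
    """Expand a d-dim input to m dims with a sparse binary random matrix,
    then keep the top-k activations as set bits (winner-take-all)."""
    rng = np.random.default_rng(seed)           # fixed seed => same projection
    d = x.shape[-1]
    M = (rng.random((m, d)) < density).astype(np.float64)
    act = M @ x
    code = np.zeros(m, dtype=np.uint8)
    code[np.argpartition(act, -k)[-k:]] = 1     # sparse, high-dimensional code
    return code

rng = np.random.default_rng(1)
a = rng.normal(size=64)
b = a + 0.05 * rng.normal(size=64)              # a near neighbor of a
c = rng.normal(size=64)                         # an unrelated point
ha, hb, hc = flyhash(a), flyhash(b), flyhash(c)
print((ha & hb).sum(), (ha & hc).sum())         # overlap: near pair >> random pair
```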
2020 | HM-ANN Efficient Billion-point Nearest Neighbor Search On Heterogeneous Memory | Jie Ren, Minjia Zhang, Dong Li | Neural Information Processing Systems | The state-of-the-art approximate nearest neighbor search (ANNS) algorithms face a fundamental tradeoff between query latency and accuracy because of small main memory capacity: to store indices in main memory for short query latency, the ANNS algorithms have to limit dataset size or use a quantization scheme which hurts search accuracy. The emergence of heterogeneous memory (HM) brings a solution to significantly increase memory capacity and break the above tradeoff: using HM, billions of data points can be placed in the main memory of a single machine without using any data compression. However, HM consists of both fast (but small) memory and slow (but large) memory, and using HM inappropriately slows down queries significantly. In this work, we present a novel graph-based similarity search algorithm called HM-ANN, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression. On two billion-sized datasets, BIGANN and DEEP1B, HM-ANN outperforms state-of-the-art compression-based solutions such as L&C and IMI+OPQ in recall-vs-latency by a large margin, obtaining 46% higher recall under the same search latency. We also extend existing graph-based methods such as HNSW and NSG with two strong baseline implementations on HM. At billion-point scale, HM-ANN is 2X and 5.8X faster than our HNSW and NSG baselines respectively to reach the same accuracy.
2020 | Generative Semantic Hashing Enhanced Via Boltzmann Machines | Zheng Lin, Su Qinliang, Shen Dinghan, Chen Changyou | Arxiv | Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint. For the tractability of training, existing generative-hashing methods mostly assume a factorized form for the posterior distribution, enforcing independence among the bits of hash codes. From the perspectives of both model representation and code space size, independence is not always the best assumption. In this paper, to introduce correlations among the bits of hash codes, we propose to employ the distribution of a Boltzmann machine as the variational posterior. To address the intractability of training, we first develop an approximate method to reparameterize the distribution of a Boltzmann machine by augmenting it as a hierarchical concatenation of a Gaussian-like distribution and a Bernoulli distribution. Based on that, an asymptotically exact lower bound is further derived for the evidence lower bound (ELBO). With these novel techniques, the entire model can be optimized efficiently. Extensive experimental results demonstrate that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
2020 | Minimizing Flops To Learn Efficient Sparse Representations | Paria Biswajit, Yeh Chih-kuan, Yen Ian E. H., Xu Ning, Ravikumar Pradeep, Póczos Barnabás | Arxiv | Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation and identification. Retrieval of such representations from a large database is however computationally challenging. Approximate methods based on learning compact representations have been widely explored for this problem, such as locality sensitive hashing, product quantization and PCA. In this work, in contrast to learning compact representations, we propose to learn high dimensional and sparse representations that have similar representational capacity as dense embeddings while being more efficient due to sparse matrix multiplication operations, which can be much faster than dense multiplication. Following the key insight that the number of operations decreases quadratically with the sparsity of embeddings, provided the non-zero entries are distributed uniformly across dimensions, we propose a novel approach to learn such distributed sparse embeddings via the use of a carefully constructed regularization function that directly minimizes a continuous relaxation of the number of floating-point operations (FLOPs) incurred during retrieval. Our experiments show that our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.
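The continuous FLOPs surrogate described above has a simple standard form: with non-negative activations a_ij, penalize the sum over dimensions of the squared mean activation, which (for fixed total activity) is smallest when non-zeros spread uniformly across dimensions. A hedged numpy rendering, assuming this form of the regularizer:

```python
import numpy as np

def flops_regularizer(activations):
    """activations: (batch, dim), assumed non-negative (e.g. post-ReLU).
    F = sum_j (mean_i a_ij)^2 -- a continuous relaxation of the expected
    work in a sparse dot product, lowest when activation mass is spread
    evenly over dimensions rather than concentrated on a few."""
    return (activations.mean(axis=0) ** 2).sum()

rng = np.random.default_rng(0)
uniform = rng.random((128, 64)) * (rng.random((128, 64)) < 0.1)    # spread-out sparsity
skewed = np.zeros((128, 64)); skewed[:, :6] = rng.random((128, 6)) # mass on few dims
print(flops_regularizer(uniform), flops_regularizer(skewed))       # uniform << skewed
```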
2020 | Procrustean Orthogonal Sparse Hashing | Tepper Mariano, Sengupta Dipanjan, Willke Ted | Arxiv | Hashing is one of the most popular methods for similarity search because of its speed and efficiency. Dense binary hashing is prevalent in the literature. Recently, insect olfaction was shown to be structurally and functionally analogous to sparse hashing [6]. Here we prove that this biological mechanism is the solution to a well-posed optimization problem. Furthermore, we show that orthogonality increases the accuracy of sparse hashing. Next, we present a novel method, Procrustean Orthogonal Sparse Hashing (POSH), that unifies these findings, learning an orthogonal transform from training data compatible with the sparse hashing mechanism. We provide theoretical evidence of the shortcomings of Optimal Sparse Lifting (OSL) [22] and BioHash [30], two related olfaction-inspired methods, and propose two new methods, Binary OSL and SphericalHash, to address these deficiencies. We compare POSH, Binary OSL and SphericalHash to several state-of-the-art hashing methods and provide empirical results for the superiority of the proposed methods across a wide range of standard benchmarks and parameter settings.
2020 | Learning To Hash With Graph Neural Networks For Recommender Systems | Tan Qiaoyu, Liu Ninghao, Zhao Xing, Yang Hongxia, Zhou Jingren, Hu Xia | Arxiv | Graph representation learning has attracted much attention in supporting high quality candidate search at scale. Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous. In this work, we investigate the problem of hashing with graph neural networks (GNNs) for high quality retrieval, and propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes. Specifically, a deep hashing with GNNs (HashGNN) is presented, which consists of two components: a GNN encoder for learning node representations, and a hash layer for encoding representations to hash codes. The whole architecture is trained end-to-end by jointly optimizing two losses, i.e., the reconstruction loss from reconstructing observed links and the ranking loss from preserving the relative ordering of hash codes. A novel discrete optimization strategy based on the straight-through estimator (STE) with guidance is proposed. The principal idea is to avoid gradient magnification in the back-propagation of STE with continuous embedding guidance, in which we begin by learning an easier network that mimics the continuous embedding and let it evolve during training until it finally returns to STE. Comprehensive experiments over three publicly available datasets and one real-world Alibaba company dataset demonstrate that our model not only achieves comparable performance with its continuous counterpart but also runs multiple times faster during inference.
2020 | Error-corrected Margin-based Deep Cross-modal Hashing For Facial Image Retrieval | Taherkhani Fariborz, Talreja Veeru, Valenti Matthew C., Nasrabadi Nasser M. | Arxiv | Cross-modal hashing facilitates mapping of heterogeneous multimedia data into a common Hamming space, which can be utilized for fast and flexible retrieval across different modalities. In this paper we propose a novel cross-modal hashing architecture, deep neural decoder cross-modal hashing (DNDCMH), which uses a binary vector specifying the presence of certain facial attributes as an input query to retrieve relevant face images from a database. The DNDCMH network consists of two separate components: an attribute-based deep cross-modal hashing (ADCMH) module, which uses a margin (m)-based loss function to efficiently learn compact binary codes that preserve similarity between modalities in the Hamming space, and a neural error correcting decoder (NECD), which is an error correcting decoder implemented with a neural network. The goal of the NECD network in DNDCMH is to error-correct the hash codes generated by ADCMH to improve the retrieval efficiency. The NECD network is trained such that it has an error correcting capability greater than or equal to the margin (m) of the margin-based loss function. As a result, NECD can correct the corrupted hash codes generated by ADCMH up to a Hamming distance of m. We have evaluated and compared DNDCMH with state-of-the-art cross-modal hashing methods on standard datasets to demonstrate the superiority of our method.
2020 | Deep Learning For Image Search And Retrieval In Large Remote Sensing Archives | Sumbul Gencer, Kang Jian, Demir Begüm | Arxiv | This chapter presents recent advances in content based image search and retrieval (CBIR) systems in remote sensing (RS) for fast and accurate information discovery from massive data archives. Initially we analyze the limitations of the traditional CBIR systems that rely on the hand-crafted RS image descriptors. Then we focus our attention on the advances in RS CBIR systems for which deep learning (DL) models are at the forefront. In particular we present the theoretical properties of the most recent DL based CBIR systems for the characterization of the complex semantic content of RS images. After discussing their strengths and limitations we present the deep hashing based CBIR systems that have high time-efficient search capability within huge data archives. Finally the most promising research directions in RS CBIR are discussed. |
2020 | Unsupervised Few-Bits Semantic Hashing with Implicit Topics Modeling | Fanghua Ye, Jarana Manotumruksa, Emine Yilmaz | EMNLP | Semantic hashing is a powerful paradigm for representing texts as compact binary hash codes. The explosion of short text data has spurred the demand of few-bits hashing. However, the performance of existing semantic hashing methods cannot be guaranteed when applied to few-bits hashing because of severe information loss. In this paper, we present a simple but effective unsupervised neural generative semantic hashing method with a focus on few-bits hashing. Our model is built upon variational autoencoder and represents each hash bit as a Bernoulli variable, which allows the model to be end-to-end trainable. To address the issue of information loss, we introduce a set of auxiliary implicit topic vectors. With the aid of these topic vectors, the generated hash codes are not only low-dimensional representations of the original texts but also capture their implicit topics. We conduct comprehensive experiments on four datasets. The results demonstrate that our approach achieves significant improvements over state-of-the-art semantic hashing methods in few-bits hashing. |
2020 | Deep Robust Multilevel Semantic Cross-modal Hashing | Song Ge, Zhao Jun, Tan Xiaoyang | Arxiv | Hashing based cross-modal retrieval has recently made significant progress. But straightforwardly embedding data from different modalities into a joint Hamming space will inevitably produce false codes due to the intrinsic modality discrepancy and noise. We present a novel Robust Multilevel Semantic Hashing (RMSH) for more accurate cross-modal retrieval. It seeks to preserve fine-grained similarity among data with rich semantics, while explicitly requiring distances between dissimilar points to be larger than a specific value for strong robustness. For this, we give an effective bound on this value based on an information coding-theoretic analysis, and embody the above goals in a margin-adaptive triplet loss. Furthermore, we introduce pseudo-codes via fusing multiple hash codes to explore seldom-seen semantics, alleviating the sparsity problem of similarity information. Experiments on three benchmarks show the validity of the derived bounds, and our method achieves state-of-the-art performance.
2020 | An Indexing Scheme And Descriptor For 3D Object Retrieval Based On Local Shape Querying | Van Blokland Bart Iver, Theoharis Theoharis | Computers & Graphics | A binary descriptor indexing scheme based on Hamming distance, called the Hamming tree, for local shape queries is presented. A new binary clutter resistant descriptor, named Quick Intersection Count Change Image (QUICCI), is also introduced. This local shape descriptor is extremely small and fast to compare. Additionally, a novel distance function called Weighted Hamming, applicable to QUICCI images, is proposed for retrieval applications. The effectiveness of the indexing scheme and QUICCI is demonstrated on 828 million QUICCI images derived from the SHREC2017 dataset, while the clutter resistance of QUICCI is shown using the clutterbox experiment.
2019 | Pairwise Teacher-student Network For Semi-supervised Hashing | Zhang Shifeng, Li Jianmin, Zhang Bo | Arxiv | Hashing maps similar high-dimensional data to binary hash codes with small Hamming distance, and it has received broad attention due to its low storage cost and fast retrieval speed. Pairwise similarity is easily obtained and widely used for retrieval, and most supervised hashing algorithms are carefully designed for pairwise supervision. As labeling all data pairs is difficult, semi-supervised hashing has been proposed, which aims at learning efficient codes with limited labeled pairs and abundant unlabeled ones. Existing methods build graphs to capture the structure of the dataset, but they do not work well for complex data, as the graph is built from the data representations and determining the representations of complex data is difficult. In this paper we propose a novel teacher-student semi-supervised hashing framework in which the student is trained with the pairwise information produced by the teacher network. The network follows the smoothness assumption, which achieves consistent distances for similar data pairs so that the retrieval results are similar for neighborhood queries. Experiments on large-scale datasets show that the proposed method achieves impressive gains over supervised baselines and is superior to state-of-the-art semi-supervised hashing methods. |
2019 | SADIH Semantic-aware Discrete Hashing | Zhang Zheng, Xie Guo-sen, Li Yang, Li Sheng, Huang Zi | Arxiv | Due to its low storage cost and fast query speed hashing has been recognized to accomplish similarity search in large-scale multimedia retrieval applications. Particularly supervised hashing has recently received considerable research attention by leveraging the label information to preserve the pairwise similarities of data points in the Hamming space. However there still remain two crucial bottlenecks: 1) the learning process of the full pairwise similarity preservation is computationally unaffordable and unscalable to deal with big data; 2) the available category information of data is not well-explored to learn discriminative hash functions. To overcome these challenges we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework which aims to directly embed the transformed semantic information into the asymmetric similarity approximation and discriminative hashing function learning. Specifically a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while skillfully handling the cumbersome n × n pairwise similarity matrix. Meanwhile a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that our SADIH can clearly outperform the state-of-the-art baselines with the additional benefit of lower computational costs. |
2019 | Deep Incremental Hashing Network for Efficient Image Retrieval | Dayan Wu, Qi Dai, Jing Liu, Bo Li, Weiping Wang | CVPR | Hashing has shown great potential in large-scale image retrieval due to its storage and computation efficiency, especially the recent deep supervised hashing methods. To achieve promising performance, deep supervised hashing methods require a large amount of training data from different classes. However, when images of new categories emerge, existing deep hashing methods have to retrain the CNN model and generate hash codes for all the database images again, which is impractical for large-scale retrieval systems. In this paper, we propose a novel deep hashing framework, called Deep Incremental Hashing Network (DIHN), for learning hash codes in an incremental manner. DIHN learns the hash codes for the new coming images directly, while keeping the old ones unchanged. Simultaneously, a deep hash function for query set is learned by preserving the similarities between training points. Extensive experiments on two widely used image retrieval benchmarks demonstrate that the proposed DIHN framework can significantly decrease the training time while keeping the state-of-the-art retrieval accuracy. |
2019 | Embarrassingly Simple Binary Representation Learning | Yuming Shen, Jie Qin, Jiaxin Chen, Li Liu, and Fan Zhu | ICCVW | Recent binary representation learning models usually require sophisticated binary optimization, similarity measure or even generative models as auxiliaries. However, one may wonder whether these non-trivial components are needed to formulate practical and effective hashing models. In this paper, we answer the above question by proposing an embarrassingly simple approach to binary representation learning. With a simple classification objective, our model only incorporates two additional fully-connected layers onto the top of an arbitrary backbone network, whilst complying with the binary constraints during training. The proposed model lower-bounds the Information Bottleneck (IB) between data samples and their semantics, and can be related to many recent 'learning to hash' paradigms. We show that, when properly designed, even such a simple network can generate effective binary codes, by fully exploring data semantics without any held-out alternating updating steps or auxiliary models. Experiments are conducted on conventional large-scale benchmarks, i.e., CIFAR-10, NUS-WIDE, and ImageNet, where the proposed simple model outperforms the state-of-the-art methods. |
2019 | Weakly-paired Cross-modal Hashing | Liu Xuanwu, Wang Jun, Yu Guoxian, Domeniconi Carlotta, Zhang Xiangliang | Arxiv | Hashing has been widely adopted for large-scale data retrieval in many domains due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition these methods generally require the same number of samples across different modalities which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly-paired data whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the local structure of each cluster and thus to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence it jointly optimizes in a unified objective function the potential correspondence the cross-modal hashing functions derived from the correspondence and a hashing quantitative loss. An alternating optimization technique is also proposed to coordinate the correspondence and hash functions and to reinforce the reciprocal effects of the two objectives. Experiments on public multi-modal datasets show that FlexCMH achieves significantly better results than state-of-the-art methods and it indeed offers a high degree of flexibility for practical cross-modal hashing tasks. |
2019 | Ranking-based Deep Cross-modal Hashing | Liu Xuanwu, Yu Guoxian, Domeniconi Carlotta, Wang Jun, Ren Yazhou, Guo Maozu | Arxiv | Cross-modal hashing has been receiving increasing interests for its low storage cost and fast query speed in multi-modal data retrievals. However most existing hashing methods are based on hand-crafted or raw level features of objects which may not be optimally compatible with the coding process. Besides these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH firstly uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next to expand the semantic representation power of hand-crafted features RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications. |
2019 | Metric-learning Based Deep Hashing Network For Content Based Retrieval Of Remote Sensing Images | Roy Subhankar, Sangineto Enver, Demir Begüm, Sebe Nicu | Arxiv | Hashing methods have been recently found very effective in retrieval of remote sensing (RS) images due to their computational efficiency and fast search speed. The traditional hashing methods in RS usually exploit hand-crafted features to learn hash functions to obtain binary codes which can be insufficient to optimally represent the information content of RS images. To overcome this problem in this paper we introduce a metric-learning based hashing network which learns 1) a semantic-based metric space for effective feature representation; and 2) compact binary hash codes for fast archive search. Our network considers an interplay of multiple loss functions that allows jointly learning a metric-based semantic space, encouraging similar images to be clustered together in the target space while producing compact final activations that lose negligible information when binarized. Experiments carried out on two benchmark RS archives show that the proposed network significantly improves the retrieval performance under the same retrieval time when compared to the state-of-the-art hashing methods in RS. |
2019 | Variable-Length Quantization Strategy for Hashing | Yang Shi, Xiushan Nie, Xin Zhou, Xiaoming Xi, Yilong Yin | ICIP | Hashing is widely used to solve fast Approximate Nearest Neighbor (ANN) search problems; it involves converting the original real-valued samples to binary-valued representations. The conventional quantization strategies, such as Single-Bit Quantization and Multi-Bit Quantization, are considered ineffective because of their serious information loss. To address this issue, we propose a novel variable-length quantization (VLQ) strategy for hashing. In the proposed VLQ technique, given the real-valued features of samples, we first divide each dimension into different regions. Then we compute the dispersion degrees of these regions. Subsequently, we attempt to optimally assign different numbers of bits to each dimension to obtain the minimum dispersion degree. Our experiments show that the VLQ strategy not only achieves superior performance over the state-of-the-art methods, but also has a faster retrieval speed on public datasets. |
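The abstract describes the bit-allocation step only at a high level. As a rough, hedged sketch of the idea (greedy allocation with within-region variance standing in for the paper's dispersion degree; this is not the paper's exact optimization):

```python
import numpy as np

def allocate_bits(X: np.ndarray, total_bits: int) -> np.ndarray:
    """Greedily hand out a budget of total_bits, one bit at a time, to the
    dimension whose regions currently show the largest dispersion
    (sum of within-region variances, weighted by region size)."""
    bits = np.zeros(X.shape[1], dtype=int)

    def dispersion(j: int) -> float:
        col = np.sort(X[:, j])
        regions = np.array_split(col, 2 ** bits[j])
        return sum(r.var() * len(r) for r in regions if len(r) > 1)

    for _ in range(total_bits):
        j = max(range(X.shape[1]), key=dispersion)
        bits[j] += 1  # one more bit doubles this dimension's region count
    return bits

X = np.random.default_rng(0).normal(size=(1000, 8))
print(allocate_bits(X, total_bits=16))  # per-dimension bit counts summing to 16
```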
2019 | Mutual Linear Regression-based Discrete Hashing | Liu Xingbo, Nie Xiushan, Yin Yilong | Arxiv | Label information is widely used in hashing methods because of its effectiveness in improving precision. Existing hashing methods typically use two different projections to represent the mutual regression between hash codes and class labels. In contrast, we propose a novel learning-based hashing method termed Stable Supervised Discrete Hashing with Mutual Linear Regression (S2DHMLR) in this study, where only one stable projection is used to describe the linear correlation between hash codes and corresponding labels. To the best of our knowledge this strategy has not been used for hashing previously. In addition we further use a boosting strategy to improve the final performance of the proposed method without adding extra constraints and with little extra expenditure in terms of time and space. Extensive experiments conducted on three image benchmarks demonstrate the superior performance of the proposed method. |
2019 | Guided Similarity Separation For Image Retrieval | Chundi Liu, Guangwei Yu, Maksims Volkovs, Cheng Chang, Himanshu Rai, Junwei Ma, Satya Krishna Gorti | Neural Information Processing Systems | Despite recent progress in computer vision, image retrieval remains a challenging open problem. Numerous variations such as view angle, lighting and occlusion make it difficult to design models that are both robust and efficient. Many leading methods traverse the nearest neighbor graph to exploit higher order neighbor information and uncover the highly complex underlying manifold. In this work we propose a different approach where we leverage graph convolutional networks to directly encode neighbor information into image descriptors. We further leverage ideas from clustering and manifold learning and introduce an unsupervised loss based on pairwise separation of image similarities. Empirically we demonstrate that our model is able to successfully learn a new descriptor space that significantly improves retrieval accuracy while still allowing efficient inner product inference. Experiments on five public benchmarks show highly competitive performance with up to 24% relative improvement in mAP over leading baselines. Full code for this work is available at https://github.com/layer6ai-labs/GSS. |
2019 | Deep Triplet Quantization | Liu Bin, Cao Yue, Long Mingsheng, Wang Jianmin, Wang Jingdong | Arxiv | Deep hashing establishes efficient and effective image retrieval by end-to-end learning of deep representations and hash codes from similarity data. We present a compact coding solution focusing on the deep learning-to-quantization approach, which has shown superior performance over hashing solutions for similarity retrieval. We propose Deep Triplet Quantization (DTQ), a novel approach to learning deep quantization models from similarity triplets. To enable more effective triplet training we design a new triplet selection approach, Group Hard, that randomly selects hard triplets in each image group. To generate compact binary codes we further apply a triplet quantization with weak orthogonality during triplet training. The quantization loss reduces the codebook redundancy and enhances the quantizability of deep representations through back-propagation. Extensive experiments demonstrate that DTQ can generate high-quality and compact binary codes which yields state-of-the-art image retrieval performance on three benchmark datasets: NUS-WIDE, CIFAR-10 and MS-COCO. |
2019 | Effective And Efficient Indexing In Cross-modal Hashing-based Datasets | Markchit Sarawut, Chiu Chih-yi | Arxiv | To overcome storage and computation barriers, the hashing technique has been widely used for nearest neighbor search in multimedia retrieval applications recently. Particularly cross-modal retrieval that searches across different modalities becomes an active but challenging problem. Although dozens of cross-modal hashing algorithms have been proposed to yield compact binary codes, exhaustive search is impractical for real-time purposes, and Hamming distance computation suffers from inaccurate results. In this paper we propose a novel search method that utilizes a probability-based index scheme over binary hash codes in cross-modal retrieval. The proposed hash code indexing scheme exploits a few binary bits of the hash code as the index code. We construct an inverted index table based on index codes and train a neural network to improve the indexing accuracy and efficiency. Experiments are performed on two benchmark datasets for retrieval across image and text modalities where hash codes are generated by three cross-modal hashing methods. Results show the proposed method effectively boosts the performance of these hashing methods. |
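A minimal version of the index-code idea, leaving out the paper's learned probability model, keys an inverted table on a short prefix of each hash code and verifies candidates with full Hamming distance (the code length and index length below are assumed values):

```python
from collections import defaultdict

NBITS, M = 64, 12  # full hash-code length and index-code length (assumed)

def build_index(codes):
    """Inverted index keyed on the top M bits of each NBITS-bit code."""
    table = defaultdict(list)
    for i, c in enumerate(codes):
        table[c >> (NBITS - M)].append(i)
    return table

def query(table, codes, q, k):
    """Probe the query's bucket, then rank candidates by full Hamming
    distance; the paper instead trains a network to pick better buckets."""
    cands = table.get(q >> (NBITS - M), [])
    return sorted(cands, key=lambda i: bin(codes[i] ^ q).count("1"))[:k]
```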
2019 | Hierarchy Neighborhood Discriminative Hashing For An Unified View Of Single-label And Multi-label Image Retrieval | Ma Lei, Li Hongliang, Wu Qingbo, Meng Fanman, Ngan King Ngi | Arxiv | Recently deep supervised hashing methods have become popular for large-scale image retrieval tasks. To preserve the semantic similarity notion between examples they typically utilize pairwise or triplet supervision for hash learning. However these methods usually ignore the semantic class information which can help improve the semantic discriminative ability of hash codes. In this paper we propose a novel hierarchy neighborhood discriminative hashing method. Specifically we construct a bipartite graph to build a coarse semantic neighborhood relationship between the sub-class feature centers and the embedding features. Moreover we utilize the pairwise supervised information to construct a fine-grained semantic neighborhood relationship between embedding features. Finally we propose a hierarchy neighborhood discriminative hashing loss to unify the single-label and multi-label image retrieval problem with a one-stream deep neural network architecture. Experimental results on two large-scale datasets demonstrate that the proposed method can outperform the state-of-the-art hashing methods. |
2019 | Approximate Similarity Search Under Edit Distance Using Locality-sensitive Hashing | Mccauley Samuel | Arxiv | Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q. Previous approaches to this problem either rely on very large (superconstant) approximation ratios c or very small search radii r. Outside of a narrow parameter range these solutions are not competitive with trivially searching through all n strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d · 3^r · n^(1/c)). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile the best known theoretical results are very involved and difficult to implement and require query time at least 24^r. Our results significantly broaden the range of parameters for which we can achieve nontrivial bounds while retaining the practicality of a locality-sensitive hash function. We also show how to apply our ideas to the closely-related Approximate Nearest Neighbor problem for edit distance obtaining similar time bounds. |
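The hash family in this paper is its own construction; as a generic illustration of the LSH preprocess/query pattern it plugs into, the sketch below uses MinHash over q-grams as a stand-in hash (note this targets q-gram resemblance, not edit distance proper):

```python
import random
from collections import defaultdict

def minhash_band(s: str, seeds, q: int = 3) -> tuple:
    """One LSH band: MinHash signatures over the string's q-grams.
    Python's hash() is process-salted, so build and query in one process."""
    grams = {s[i:i + q] for i in range(len(s) - q + 1)} or {s}
    return tuple(min(hash((seed, g)) for g in grams) for seed in seeds)

def build_tables(strings, n_tables=8, rows=4, seed=0):
    """Preprocess: insert every string into n_tables independent tables."""
    rng = random.Random(seed)
    table_seeds = [[rng.random() for _ in range(rows)] for _ in range(n_tables)]
    tables = [defaultdict(list) for _ in range(n_tables)]
    for idx, s in enumerate(strings):
        for t, seeds in enumerate(table_seeds):
            tables[t][minhash_band(s, seeds)].append(idx)
    return tables, table_seeds

def query(tables, table_seeds, strings, qstr):
    """Collect colliding candidates from every table; a real search would
    then verify each candidate with an exact edit-distance check."""
    cands = set()
    for t, seeds in enumerate(table_seeds):
        cands.update(tables[t].get(minhash_band(qstr, seeds), []))
    return [strings[i] for i in sorted(cands)]
```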
2019 | Collaborative Quantization For Cross-modal Similarity Search | Zhang Ting, Wang Jingdong | Arxiv | Cross-modal similarity search is a problem about designing a search system supporting querying across content modalities e.g. using an image to search for texts or using a text to search for images. This paper presents a compact coding solution for efficient search with a focus on the quantization approach which has already shown the superior performance over the hashing solutions in the single-modal similarity search. We propose a cross-modal quantization approach which is among the early attempts to introduce quantization into cross-modal search. The major contribution lies in jointly learning the quantizers for both modalities through aligning the quantized representations for each pair of image and text belonging to a document. In addition our approach simultaneously learns the common space for both modalities in which quantization is conducted to enable efficient and effective search using the Euclidean distance computed in the common space with fast distance table lookup. Experimental results compared with several competitive algorithms over three benchmark datasets demonstrate that the proposed approach achieves the state-of-the-art performance. |
2019 | MoBoost: A Self-improvement Framework for Linear-based Hashing | Xingbo Liu, Xiushan Nie, Xiaoming Xi, Lei Zhu, Yilong Yin | CIKM | The linear model is commonly utilized in hashing methods owing to its efficiency. To obtain better accuracy, linear-based hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider neighborhood information. In this study, we propose a novel generalized framework called Model Boost (MoBoost), which can achieve the self-improvement of the linear-based hashing. The proposed MoBoost is used to improve model parameter optimization for linear-based hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, given a linear-based hashing method, we first execute the method several times to get several different hash codes for training samples, and then combine these different hash codes into one set utilizing one novel fusion strategy. Based on this set of hash codes, we learn some new parameters for the linear hash function that can significantly improve accuracy. The proposed MoBoost can be generally adopted in existing linear-based hashing methods, achieving more precise and stable performance compared to the original methods while imposing negligible added expenditure in terms of time and space. Extensive experiments are performed based on three benchmark datasets, and the results demonstrate the superior performance of the proposed framework. |
2019 | Optimal Projection Guided Transfer Hashing For Image Retrieval | Liu Ji, Zhang Lei | Arxiv | Recently learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. For most existing learning-to-hash methods, sufficient training images are required and used to learn precise hash codes. However in some real-world applications there are not always sufficient training images in the domain of interest. In addition some existing supervised approaches need a large amount of labeled data, which is an expensive process in terms of time, labeling effort and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH), where we borrow images from a different but related domain, i.e. the source domain, to help learn precise hash codes for the domain of interest, i.e. the target domain. Besides, due to the domain gap, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains. Furthermore, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning-to-hash methods. The implementation details are available at https://github.com/liuji93/GTH. |
2019 | Cross-modal Zero-shot Hashing | Liu Xuanwu, Li Zhao, Wang Jun, Yu Guoxian, Domeniconi Carlotta, Zhang Xiangliang | Arxiv | Hashing has been widely studied for big data retrieval due to its low storage cost and fast query speed. Zero-shot hashing (ZSH) aims to learn a hashing model that is trained using only samples from seen categories but can generalize well to samples of unseen categories. ZSH generally uses category attributes to seek a semantic embedding space to transfer knowledge from seen categories to unseen ones. As a result it may perform poorly when labeled data are insufficient. ZSH methods are mainly designed for single-modality data which prevents their application to the widely spread multi-modal data. On the other hand existing cross-modal hashing solutions assume that all the modalities share the same category labels while in practice the labels of different data modalities may be different. To address these issues we propose a general Cross-modal Zero-shot Hashing (CZHash) solution to effectively leverage unlabeled and labeled multi-modality data with different label spaces. CZHash first quantifies the composite similarity between instances using label and feature information. It then defines an objective function to achieve deep feature learning compatible with the composite similarity preserving category attribute space learning and hashing coding function learning. CZHash further introduces an alternative optimization procedure to jointly optimize these learning objectives. Experiments on benchmark multi-modal datasets show that CZHash significantly outperforms related representative hashing approaches both on effectiveness and adaptability. |
2019 | Query-adaptive Hash Code Ranking For Large-scale Multi-view Visual Search | Liu Xianglong, Huang Lei, Deng Cheng, Lang Bo, Tao Dacheng | Arxiv | Hash based nearest neighbor search has become attractive in many applications. However the quantization in hashing usually degenerates the discriminative power when using Hamming distance ranking. Besides, for large-scale visual search, existing hashing methods cannot directly support efficient search over data with multiple sources, while the literature has shown that adaptively incorporating complementary information from diverse sources or views can significantly boost search performance. To address these problems this paper proposes a novel and generic approach to building multiple hash tables with multiple views and generating fine-grained ranking results at bitwise and tablewise levels. For each hash table a query-adaptive bitwise weighting is introduced to alleviate the quantization loss by simultaneously exploiting the quality of hash functions and their complement for nearest neighbor search. From the tablewise aspect multiple hash tables are built for different data views as a joint index over which a query-specific rank fusion is proposed to rerank all results from the bitwise ranking by diffusing in a graph. Comprehensive experiments on image search over three well-known benchmarks show that the proposed method achieves up to 17.11% and 20.28% performance gains on single- and multiple-table search over state-of-the-art methods. |
2019 | Towards Optimal Discrete Online Hashing With Balanced Similarity | Lin Mingbao, Ji Rongrong, Liu Hong, Sun Xiaoshuai, Wu Yongjian, Wu Yunsheng | Arxiv | When facing large-scale image datasets, online hashing serves as a promising solution for online retrieval and prediction tasks. It encodes the online streaming data into compact binary codes and simultaneously updates the hash functions to renew codes of the existing dataset. To this end, existing methods update hash functions solely based on the new data batch without investigating the correlation between such new data and the existing dataset. In addition, existing works update the hash functions using a relaxation process in the corresponding approximated continuous space, and it remains an open problem to directly apply discrete optimization in online hashing. In this paper we propose a novel supervised online hashing method termed Balanced Similarity for Online Discrete Hashing (BSODH) to solve the above problems in a unified framework. BSODH employs a well-designed hashing algorithm to preserve the similarity between the streaming data and the existing dataset via an asymmetric graph regularization. We further identify the data-imbalance problem brought by the constructed asymmetric graph which restricts the application of discrete optimization in our problem. Therefore a novel balanced similarity is further proposed which uses two equilibrium factors to balance the similar and dissimilar weights and eventually enables the usage of discrete optimizations. Extensive experiments conducted on three widely-used benchmarks demonstrate the advantages of the proposed method over the state-of-the-art methods. |
2019 | Supervised Online Hashing Via Similarity Distribution Learning | Lin Mingbao, Ji Rongrong, Chen Shen, Zheng Feng, Sun Xiaoshuai, Zhang Baochang, Cao Liujuan, Guo Guodong, Huang Feiyue | Arxiv | Online hashing has attracted extensive research attention when facing streaming data. Most online hashing methods, which learn binary codes based on pairwise similarities of training instances, fail to capture the semantic relationship and suffer from poor generalization in large-scale applications due to large variations. In this paper we propose to model the similarity distributions between the input data and the hash codes, upon which a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), is proposed to keep the intrinsic semantic relationship in the produced Hamming space. Specifically we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution issue. We then introduce a scaling Student t-distribution to solve the challenging initialization problem and efficiently bridge the gap between the known and unknown distributions. Lastly we align the two distributions via minimizing the Kullback-Leibler divergence (KL divergence) with stochastic gradient descent (SGD), by which an intuitive similarity constraint is imposed to update the hashing model on new streaming data with a powerful ability to generalize to past data. Extensive experiments on three widely-used benchmarks validate the superiority of the proposed SDOH over the state-of-the-art methods in the online retrieval task. |
2019 | Distillhash Unsupervised Deep Hashing By Distilling Data Pairs | Yang Erkun, Liu Tongliang, Deng Cheng, Liu Wei, Tao Dacheng | Arxiv | Due to its high storage and search efficiency, hashing has become prevalent for large-scale similarity search. Particularly deep hashing methods have greatly improved the search performance under supervised scenarios. In contrast unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable supervisory similarity signals. To address this issue we propose a novel deep unsupervised hashing model dubbed DistillHash which can learn a distilled data set consisting of data pairs with confident similarity signals. Specifically we investigate the relationship between the initial noisy similarity signals learned from local structures and the semantic similarity labels assigned by a Bayes optimal classifier. We show that, under a mild assumption, some data pairs whose labels are consistent with those assigned by the Bayes optimal classifier can potentially be distilled. Inspired by this fact we design a simple yet effective strategy to distill data pairs automatically and further adopt a Bayesian learning framework to learn hash functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets show that the proposed DistillHash consistently accomplishes the state-of-the-art search performance. |
2019 | Fusion-supervised Deep Cross-modal Hashing | Wang Li, Zhu Lei, Yu En, Sun Jiande, Zhang Huaxiang | Arxiv | Deep hashing has recently received attention in cross-modal retrieval for its impressive advantages. However existing hashing methods for cross-modal retrieval cannot fully capture the heterogeneous multi-modal correlation and exploit the semantic information. In this paper we propose a novel Fusion-supervised Deep Cross-modal Hashing (FDCH) approach. Firstly FDCH learns unified binary codes through a fusion hash network with paired samples as input, which effectively enhances the modeling of the correlation of heterogeneous multi-modal data. Then these high-quality unified hash codes further supervise the training of the modality-specific hash networks for encoding out-of-sample queries. Meanwhile both pair-wise similarity information and classification information are embedded in the hash networks under a one-stream framework, which simultaneously preserves cross-modal similarity and keeps semantic consistency. Experimental results on two benchmark datasets demonstrate the state-of-the-art performance of FDCH. |
2019 | Semantic Hierarchy Preserving Deep Hashing For Large-scale Image Retrieval | Zhang Ming, Zhe Xuefei, Ou-yang Le, Chen Shifeng, Yan Hong | Arxiv | Deep hashing models have been proposed as an efficient method for large-scale similarity search. However most existing deep hashing methods only utilize fine-level labels for training while ignoring the natural semantic hierarchy structure. This paper presents an effective method that preserves the classwise similarity of full-level semantic hierarchy for large-scale image retrieval. Experiments on two benchmark datasets show that our method helps improve the fine-level retrieval performance. Moreover with the help of the semantic hierarchy it can produce significantly better binary codes for hierarchical retrieval which indicates its potential of providing more user-desired retrieval results. |
2019 | Re-randomized Densification For One Permutation Hashing And Bin-wise Consistent Weighted Sampling | Ping Li, Xiaoyun Li, Cun-hui Zhang | Neural Information Processing Systems | Jaccard similarity is widely used as a distance measure in many machine learning and search applications. Typically, hashing methods are essential for the use of Jaccard similarity to be practical in large-scale settings. For hashing binary (0/1) data, the idea of one permutation hashing (OPH) with densification significantly accelerates traditional minwise hashing algorithms while providing unbiased and accurate estimates. In this paper we propose a strategy named re-randomization in the process of densification that could achieve the smallest variance among all densification schemes. The success of this idea naturally inspires us to generalize one permutation hashing to weighted (non-binary) data, which results in the so-called bin-wise consistent weighted sampling (BCWS) algorithm. We analyze the behavior of BCWS and compare it with a recent alternative. Extensive experiments on various datasets illustrate the effectiveness of our proposed methods. |
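For readers unfamiliar with OPH, the sketch below implements the basic scheme with the standard rotation densification that the paper's re-randomization improves on; the bin layout and names are ours, and a non-empty input set is assumed:

```python
import random

def oph_sketch(item_set, universe, k, seed=0):
    """One Permutation Hashing: permute the universe once, split it into
    k equal-width bins, and keep the smallest permuted offset per bin.
    Assumes item_set is non-empty and k divides the universe size."""
    rng = random.Random(seed)
    perm = list(range(universe))
    rng.shuffle(perm)
    width = universe // k
    bins = [None] * k
    for elem in item_set:
        b, off = divmod(perm[elem], width)
        if bins[b] is None or off < bins[b]:
            bins[b] = off
    # Rotation densification: an empty bin copies the next non-empty bin.
    # Real schemes also perturb the copied value to break spurious ties;
    # lowering the variance of this step is exactly the paper's topic.
    for b in range(k):
        if bins[b] is None:
            j = (b + 1) % k
            while bins[j] is None:
                j = (j + 1) % k
            bins[b] = bins[j]
    return bins

a = oph_sketch({1, 5, 77, 900}, universe=1024, k=32)
b = oph_sketch({1, 5, 200, 900}, universe=1024, k=32)
print(sum(x == y for x, y in zip(a, b)) / len(a))  # Jaccard estimate
```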
2019 | Push For Quantization Deep Fisher Hashing | Li Yunqiang, Pei Wenjie, Zha Yufei, Van Gemert Jan | Arxiv | Current massive datasets demand light-weight access for analysis. Discrete hashing methods are thus beneficial because they map high-dimensional data to compact binary codes that are efficient to store and process while preserving semantic similarity. To optimize powerful deep learning methods for image hashing, gradient-based methods are required. Binary codes, however, are discrete and thus have no continuous derivatives. Relaxing the problem by solving it in a continuous space and then quantizing the solution is not guaranteed to yield separable binary codes. The quantization needs to be included in the optimization. In this paper we push for quantization: we optimize maximum class separability in the binary space. We introduce a margin on distances between dissimilar image pairs as measured in the binary space. In addition to pair-wise distances we draw inspiration from Fisher's Linear Discriminant Analysis (Fisher LDA) to maximize the binary distances between classes and at the same time minimize the binary distance of images within the same class. Experiments on CIFAR-10, NUS-WIDE and ImageNet100 demonstrate compact codes that compare favorably to the current state of the art. |
2019 | Semi-supervised Deep Quantization for Cross-modal Search | Xin Wang, Wenwu Zhu, Chenghao Liu | MM | The problem of cross-modal similarity search, which aims at making efficient and accurate queries across multiple domains, has become a significant and important research topic. Composite quantization, a compact coding solution superior to hashing techniques, has shown its effectiveness for similarity search. However, most existing works utilizing composite quantization to search multi-domain content only consider either pairwise similarity information or class label information across different domains, which fails to tackle the semi-supervised problem in composite quantization. In this paper, we address the semi-supervised quantization problem by considering: (i) pairwise similarity information (without class label information) across different domains, which captures the intra-document relation, (ii) cross-domain data with class label which can help capture inter-document relation, and (iii) cross-domain data with neither pairwise similarity nor class label which enables the full use of abundant unlabelled information. To the best of our knowledge, we are the first to consider both supervised information (pairwise similarity + class label) and unsupervised information (neither pairwise similarity nor class label) simultaneously in composite quantization. A challenging problem arises: how can we jointly handle these three sorts of information across multiple domains in an efficient way? To tackle this challenge, we propose a novel semi-supervised deep quantization (SSDQ) model that takes both supervised and unsupervised information into account. The proposed SSDQ model is capable of incorporating the above three kinds of information into one single framework when utilizing composite quantization for accurate and efficient queries across different domains. More specifically, we employ a modified deep autoencoder for better latent representation and formulate pairwise similarity loss, supervised quantization loss as well as unsupervised distribution match loss to handle all three types of information. The extensive experiments demonstrate the significant improvement of SSDQ over several state-of-the-art methods on various datasets. |
2019 | Coupled Cyclegan Unsupervised Hashing Network For Cross-modal Retrieval | Li Chao, Deng Cheng, Wang Lei, Xie De, Liu Xianglong | Arxiv | In recent years hashing has attracted more and more attention owing to its superior capacity of low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, continuously compelling results have been achieved in the cross-modal retrieval community. However existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn powerful common representations and an inner-cycle network is used to generate reliable hash codes. Specifically our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods. |
2019 | Deep Multi-index Hashing For Person Re-identification | Li Ming-wei, Jiang Qing-yuan, Li Wu-jun | Arxiv | Traditional person re-identification (ReID) methods typically represent person images as real-valued features which makes ReID inefficient when the gallery set is extremely large. Recently some hashing methods have been proposed to make ReID more efficient. However, these hashing methods generally deteriorate accuracy, and their efficiency is still not high enough. In this paper we propose a novel hashing method called deep multi-index hashing (DMIH) to improve both efficiency and accuracy for ReID. DMIH seamlessly integrates multi-index hashing and multi-branch based networks into the same framework. Furthermore a novel block-wise multi-index hashing table construction approach and a search-aware multi-index (SAMI) loss are proposed in DMIH to improve the search efficiency. Experiments on three widely used datasets show that DMIH can outperform other state-of-the-art baselines including both hashing methods and real-valued methods in terms of both efficiency and accuracy. |
2019 | Neighborhood Preserving Hashing for Scalable Video Retrieval | Shuyan Li, Zhixiang Chen, Jiwen Lu, Xiu Li, and Jie Zhou | ICCV | In this paper, we propose a Neighborhood Preserving Hashing (NPH) method for scalable video retrieval in an unsupervised manner. Unlike most existing deep video hashing methods which indiscriminately compress an entire video into a binary code, we embed the spatial-temporal neighborhood information into the encoding network such that the neighborhood-relevant visual content of a video can be preferentially encoded into a binary code under the guidance of the neighborhood information. Specifically, we propose a neighborhood attention mechanism which focuses on partial useful content of each input frame conditioned on the neighborhood information. We then integrate the neighborhood attention mechanism into an RNN-based reconstruction scheme to encourage the binary codes to capture the spatial-temporal structure in a video which is consistent with that in the neighborhood. As a consequence, the learned hashing functions can map similar videos to similar binary codes. Extensive experiments on three widely-used benchmark datasets validate the effectiveness of our proposed approach. |
2019 | Video Segment Copy Detection Using Memory Constrained Hierarchical Batch-normalized LSTM Autoencoder | Krishna Arjun, Ibrahim A S Akil Arif | Arxiv | In this report we introduce a video hashing method for scalable video segment copy detection. The objective of video segment copy detection is to find the video(s) present in a large database, one of whose segments (cropped in time) is a (transformed) copy of the given query video. This transformation may be temporal (for example, frame dropping or a change in frame rate) or spatial (brightness and contrast changes, addition of noise, etc.) in nature, although the primary focus of this report is detecting temporal attacks. The video hashing method proposed by us uses a deep learning neural network to learn variable-length binary hash codes for the entire video, taking both temporal and spatial features into account. This is in contrast to most existing video hashing methods, as they use conventional image hashing techniques to obtain hash codes for a video after extracting features for every frame or certain key frames, in which case the temporal information present in the video is not exploited. Our hashing method is specifically resilient to time cropping, making it extremely useful in video segment copy detection. Experimental results obtained on a large augmented dataset consisting of around 25000 videos with segment copies demonstrate the efficacy of our proposed video hashing method. |
2019 | Deep Hashing by Discriminating Hard Examples | Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen, Jun Zhou, Edwin Hancock | MM | This paper tackles a rarely explored but critical problem within learning to hash, i.e., to learn hash codes that effectively discriminate hard similar and dissimilar examples, to empower large-scale image retrieval. Hard similar examples refer to image pairs from the same semantic class that demonstrate some shared appearance but have different fine-grained appearance. Hard dissimilar examples are image pairs that come from different semantic classes but exhibit similar appearance. These hard examples generally have a small distance due to the shared appearance. Therefore, effective encoding of the hard examples can well discriminate the relevant images within a small Hamming distance, enabling more accurate retrieval in the top-ranked returned images. However, most existing hashing methods cannot capture this key information as their optimization is dominated by easy examples, i.e., distant similar/dissimilar pairs that share no or limited appearance. To address this problem, we introduce a novel Gamma distribution-enabled and symmetric Kullback-Leibler divergence-based loss, which is dubbed dual hinge loss because it works similarly to imposing two smoothed hinge losses on the respective similar and dissimilar pairs. Specifically, the loss enforces exponentially variant penalization on the hard similar (dissimilar) examples to emphasize and learn their fine-grained difference. It meanwhile imposes a bounding penalization on easy similar (dissimilar) examples to prevent the dominance of the easy examples in the optimization while preserving the high-level similarity (dissimilarity). This enables our model to well encode the key information carried by both easy and hard examples. Extensive empirical results on three widely-used image retrieval datasets show that (i) our method consistently and substantially outperforms state-of-the-art competing methods using hash codes of the same length and (ii) our method can use significantly (e.g., 50%-75%) shorter hash codes to perform substantially better than, or comparably well to, the competing methods. |
2019 | Nearest Neighbor Search-based Bitwise Source Separation Using Discriminant Winner-take-all Hashing | Kim Sunwoo, Kim Minje | Arxiv | We propose an iteration-free source separation algorithm based on Winner-Take-All (WTA) hash codes which is a faster yet accurate alternative to a complex machine learning model for single-channel source separation in a resource-constrained environment. We first generate random permutations with WTA hashing to encode the shape of the multidimensional audio spectrum to a reduced bitstring representation. A nearest neighbor search on the hash codes of an incoming noisy spectrum as the query string results in the closest matches among the hashed mixture spectra. Using the indices of the matching frames we obtain the corresponding ideal binary mask vectors for denoising. Since both the training data and the search operation are bitwise the procedure can be done efficiently in hardware implementations. Experimental results show that the WTA hash codes are discriminant and provide an affordable dictionary search mechanism that leads to a competent performance compared to a comprehensive model and oracle masking. |
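WTA hashing itself is compact enough to sketch. The example below (dimensions and settings are arbitrary stand-ins, not the paper's configuration) encodes a vector by the argmax of the first K coordinates under each random permutation, and compares codes by counting agreeing entries:

```python
import numpy as np

def wta_hash(x: np.ndarray, perms, K: int) -> np.ndarray:
    """Winner-Take-All hashing: for each permutation, record which of the
    first K permuted coordinates is largest. The code depends only on
    rank order, so it is robust to monotonic distortions of the input."""
    return np.array([int(np.argmax(x[p[:K]])) for p in perms])

rng = np.random.default_rng(0)
d, n_perms, K = 513, 64, 4                  # spectrum size and code settings
perms = [rng.permutation(d) for _ in range(n_perms)]
code_a = wta_hash(rng.random(d), perms, K)  # stand-in for a noisy spectrum
code_b = wta_hash(rng.random(d), perms, K)
similarity = int(np.sum(code_a == code_b))  # number of agreeing WTA entries
```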
2019 | Lock-free Hopscotch Hashing | Kelly Robert, Pearlmutter Barak A., Maguire Phil | Arxiv | In this paper we present a lock-free version of Hopscotch Hashing. Hopscotch Hashing is an open addressing algorithm originally proposed by Herlihy, Shavit and Tzafrir, which is known for fast performance and excellent cache locality. The algorithm allows users of the table to skip or jump over irrelevant entries, allowing quick search, insertion and removal of entries. Unlike traditional linear probing, Hopscotch Hashing is capable of operating under a high load factor, as probe counts remain small. Our lock-free version improves on the speed, cache locality and progress guarantees of the original, being a chimera of two concurrent hash tables. We compare our data structure to various other lock-free and blocking hashing algorithms and show that its performance is in many cases superior to existing strategies. The proposed lock-free version overcomes some of the drawbacks associated with the original blocking version, leading to a substantial boost in scalability while maintaining attractive features like physical deletion or probe-chain compression. |
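For orientation, here is a single-threaded, blocking sketch of the underlying Hopscotch scheme (not the paper's lock-free version): every key lives within H slots of its home bucket, so lookups probe at most H slots, and inserts displace nearby keys to keep that invariant:

```python
class HopscotchMap:
    H = 4  # neighborhood size; real implementations use 32 or 64

    def __init__(self, capacity=64):
        self.keys = [None] * capacity  # None marks an empty slot
        self.vals = [None] * capacity

    def _home(self, key):
        return hash(key) % len(self.keys)

    def get(self, key):
        home = self._home(key)
        for i in range(self.H):  # a key is always within H of its home
            idx = (home + i) % len(self.keys)
            if self.keys[idx] == key:
                return self.vals[idx]
        raise KeyError(key)

    def put(self, key, val):
        # In-place update of an existing key is omitted for brevity.
        n, home = len(self.keys), self._home(key)
        free, dist = home, 0
        while dist < n and self.keys[free] is not None:
            free, dist = (free + 1) % n, dist + 1  # linear-probe a free slot
        if dist == n:
            raise RuntimeError("full; a real table would resize")
        while dist >= self.H:  # hop the free slot back toward home
            for j in range(self.H - 1, 0, -1):
                cand = (free - j) % n
                # the occupant of cand may move only if the free slot stays
                # within H of ITS home bucket
                if (free - self._home(self.keys[cand])) % n < self.H:
                    self.keys[free], self.vals[free] = self.keys[cand], self.vals[cand]
                    self.keys[cand] = self.vals[cand] = None
                    free, dist = cand, (cand - home) % n
                    break
            else:
                raise RuntimeError("no hop possible; a real table would resize")
        self.keys[free], self.vals[free] = key, val
```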
2019 | Supervised Quantization For Similarity Search | Wang Xiaojuan, Zhang Ting, Qi Guo-jun, Tang Jinhui, Wang Jingdong | Arxiv | In this paper we address the problem of searching for semantically similar images from a large database. We present a compact coding approach, supervised quantization. Our approach simultaneously learns feature selection that linearly transforms the database points into a low-dimensional discriminative subspace and quantizes the data points in the transformed space. The optimization criterion is that the quantized points not only approximate the transformed points accurately but also are semantically separable: the points belonging to a class lie in a cluster that does not overlap with other clusters corresponding to other classes, which is formulated as a classification problem. The experiments on several standard datasets show the superiority of our approach over the state-of-the-art supervised hashing and unsupervised quantization algorithms. |
2019 | Online Multi-modal Hashing with Dynamic Query-adaption | Xu Lu, Lei Zhu, Zhiyong Cheng, Liqiang Nie and Huaxiang Zhang | SIGIR | Multi-modal hashing is an effective technique to support large-scale multimedia retrieval, due to its capability of encoding heterogeneous multi-modal features into compact and similarity-preserving binary codes. Although great progress has been achieved so far, existing methods still suffer from several problems, including: 1) All existing methods simply adopt fixed modality combination weights in online hashing process to generate the query hash codes. This strategy cannot adaptively capture the variations of different queries. 2) They either suffer from insufficient semantics (for unsupervised methods) or require high computation and storage cost (for the supervised methods, which rely on pair-wise semantic matrix). 3) They solve the hash codes with relaxed optimization strategy or bit-by-bit discrete optimization, which results in significant quantization loss or consumes considerable computation time. To address the above limitations, in this paper, we propose an Online Multi-modal Hashing with Dynamic Query-adaption (OMH-DQ) method in a novel fashion. Specifically, a self-weighted fusion strategy is designed to adaptively preserve the multi-modal feature information into hash codes by exploiting their complementarity. The hash codes are learned with the supervision of pair-wise semantic labels to enhance their discriminative capability, while avoiding the challenging symmetric similarity matrix factorization. Under such learning framework, the binary hash codes can be directly obtained with efficient operations and without quantization errors. Accordingly, our method can benefit from the semantic labels, and simultaneously, avoid the high computation complexity. Moreover, to accurately capture the query variations, at the online retrieval stage, we design a parameter-free online hashing module which can adaptively learn the query hash codes according to the dynamic query contents. Extensive experiments demonstrate the state-of-the-art performance of the proposed approach from various aspects. |
2019 | PDH Probabilistic Deep Hashing Based On MAP Estimation Of Hamming Distance | Kaga Yosuke, Fujio Masakazu, Takahashi Kenta, Ohki Tetsushi, Nishigaki Masakatsu | | With the growth of images on the web, research on hashing, which enables high-speed image retrieval, has been actively studied. In recent years various hashing methods based on deep neural networks have been proposed and achieved higher precision than other hashing methods. In these methods multiple losses for hash codes and the parameters of neural networks are defined. They generate hash codes that minimize the weighted sum of the losses. Therefore an expert has to tune the weights for the losses heuristically, and the probabilistic optimality of the loss function cannot be explained. In order to generate explainable hash codes without weight tuning we theoretically derive a single loss function with no hyperparameters for the hash code from the probability distribution of the images. By generating hash codes that minimize this loss function, highly accurate image retrieval with probabilistic optimality is performed. We evaluate the performance of hashing using MNIST, CIFAR-10 and SVHN and show that the proposed method outperforms the state-of-the-art hashing methods. |
2019 | b-bit Sketch Trie Scalable Similarity Search On Integer Sketches | Kanda Shunsuke, Tabei Yasuo | Arxiv | Recently randomly mapping vectorial data to strings of discrete symbols (i.e. sketches) for fast and space-efficient similarity searches has become popular. Such random mapping is called similarity-preserving hashing and approximates a similarity metric by using the Hamming distance. Although many efficient similarity searches have been proposed, most of them are designed for binary sketches. Similarity searches on integer sketches are in their infancy. In this paper we present a novel space-efficient trie named b-bit sketch trie on integer sketches for scalable similarity searches by leveraging the idea behind succinct data structures (i.e. space-efficient data structures supporting various data operations in the compressed format) and a favorable property of integer sketches as fixed-length strings. Our experimental results on real-world datasets show that the trie-based index built from integer sketches performs similarity searches efficiently by pruning useless portions of the search space, which greatly improves both the search time and the space-efficiency of the similarity search. The results show that our similarity search is up to an order of magnitude faster than state-of-the-art similarity searches. Besides, our method needs only 10 GiB of memory on a billion-scale database, while state-of-the-art similarity searches need 29 GiB. |
2019 | Maximum-Margin Hamming Hashing | Rong Kang, Yue Cao, Mingsheng Long, Jianmin Wang, and Philip S. Yu | ICCV | Deep hashing enables computation and memory efficient image search through end-to-end learning of feature representations and binary codes. While linear scan over binary hash codes is more efficient than over the high-dimensional representations, its linear-time complexity is still unacceptable for very large databases. Hamming space retrieval enables constant-time search through hash lookups, where for each query, there is a Hamming ball centered at the query and the data points within the ball are returned as relevant. Since inside the Hamming ball implies retrievable while outside irretrievable, it is crucial to explicitly characterize the Hamming ball. The main idea of this work is to directly embody the Hamming radius into the loss functions, leading to Maximum-Margin Hamming Hashing (MMHH), a new model specifically optimized for Hamming space retrieval. We introduce a max-margin t-distribution loss, where the t-distribution concentrates more similar data points to be within the Hamming ball, and the margin characterizes the Hamming radius such that less penalization is applied to similar data points within the Hamming ball. The loss function also introduces robustness to data noise, where the similarity supervision may be inaccurate in practical problems. The model is trained end-to-end using a new semi-batch optimization algorithm tailored to extremely imbalanced data. Our method yields state-of-the-art results on four datasets and shows superior performance on noisy data. |
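Hamming space retrieval as described above is straightforward to sketch: probe every code within the Hamming ball of the query. The helper below (names and radius are illustrative, not from the paper) also shows why the radius must stay small:

```python
from itertools import combinations
from collections import defaultdict

def build_table(codes):
    """Hash-lookup table: bucket database ids by their integer binary code."""
    table = defaultdict(list)
    for i, c in enumerate(codes):
        table[c].append(i)
    return table

def hamming_ball_lookup(table, q, nbits, radius=2):
    """Probe every code within the given Hamming radius of query q.
    The probe count is sum over k <= radius of C(nbits, k), which is why
    small radii (typically 2) are used in practice -- and why MMHH tries
    to pull similar points inside that small ball."""
    hits = []
    for r in range(radius + 1):
        for flips in combinations(range(nbits), r):
            probe = q
            for b in flips:
                probe ^= 1 << b  # flip one bit of the query code
            hits.extend(table.get(probe, []))
    return hits
```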
2019 | Note On Distance Matrix Hashing | Junussov I. A. | Arxiv | A hashing algorithm for a dynamical set of distances is described. The proposed hashing function is residual. A data structure whose implementation accelerates the computations is also presented. |
2019 | Simultaneous Region Localization And Hash Coding For Fine-grained Image Retrieval | Zeng Haien, Lai Hanjiang, Yin Jian | Arxiv | Fine-grained image hashing is a challenging problem due to the difficulties of discriminative region localization and hash code generation. Most existing deep hashing approaches solve the two tasks independently, although the two tasks are correlated and can reinforce each other. In this paper we propose a deep fine-grained hashing approach to simultaneously localize the discriminative regions and generate efficient binary codes. The proposed approach consists of a region localization module and a hash coding module. The region localization module aims to provide informative regions to the hash coding module. The hash coding module aims to generate effective binary codes and gives feedback for learning a better localizer. Moreover, to better capture subtle differences, multi-scale regions at different layers are learned without the need for bounding-box/part annotations. Extensive experiments are conducted on two public benchmark fine-grained datasets. The results demonstrate significant improvements in the performance of our method relative to other fine-grained hashing algorithms. |
2019 | Deep Semantic Multimodal Hashing Network For Scalable Image-text And Video-text Retrievals | Jin Lu, Li Zechao, Tang Jinhui | Arxiv | Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval. The proposed deep hashing framework leverages 2-D convolutional neural networks (CNNs) as the backbone network to capture spatial information for image-text retrieval, and 3-D CNNs as the backbone network to capture spatial and temporal information for video-text retrieval. In the DSMHN two sets of modality-specific hash functions are jointly learned by explicitly preserving both intermodality similarities and intramodality semantic labels. Specifically, with the assumption that the learned hash codes should be optimal for the classification task, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels on the resultant hash codes. Moreover a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, intermodality similarity-preserving learning, semantic label-preserving learning and hash function learning with different types of loss functions. The proposed DSMHN method is a generic and scalable deep hashing framework for both image-text and video-text retrievals which can be flexibly integrated with different types of loss functions. We conduct extensive experiments for both single-modal- and cross-modal-retrieval tasks on four widely used multimodal-retrieval data sets. Experimental results on both image-text- and video-text-retrieval tasks demonstrate that the DSMHN significantly outperforms the state-of-the-art methods. |
2019 | On The Evaluation Metric For Hashing | Jiang Qing-yuan, Li Ming-wei, Li Wu-jun | Arxiv | Due to its low storage cost and fast query speed hashing has been widely used for large-scale approximate nearest neighbor (ANN) search. Bucket search also called hash lookup can achieve fast query speed with a sub-linear time cost based on the inverted index table constructed from hash codes. Many metrics have been adopted to evaluate hashing algorithms. However, all existing metrics are unsuitable for evaluating hash codes for bucket search. On one hand all existing metrics ignore the retrieval time cost which is an important factor reflecting the performance of search. On the other hand some of them such as mean average precision (MAP) suffer from the uncertainty problem as the ranked list is based on integer-valued Hamming distance and are insensitive to Hamming radius as these metrics only depend on relative Hamming distance. Other metrics such as precision at Hamming radius R fail to evaluate global performance as these metrics only depend on one specific Hamming radius. In this paper we first point out the problems of existing metrics which have been ignored by the hashing community and then propose a novel evaluation metric called radius aware mean average precision (RAMAP) to evaluate hash codes for bucket search. Furthermore two coding strategies are also proposed to qualitatively show the problems of existing metrics. Experiments demonstrate that our proposed RAMAP can provide a more proper evaluation than existing metrics. |
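For context, the bucket search this metric targets is easy to state in code: an inverted index maps each code to the items carrying it, and a query probes every bucket within a chosen Hamming radius, so query cost grows with the number of probed buckets rather than with database size, which is exactly the cost RAMAP accounts for. A plain-Python sketch with illustrative names:

```python
from collections import defaultdict
from itertools import combinations

def build_index(codes):
    """codes: list of n-bit hash codes as ints; returns code -> item ids."""
    index = defaultdict(list)
    for i, c in enumerate(codes):
        index[c].append(i)
    return index

def hamming_ball_lookup(index, query, n_bits, radius):
    """Return ids of all items whose code is within `radius` bits of `query`."""
    hits = []
    for r in range(radius + 1):
        for flips in combinations(range(n_bits), r):  # C(n_bits, r) probes
            probe = query
            for b in flips:
                probe ^= 1 << b
            hits.extend(index.get(probe, []))
    return hits

index = build_index([0b1010, 0b1011, 0b0110])
print(hamming_ball_lookup(index, 0b1010, n_bits=4, radius=1))  # [0, 1]
```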
2019 | End-to-end Efficient Representation Learning Via Cascading Combinatorial Optimization | Jeong Yeonwoo, Kim Yoonsung, Song Hyun Oh | Arxiv | We develop hierarchically quantized efficient embedding representations for similarity-based search and show that this representation provides not only state-of-the-art search accuracy but also several orders of magnitude speedup during inference. The idea is to hierarchically quantize the representation so that the quantization granularity is greatly increased while maintaining the accuracy and keeping the computational complexity low. We also show that the problem of finding the optimal sparse compound hash code respecting the hierarchical structure can be optimized in polynomial time via minimum cost flow in an equivalent flow network. This allows us to train the method end-to-end in a mini-batch stochastic gradient descent setting. Our experiments on Cifar100 and ImageNet datasets show state-of-the-art search accuracy while providing several orders of magnitude search speedup over exhaustive linear search over the dataset. |
2019 | Graph-based Multi-view Binary Learning For Image Clustering | Jiang Guangqi, Wang Huibing, Peng Jinjia, Chen Dongyan, Fu Xianping | Arxiv | Hashing techniques also known as binary code learning have recently gained increasing attention in large-scale data analysis and storage. Generally most existing hash clustering methods are single-view ones which lack complete structure or complementary information from multiple views. For clustering tasks, abundant prior research mainly focuses on learning discrete hash codes while few works take the original data structure into consideration. To address these problems we propose a novel binary code algorithm for clustering called Graph-based Multi-view Binary Learning (GMBL), which adopts graph embedding to preserve the original data structure. GMBL mainly focuses on encoding the information of multiple views into a compact binary code which explores complementary information from multiple views. In particular in order to maintain the graph-based structure of the original data we adopt a Laplacian matrix to preserve the local linear relationship of the data and map it to the Hamming space. Considering that different views have distinctive contributions to the final clustering results GMBL adopts a strategy of automatically assigning weights to each view to better guide the clustering. Finally, an alternating iterative optimization method is adopted to optimize discrete binary codes directly instead of relaxing the binary constraint in two steps. Experiments on five public datasets demonstrate the superiority of our proposed method compared with previous approaches in terms of clustering performance. |
2019 | Deep Hashing Using Triplet Loss | James Jithin | Arxiv | Hashing is one of the most efficient techniques for approximate nearest neighbour search for large scale image retrieval. Most of the techniques are based on hand-engineered features and do not give optimal results all the time. Deep Convolutional Neural Networks have proven to generate very effective representations of images that are used for various computer vision tasks, and inspired by this, several Deep Hashing models such as that of Wang et al. (2016) have been proposed. These models train on the triplet loss function which can be used to train models with superior representation capabilities. Taking the latest advancements in training using the triplet loss, I propose new techniques that help Deep Hashing models train faster and more efficiently. Experimental results show that using these more efficient techniques for training on the triplet loss, we obtain a 53% improvement in our model compared to the original work of Wang et al. (2016). Using a larger model and more training data we can drastically improve the performance using the techniques we propose. |
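As background, the generic triplet objective such models train on fits in a few lines of NumPy over relaxed codes in [-1, 1]^K; this is the standard hinge form, not the specific variant or the training accelerations proposed in the paper:

```python
import numpy as np

def triplet_hash_loss(anchor, positive, negative, margin=4.0):
    """Push the anchor at least `margin` bits closer to the positive than
    to the negative, measured in approximate Hamming distance."""
    K = anchor.shape[0]
    d_ap = 0.5 * (K - anchor @ positive)  # anchor-positive Hamming distance
    d_an = 0.5 * (K - anchor @ negative)  # anchor-negative Hamming distance
    return max(0.0, d_ap - d_an + margin)
```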
2019 | Deep Collaborative Discrete Hashing With Semantic-invariant Structure | Wang Zijian, Zhang Zheng, Luo Yadan, Huang Zi | SIGIR | Existing deep hashing approaches fail to fully explore semantic correlations and neglect the effect of linguistic context on visual attention learning leading to inferior performance. This paper proposes a dual-stream learning framework dubbed Deep Collaborative Discrete Hashing (DCDH) which constructs a discriminative common discrete space by collaboratively incorporating the shared and individual semantics deduced from visual features and semantic labels. Specifically the context-aware representations are generated by employing the outer product of visual embeddings and semantic encodings. Moreover we reconstruct the labels and introduce the focal loss to take advantage of frequent and rare concepts. The common binary code space is built on the joint learning of the visual representations attended by language the semantic-invariant structure construction and the label distribution correction. Extensive experiments demonstrate the superiority of our method. |
2019 | Similarity Problems In High Dimensions | Sivertsen Johan Von Tangen | Arxiv | The main contribution of this dissertation is the introduction of new or improved approximation algorithms and data structures for several similarity search problems. We examine the furthest neighbor query the annulus query distance sensitive membership nearest neighbor preserving embeddings and set similarity queries in the large-scale high-dimensional setting. |
2019 | Deep Hashing Learning For Visual And Semantic Retrieval Of Remote Sensing Images | Song Weiwei, Li Shutao, Benediktsson Jon Atli | Arxiv | Driven by the urgent demand for managing remote sensing big data large-scale remote sensing image retrieval (RSIR) attracts increasing attention in the remote sensing field. In general existing retrieval methods can be regarded as visual-based retrieval approaches which search and return a set of similar images from a database to a given query image. Although retrieval methods have achieved great success there is still a question that needs to be answered: can we obtain the accurate semantic labels of the returned similar images to further help analyze and process imagery? Inspired by the above question in this paper we redefine the image retrieval problem as visual and semantic retrieval of images. Specifically we propose a novel deep hashing convolutional neural network (DHCNN) to simultaneously retrieve the similar images and classify their semantic labels in a unified framework. In more detail a convolutional neural network (CNN) is used to extract high-dimensional deep features. Then a hash layer is perfectly inserted into the network to transfer the deep features into compact hash codes. In addition a fully connected layer with a softmax function is performed on the hash layer to generate class distribution. Finally a loss function is elaborately designed to simultaneously consider the label loss of each image and similarity loss of pairs of images. Experimental results on two remote sensing datasets demonstrate that the proposed method achieves state-of-the-art retrieval and classification performance. |
2019 | Efficient Bitmap-based Indexing And Retrieval Of Similarity Search Image Queries | Jafari Omid, Nagarkar Parth, Montaño Jonathan | | Finding similar images is a necessary operation in many multimedia applications. Images are often represented and stored as a set of high-dimensional features which are extracted using localized feature extraction algorithms. Locality Sensitive Hashing (LSH) is one of the most popular approximate processing techniques for finding similar points in high-dimensional spaces. LSH and its variants are designed to find similar points, but they are not designed to efficiently find objects (such as images, which are made up of a collection of points). In this paper we propose an index structure Bitmap-Image LSH (bImageLSH) for efficient processing of high-dimensional images. Using a real dataset we experimentally show the performance benefit of our novel design while keeping the accuracy of the image results high. |
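The textbook random-hyperplane LSH scheme that indexes of this kind build on can be sketched as follows; this is the standard construction for cosine similarity, not the proposed bImageLSH structure, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

class HyperplaneLSH:
    """Multi-table random-hyperplane LSH for cosine similarity."""

    def __init__(self, dim, n_bits=16, n_tables=8):
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]

    def _key(self, planes, x):
        return ((planes @ x) > 0).tobytes()  # one sign bit per hyperplane

    def add(self, i, x):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, x), []).append(i)

    def query(self, x):
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, x), []))
        return candidates

lsh = HyperplaneLSH(dim=128)
v = rng.standard_normal(128)
lsh.add(0, v)
print(lsh.query(v + 0.01 * rng.standard_normal(128)))  # very likely {0}
```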
2019 | Learning Hash Function Through Codewords | Huang Yinjie, Georgiopoulos Michael, Anagnostopoulos Georgios C. | Arxiv | In this paper we propose a novel hash learning approach that has the following main distinguishing features when compared to past frameworks. First, codewords in the Hamming space are utilized as ancillary techniques to accomplish the hash learning task. These codewords which are inferred from the data attempt to capture grouping aspects of the data's hash codes. Furthermore the proposed framework is capable of addressing supervised, unsupervised and even semi-supervised hash learning scenarios. Additionally the framework adopts a regularization term over the codewords which automatically chooses the codewords for the problem. To efficiently solve the problem, a Block Coordinate Descent algorithm is showcased in the paper. We also show that one step of the algorithm can be cast as several Support Vector Machine problems, which enables our algorithm to utilize efficient software packages. For the regularization term a closed-form solution of the proximal operator is provided in the paper. A series of comparative experiments focused on content-based image retrieval highlights its performance advantages. |
2019 | Understanding Sparse JL For Feature Hashing | Meena Jagadeesan | Neural Information Processing Systems | Feature hashing and other random projection schemes are commonly used to reduce the dimensionality of feature vectors. The goal is to efficiently project a high-dimensional feature vector living in R^n into a much lower-dimensional space R^m while approximately preserving Euclidean norm. These schemes can be constructed using sparse random projections for example using a sparse Johnson-Lindenstrauss (JL) transform. A line of work introduced by Weinberger et al. (ICML 09) analyzes the accuracy of sparse JL with sparsity 1 on feature vectors with small linfinity-to-l2 norm ratio. Recently Freksen, Kamma and Larsen (NeurIPS 18) closed this line of work by proving a tight tradeoff between linfinity-to-l2 norm ratio and accuracy for sparse JL with sparsity 1. In this paper we demonstrate the benefits of using sparsity s greater than 1 in sparse JL on feature vectors. Our main result is a tight tradeoff between linfinity-to-l2 norm ratio and accuracy for a general sparsity s that significantly generalizes the result of Freksen et al. Our result theoretically demonstrates that sparse JL with s > 1 can have significantly better norm-preservation properties on feature vectors than sparse JL with s = 1; we also empirically demonstrate this finding. |
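A minimal sketch of a sparse JL projection with sparsity s: each input coordinate is scattered into s of the m output rows with random signs scaled by 1/sqrt(s), which preserves the squared norm in expectation. The exact distribution of nonzero positions in the construction analyzed above may differ; this is an illustration under that caveat:

```python
import numpy as np

def sparse_jl(x, m, s=4, seed=0):
    """Project x in R^n to R^m with s signed nonzeros per input coordinate."""
    rng = np.random.default_rng(seed)
    y = np.zeros(m)
    for j in range(x.shape[0]):
        rows = rng.choice(m, size=s, replace=False)  # where coordinate j lands
        signs = rng.choice([-1.0, 1.0], size=s)
        y[rows] += signs * x[j] / np.sqrt(s)
    return y

x = np.random.default_rng(1).standard_normal(1000)
y = sparse_jl(x, m=256, s=4)
print(np.linalg.norm(x), np.linalg.norm(y))  # the two norms should be close
```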
2019 | Separated Variational Hashing Networks for Cross-Modal Retrieval | Peng Hu, Xu Wang, Liangli Zhen, Dezhong Peng | MM | Cross-modal hashing, due to its low storage cost and high query speed, has been successfully used for similarity search in multimedia retrieval applications. It projects high-dimensional data into a shared isomorphic Hamming space with similar binary codes for semantically-similar data. In some applications, all modalities may not be obtained or trained simultaneously for some reasons, such as privacy, secret, storage limitation, and computational resource limitation. However, most existing cross-modal hashing methods need all modalities to jointly learn the common Hamming space, thus hindering them from handling these problems. In this paper, we propose a novel approach called Separated Variational Hashing Networks (SVHNs) to overcome the above challenge. Firstly, it adopts a label network (LabNet) to exploit available and nonspecific label annotations to learn a latent common Hamming space by projecting each semantic label into a common binary representation. Then, each modality-specific network can separately map the samples of the corresponding modality into their binary semantic codes learned by LabNet. We achieve it by conducting variational inference to match the aggregated posterior of the hashing code of LabNet with an arbitrary prior distribution. The effectiveness and efficiency of our SVHNs are verified by extensive experiments carried out on four widely-used multimedia databases, in comparison with 11 state-of-the-art approaches. |
2019 | Accelerate Learning of Deep Hashing With Gradient Attention | Long-Kai Huang, Jianda Chen, Sinno Jialin Pan | ICCV | Recent years have witnessed the success of learning to hash in fast large-scale image retrieval. As deep learning has shown its superior performance on many computer vision applications, recent designs of learning-based hashing models have been moving from shallow ones to deep architectures. However, based on our analysis, we find that gradient descent based algorithms used in deep hashing models would potentially cause hash codes of a pair of training instances to be updated towards the directions of each other simultaneously during optimization. In the worst case, the paired hash codes switch their directions after update, and consequently, their corresponding distance in the Hamming space remains unchanged. This makes the overall learning process highly inefficient. To address this issue, we propose a new deep hashing model integrated with a novel gradient attention mechanism. Extensive experimental results on three benchmark datasets show that our proposed algorithm is able to accelerate the learning process and obtain competitive retrieval performance compared with state-of-the-art deep hashing models. |
2019 | Asymmetric Deep Semantic Quantization For Image Retrieval | Yang Zhan, Raymond Osolo Ian, Sun Wuqing, Long Jun | Arxiv | Due to its fast retrieval and storage efficiency capabilities hashing has been widely used in nearest neighbor retrieval tasks. By using deep learning based techniques hashing can outperform non-learning based hashing techniques in many applications. However we argue that the current deep learning based hashing methods ignore some critical problems (e.g. the learned hash codes are not discriminative due to the hashing methods being unable to discover rich semantic information and the training strategy having difficulty optimizing the discrete binary codes). In this paper we propose a novel image hashing method termed Asymmetric Deep Semantic Quantization (ADSQ). ADSQ is implemented using a three-stream framework which consists of one LabelNet and two ImgNets. The LabelNet leverages the power of three fully-connected layers which are used to capture rich semantic information between image pairs. The two ImgNets adopt the same convolutional neural network structure but with different weights (i.e. asymmetric convolutional neural networks) and are used to generate discriminative compact hash codes. Specifically the function of the LabelNet is to capture rich semantic information that is used to guide the two ImgNets in minimizing the gap between the real-continuous features and the discrete binary codes. Furthermore ADSQ can utilize the most critical semantic information to guide the feature learning process and consider the consistency of the common semantic space and Hamming space. Experimental results on three benchmarks (i.e. CIFAR-10 NUS-WIDE and ImageNet) demonstrate that the proposed ADSQ outperforms current state-of-the-art methods. |
2019 | Supervised Hierarchical Cross-Modal Hashing | Changchang Sun, Xuemeng Song, Fuli Feng, Wayne Xin Zhao, Hao Zhang and Liqiang Nie | SIGIR | Recently, due to the unprecedented growth of multimedia data, cross-modal hashing has gained increasing attention for the efficient cross-media retrieval. Typically, existing methods on cross-modal hashing treat labels of one instance independently but overlook the correlations among labels. Indeed, in many real-world scenarios, like the online fashion domain, instances (items) are labeled with a set of categories correlated by certain hierarchy. In this paper, we propose a new end-to-end solution for supervised cross-modal hashing, named HiCHNet, which explicitly exploits the hierarchical labels of instances. In particular, by the pre-established label hierarchy, we comprehensively characterize each modality of the instance with a set of layer-wise hash representations. In essence, hash codes are encouraged to not only preserve the layer-wise semantic similarities encoded by the label hierarchy, but also retain the hierarchical discriminative capabilities. Due to the lack of benchmark datasets, apart from adapting the existing dataset FashionVC from fashion domain, we create a dataset from the online fashion platform Ssense consisting of 15,696 image-text pairs labeled by 32 hierarchical categories. Extensive experiments on two real-world datasets demonstrate the superiority of our model over the state-of-the-art methods. |
2019 | DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node | Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, Rohan Kadekodi | NeurIPS | Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom, we demonstrate that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high-recall, low query latency and high density (points indexed per node). On the billion point SIFT1B bigann dataset, DiskANN serves > 5000 queries a second with < 3ms mean latency and 95%+ 1-recall@1 on a 16 core machine, where state-of-the-art billion-point ANNS algorithms with similar memory footprint like FAISS and IVFOADC+G+P plateau at around 50% 1-recall@1. Alternately, in the high recall regime, DiskANN can index and serve 5-10x more points per node compared to state-of-the-art graph-based methods such as HNSW and NSG. Finally, as part of our overall DiskANN system, we introduce Vamana, a new graph-based ANNS index that is more versatile than the graph indices even for in-memory indices. |
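The query side of such graph-based indices can be sketched as a best-first beam search over an adjacency list. The simplified in-memory version below conveys the idea shared by Vamana- and HNSW-style indices while ignoring DiskANN's SSD layout, caching, and compressed distance computations; all names are illustrative:

```python
import heapq
import numpy as np

def greedy_search(graph, vectors, entry, query, k=10, beam=32):
    """Best-first beam search over a proximity graph.

    graph: dict node id -> list of neighbor ids; vectors: database points;
    entry: start node. Returns the k best (distance, id) pairs found.
    """
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))
    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap of unexpanded candidates
    best = [(-dist(entry), entry)]     # max-heap keeping the current top-`beam`
    while frontier:
        d, u = heapq.heappop(frontier)
        if len(best) == beam and d > -best[0][0]:
            break                      # no remaining candidate can enter the beam
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                dv = dist(v)
                if len(best) < beam or dv < -best[0][0]:
                    heapq.heappush(frontier, (dv, v))
                    heapq.heappush(best, (-dv, v))
                    if len(best) > beam:
                        heapq.heappop(best)
    return sorted((-d, i) for d, i in best)[:k]
```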
2019 | Unsupervised Rank-preserving Hashing For Large-scale Image Retrieval | Karaman Svebor, Lin Xudong, Hu Xuefeng, Chang Shih-fu | Arxiv | We propose an unsupervised hashing method which aims to produce binary codes that preserve the ranking induced by a real-valued representation. Such compact hash codes enable the complete elimination of real-valued feature storage and allow for significant reduction of the computation complexity and storage cost of large-scale image retrieval applications. Specifically we learn a neural network-based model which transforms the input representation into a binary representation. We formalize the training objective of the network in an intuitive and effective way considering each training sample as a query and aiming to obtain the same retrieval results using the produced hash codes as those obtained with the original features. This training formulation directly optimizes the hashing model for the target usage of the hash codes it produces. We further explore the addition of a decoder trained to obtain an approximated reconstruction of the original features. At test time we retrieve the most promising database samples with an efficient graph-based search procedure using only our hash codes and perform re-ranking using the reconstructed features, thus without needing to access the original features at all. Experiments conducted on multiple publicly available large-scale datasets show that our method consistently outperforms all compared state-of-the-art unsupervised hashing methods and that the reconstruction procedure can effectively boost the search accuracy with a minimal constant additional cost. |
2019 | K-Nearest Neighbors Hashing | Xiangyu He, Peisong Wang, Jian Cheng | CVPR | Hashing based approximate nearest neighbor search embeds high dimensional data to compact binary codes, which enables efficient similarity search and storage. However, the non-isometry sign(·) function makes it hard to project the nearest neighbors in continuous data space into the closest codewords in discrete Hamming space. In this work, we revisit the sign(·) function from the perspective of space partitioning. In specific, we bridge the gap between k-nearest neighbors and binary hashing codes with Shannon entropy. We further propose a novel K-Nearest Neighbors Hashing (KNNH) method to learn binary representations from KNN within the subspaces generated by sign(·). Theoretical and experimental results show that the KNN relation is of central importance to neighbor preserving embeddings, and the proposed method outperforms the state-of-the-arts on benchmark datasets. |
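For intuition about the sign(·) partitioning discussed here, the baseline below binarizes features with sign(·) after a random orthogonal projection and ranks by Hamming distance; it is a plain baseline for illustration (assuming the feature dimension is at least n_bits), not KNNH itself:

```python
import numpy as np

def sign_codes(X, n_bits=32, seed=0):
    """Binarize rows of X with sign(.) after a random orthogonal projection."""
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((X.shape[1], n_bits)))
    return np.sign(X @ R)  # +-1 codes (0 only at exact ties, which are rare)

def hamming_neighbors(codes, q_code, k=5):
    """Hamming distance of +-1 codes via inner products, then top-k."""
    d = 0.5 * (codes.shape[1] - codes @ q_code)
    return np.argsort(d)[:k]

X = np.random.default_rng(1).standard_normal((1000, 128))
codes = sign_codes(X)
print(hamming_neighbors(codes, codes[0]))  # index 0 should rank first
```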
2019 | Joint Cluster Unary Loss For Efficient Cross-modal Hashing | Zhang Shifeng, Li Jianmin, Zhang Bo | Arxiv | With the rapid growth of various types of multimodal data cross-modal deep hashing has received broad attention for solving cross-modal retrieval problems efficiently. Most cross-modal hashing methods follow the traditional supervised hashing framework in which the O(n^2) data pairs and O(n^3) data triplets are generated for training, but the training procedure is less efficient because the complexity is high for large-scale datasets. To address these issues we propose a novel and efficient cross-modal hashing algorithm in which the unary loss is introduced. First of all, we introduce the Cross-Modal Unary Loss (CMUL) with O(n) complexity to bridge the traditional triplet loss and classification-based unary loss. A more accurate bound of the triplet loss for structured multilabel data is also proposed in CMUL. Second, we propose the novel Joint Cluster Cross-Modal Hashing (JCCH) algorithm for efficient hash learning in which the CMUL is involved. The resultant hashcodes form several clusters in which the hashcodes in the same cluster share similar semantic information, and the heterogeneity gap on different modalities is diminished by sharing the clusters. The proposed algorithm is able to be applied to various types of data, and experiments on large-scale datasets show that the proposed method is superior over or comparable with state-of-the-art cross-modal hashing methods, and training with the proposed method is more efficient than others. |
2019 | Adaptive Labeling for Deep Learning to Hash | Huei-Fang Yang, Cheng-Hao Tu, Chu-Song Chen | CVPRW | Hash function learning has been widely used for large-scale image retrieval because of the efficiency of computation and storage. We introduce AdaLabelHash, a binary hash function learning approach via deep neural networks in this paper. In AdaLabelHash, class label representations are variables that are adapted during the backward network training procedure. We express the labels as hypercube vertices in a K-dimensional space, and the class label representations together with the network weights are updated in the learning process. As the label representations (or referred to as codewords in this work) are learned from data, semantically similar classes will be assigned with the codewords that are close to each other in terms of Hamming distance in the label space. The codewords then serve as the desired output of the hash function learning, and yield compact and discriminating binary hash representations. AdaLabelHash is easy to implement, which can jointly learn label representations and infer compact binary codes from data. It is applicable to both supervised and semi-supervised hashing. Experimental results on standard benchmarks demonstrate the satisfactory performance of AdaLabelHash. |
2019 | Unsupervised Neural Generative Semantic Hashing | Hansen Casper, Hansen Christian, Simonsen Jakob Grue, Alstrup Stephen, Lioma Christina | Arxiv | Fast similarity search is a key component in large-scale information retrieval where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area have been obtained through neural-network-based generative models trained by learning to reconstruct the original documents. We present a novel unsupervised generative semantic hashing approach, Ranking-based Semantic Hashing (RBSH), that consists of both a variational and a ranking based component. Similarly to variational autoencoders the variational component is trained to reconstruct the original document conditioned on its generated hash code and as in prior work it only considers documents individually. The ranking component solves this limitation by incorporating inter-document similarity into the hash code generation, modelling document ranking through a hinge loss. To circumvent the need for labelled data to compute the hinge loss we use a weak labeller and thus keep the approach fully unsupervised. Extensive experimental evaluation on four publicly available datasets against traditional baselines and recent state-of-the-art methods for semantic hashing shows that RBSH significantly outperforms all other methods across all evaluated hash code lengths. In fact RBSH hash codes are able to perform similarly to state-of-the-art hash codes while using 2-4x fewer bits. |
2019 | Hamming Sentence Embeddings For Information Retrieval | Hamann Felix, Kurz Nadja, Ulges Adrian | Arxiv | In retrieval applications binary hashes are known to offer significant improvements in terms of both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture which is trained by minimizing reconstruction error. Instead of employing the original real-valued embeddings we use latent representations in Hamming space produced by the encoder for similarity calculations. In quantitative experiments on several benchmarks for semantic similarity tasks we show that our compressed Hamming embeddings yield a comparable performance to uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at compression ratios of up to 256:1. We further demonstrate that our model strongly decorrelates input features and that the compressor generalizes well when pre-trained on Wikipedia sentences. We publish the source code on Github and all experimental results. |
2019 | Supervised Discrete Hashing With Relaxation | Gui Jie, Liu Tongliang, Sun Zhenan, Tao Dacheng, Tan Tieniu | Arxiv | Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data such as documents images and videos. In this paper we propose a novel learning-based hashing method called Supervised Discrete Hashing with Relaxation (SDHR) based on Supervised Discrete Hashing (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words) thus fixing the regression target. In SDHR the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH which uses the traditional zero-one matrix SDHR utilizes the learned regression target matrix and therefore more accurately measures the classification error of the regression model and is more flexible. As expected SDHR generally outperforms SDH. Experimental results on two large-scale image datasets (CIFAR-10 and MNIST) and a large-scale and challenging face dataset (FRGC) demonstrate the effectiveness and efficiency of SDHR. |
2019 | Deep Hashing For Signed Social Network Embedding | Guo Jia-nan, Mao Xian-ling, Jiang Xiao-jian, Sun Ying-xiang, Wei Wei, Huang He-yan | Arxiv | Network embedding is a promising way of network representation facilitating many signed social network processing and analysis tasks such as link prediction and node classification. Recently feature hashing has been adopted in several existing embedding algorithms to improve the efficiency which has obtained a great success. However the existing feature hashing based embedding algorithms only consider the positive links in signed social networks. Intuitively negative links can also help improve the performance. Thus in this paper we propose a novel deep hashing method for signed social network embedding by considering simultaneously positive and negative links. Extensive experiments show that the proposed method performs better than several state-of-the-art baselines through link prediction task over two real-world signed social networks. |
2019 | Fast Supervised Discrete Hashing | Gui Jie, Liu Tongliang, Sun Zhenan, Tao Dacheng, Tan Tieniu | Arxiv | Learning-based hashing algorithms are hot topics because they can greatly increase the scale at which existing methods operate. In this paper we propose a new learning-based hashing method called fast supervised discrete hashing (FSDH) based on supervised discrete hashing (SDH). Regressing the training examples (or hash code) to the corresponding class labels is widely used in ordinary least squares regression. Rather than adopting this method FSDH uses a very simple yet effective regression of the class labels of training examples to the corresponding hash code to accelerate the algorithm. To the best of our knowledge this strategy has not previously been used for hashing. Traditional SDH decomposes the optimization into three sub-problems with the most critical sub-problem - discrete optimization for binary hash codes - solved using iterative discrete cyclic coordinate descent (DCC) which is time-consuming. However FSDH has a closed-form solution and only requires a single rather than iterative hash code-solving step which is highly efficient. Furthermore FSDH is usually faster than SDH for solving the projection matrix for least squares regression making FSDH generally faster than SDH. For example our results show that FSDH is about 12 times faster than SDH when the number of hashing bits is 128 on the CIFAR-10 dataset and FSDH is about 151 times faster than FastHash when the number of hashing bits is 64 on the MNIST dataset. Our experimental results show that FSDH is not only fast but also outperforms other comparative methods. |
2019 | Hashgraph -- Scalable Hash Tables Using A Sparse Graph Data Structure | Green Oded | Arxiv | Hash tables are ubiquitous and used in a wide range of applications for efficient probing of large and unsorted data. If designed properly hash tables can enable efficient lookups in a constant number of operations, commonly referred to as O(1) operations. As data sizes continue to grow and data becomes less structured (as is common for big-data applications) the need for efficient and scalable hash tables also grows. In this paper we introduce HashGraph a new scalable approach for building hash tables that uses concepts taken from sparse graph representations, hence the name HashGraph. We show two different variants of HashGraph: a simple algorithm that outlines the method to create the hash table and an advanced method that creates the hash table in a more efficient manner (with an improved memory access pattern). HashGraph shows a new way to deal with hash collisions that does not use open-addressing or chaining yet has all the benefits of both these approaches. HashGraph currently works for static inputs though recent progress with dynamic graph data structures suggests that HashGraph might be extended to dynamic inputs as well. We show that HashGraph can deal with a large number of hash values per entry without the loss of performance that most open-addressing and chaining approaches suffer. Further we show that HashGraph is indifferent to the load factor. Lastly we show a new probing algorithm for the second phase of value lookups. Given the above HashGraph is extremely fast and outperforms several state-of-the-art hash-table implementations. The implementation of HashGraph in this paper is for NVIDIA GPUs though HashGraph is not architecture dependent. Using an NVIDIA GV100 GPU HashGraph is anywhere from 2X-8X faster than cuDPP WarpDrive and cuDF. HashGraph is able to build a hash table at a rate of 2.5 billion keys per second and can probe at nearly the same rate. |
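The CSR-style construction at the core of HashGraph can be illustrated with two counting passes: count bucket sizes, prefix-sum them into offsets, then scatter keys into one flat slot array. The serial NumPy sketch below shows the data structure only; HashGraph itself is a parallel GPU algorithm, and the modulo here stands in for a real hash function:

```python
import numpy as np

def build_hashgraph(keys, n_buckets):
    """Static CSR-like hash table: bucket b = slots[offsets[b]:offsets[b+1]]."""
    h = keys % n_buckets                       # stand-in hash function
    counts = np.bincount(h, minlength=n_buckets)
    offsets = np.zeros(n_buckets + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(counts)            # prefix sums give bucket starts
    slots = np.empty_like(keys)
    cursor = offsets[:-1].copy()
    for key, b in zip(keys, h):                # scatter pass
        slots[cursor[b]] = key
        cursor[b] += 1
    return offsets, slots

def contains(offsets, slots, key, n_buckets):
    b = key % n_buckets
    return key in slots[offsets[b]:offsets[b + 1]]

keys = np.array([17, 42, 5, 29, 42])
offsets, slots = build_hashgraph(keys, n_buckets=8)
print(contains(offsets, slots, 29, 8), contains(offsets, slots, 3, 8))  # True False
```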
2019 | Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval | Shupeng Su, Zhisheng Zhong, Chao Zhang | ICCV | Cross-modal hashing encodes the multimedia data into a common binary hash space in which the correlations among the samples from different modalities can be effectively measured. Deep cross-modal hashing further improves the retrieval performance as the deep neural networks can generate more semantic relevant features and hash codes. In this paper, we study the unsupervised deep cross-modal hash coding and propose Deep Joint Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. First, to learn binary codes that preserve the neighborhood structure of the original data, DJSRH constructs a novel joint-semantics affinity matrix which elaborately integrates the original neighborhood information from different modalities and accordingly is capable to capture the latent intrinsic semantic affinity for the input multi-modal instances. Second, DJSRH later trains the networks to generate binary codes that maximally reconstruct above joint-semantics relations via the proposed reconstructing framework, which is more competent for the batch-wise training as it reconstructs the specific similarity value unlike the common Laplacian constraint merely preserving the similarity order. Extensive experiments demonstrate the significant improvement by DJSRH in various cross-modal retrieval tasks. |
2019 | Challenging Deep Image Descriptors For Retrieval In Heterogeneous Iconographic Collections | Gominski Dimitri, Poreba Martyna, Gouet-brunet Valérie, Chen Liming | Arxiv | This article proposes to study the behavior of recent and efficient state-of-the-art deep-learning based image descriptors for content-based image retrieval facing a panel of complex variations appearing in heterogeneous image datasets, in particular in cultural collections that may involve multi-source, multi-date, and multi-view content. |
2019 | Weakly Supervised Deep Image Hashing through Tag Embeddings | Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li | CVPR | Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, large-scale labeled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with image tags that contain useful information, although raw tags, in general, do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested for the task of semantic image retrieval and is compared against several state-of-the-art models. Results show that our approach sets a new state-of-the-art in the area of weakly supervised image hashing. |
2019 | Feature Pyramid Hashing | Yang Yifan, Geng Libing, Lai Hanjiang, Pan Yan, Yin Jian | Arxiv | In recent years deep-networks-based hashing has become a leading approach for large-scale image retrieval. Most deep hashing approaches use the high layer to extract the powerful semantic representations. However these methods have limited ability for fine-grained image retrieval because the semantic features extracted from the high layer are difficult in capturing the subtle differences. To this end we propose a novel two-pyramid hashing architecture to learn both the semantic information and the subtle appearance details for fine-grained image search. Inspired by the feature pyramids of convolutional neural networks a vertical pyramid is proposed to capture the high-layer features and a horizontal pyramid combines multiple low-layer features with structural information to capture the subtle differences. To fuse the low-level features a novel combination strategy called consensus fusion is proposed to capture all subtle information from several low layers for finer retrieval. Extensive evaluation on two fine-grained datasets CUB-200-2011 and Stanford Dogs demonstrates that the proposed method achieves significant performance gains compared with state-of-the-art baselines. |
2019 | Nearly-unsupervised Hashcode Representations For Relation Extraction | Garg Sahil, Galstyan Aram, Steeg Greg Ver, Cecchi Guillermo | Arxiv | Recently kernelized locality sensitive hashcodes have been successfully employed as representations of natural language text especially showing high relevance to biomedical relation extraction tasks. In this paper we propose to optimize the hashcode representations in a nearly unsupervised manner in which we only use data points but not their class labels for learning. The optimized hashcode representations are then fed to a supervised classifier following the prior work. This nearly unsupervised approach allows fine-grained optimization of each hash function which is particularly suitable for building hashcode representations generalizing from a training set to a test set. We empirically evaluate the proposed approach for biomedical relation extraction tasks obtaining significant accuracy improvements w.r.t. state-of-the-art supervised and semi-supervised approaches. |
2019 | Bag Of Negatives For Siamese Architectures | Gajic Bojana, Amato Ariel, Baldrich Ramon, Gatta Carlo | Arxiv | Training a Siamese architecture for re-identification with a large number of identities is a challenging task due to the difficulty of finding relevant negative samples efficiently. In this work we present Bag of Negatives (BoN) a method for accelerated and improved training of Siamese networks that scales well on datasets with a very large number of identities. BoN is an efficient and loss-independent method able to select a bag of high quality negatives based on a novel online hashing strategy. |
2019 | The Bitwise Hashing Trick For Personalized Search | Gaskill Braddock | Applied Artificial Intelligence | Many real world problems require fast and efficient lexical comparison of large numbers of short text strings. Search personalization is one such domain. We introduce the use of feature bit vectors using the hashing trick for improving relevance in personalized search and other personalization applications. We present results of several lexical hashing and comparison methods. These methods are applied to a user's historical behavior and are used to predict future behavior. Using a single bit per dimension instead of floating point results in an order of magnitude decrease in data structure size while preserving or even improving quality. We use real data to simulate a search personalization task. A simple method for combining bit vectors demonstrates an order of magnitude improvement in compute time on the task with only a small decrease in accuracy. |
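A minimal sketch of the one-bit-per-dimension idea, assuming a stable token hash (blake2b here) and bitwise OR as the simple combination method; the specific hashing and comparison variants evaluated in the article are not reproduced:

```python
import hashlib

def bit_vector(tokens, n_bits=256):
    """Hashing trick with one bit per dimension: set bit h(token) mod n_bits."""
    v = 0
    for t in tokens:
        h = int.from_bytes(hashlib.blake2b(t.encode(), digest_size=8).digest(), "big")
        v |= 1 << (h % n_bits)
    return v

def bit_similarity(a, b, n_bits=256):
    """Fraction of agreeing bits: a cheap lexical similarity in Hamming space."""
    return 1.0 - bin(a ^ b).count("1") / n_bits

# OR-combining per-query bit vectors builds a compact user profile
profile = bit_vector(["running", "shoes"]) | bit_vector(["trail", "marathon"])
print(bit_similarity(profile, bit_vector(["running", "shoes"])))
```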
2019 | Probminhash -- A Class Of Locality-sensitive Hash Algorithms For The (probability) Jaccard Similarity | Ertl Otmar | Arxiv | The probability Jaccard similarity was recently proposed as a natural generalization of the Jaccard similarity to measure the proximity of sets whose elements are associated with relative frequencies or probabilities. In combination with a hash algorithm that maps those weighted sets to compact signatures which allow fast estimation of pairwise similarities it constitutes a valuable method for big data applications such as near-duplicate detection nearest neighbor search or clustering. This paper introduces a class of one-pass locality-sensitive hash algorithms that are orders of magnitude faster than the original approach. The performance gain is achieved by calculating signature components not independently but collectively. Four different algorithms are proposed based on this idea. Two of them are statistically equivalent to the original approach and can be used as drop-in replacements. The other two may even improve the estimation error by introducing statistical dependence between signature components. Moreover the presented techniques can be specialized for the conventional Jaccard similarity resulting in highly efficient algorithms that outperform traditional minwise hashing and that are able to compete with the state of the art. |
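For reference, classic minwise hashing for the ordinary Jaccard similarity, the scheme this family accelerates and generalizes, fits in a few lines. The per-component salted use of Python's built-in hash below is illustrative (and only stable within one process), not the paper's construction:

```python
import random

def minhash_signature(items, n_hashes=64, seed=0):
    """One signature component per salt: the minimum hash over the set."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(n_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimate_jaccard(sig_a, sig_b):
    # P[two components collide] equals the Jaccard similarity of the sets
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature({"red", "green", "blue"})
b = minhash_signature({"red", "green", "yellow"})
print(estimate_jaccard(a, b))  # close to the true Jaccard 2/4 = 0.5
```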
2019 | Deep Spherical Quantization For Image Search | Eghbali Sepehr, Tahvildari Ladan | Arxiv | Hashing methods which encode high-dimensional images with compact discrete codes have been widely applied to enhance large-scale image retrieval. In this paper we put forward Deep Spherical Quantization (DSQ) a novel method to make deep convolutional neural networks generate supervised and compact binary codes for efficient image search. Our approach simultaneously learns a mapping that transforms the input images into a low-dimensional discriminative space and quantizes the transformed data points using multi-codebook quantization. To eliminate the negative effect of norm variance on codebook learning we force the network to L_2 normalize the extracted features and then quantize the resulting vectors using a new supervised quantization technique specifically designed for points lying on a unit hypersphere. Furthermore we introduce an easy-to-implement extension of our quantization technique that enforces sparsity on the codebooks. Extensive experiments demonstrate that DSQ and its sparse variant can generate semantically separable compact binary codes outperforming many state-of-the-art image retrieval methods on three benchmarks. |
2019 | Document Hashing With Mixture-prior Generative Models | Dong Wei, Su Qinliang, Shen Dinghan, Chen Changyou | Arxiv | Hashing is promising for large-scale information retrieval tasks thanks to the efficiency of distance evaluation between binary codes. Generative hashing is often used to generate hashing codes in an unsupervised way. However existing generative hashing methods only considered the use of simple priors like Gaussian and Bernoulli priors which limits these methods to further improve their performance. In this paper two mixture-prior generative models are proposed under the objective to produce high-quality hashing codes for documents. Specifically a Gaussian mixture prior is first imposed onto the variational auto-encoder (VAE) followed by a separate step to cast the continuous latent representation of VAE into binary code. To avoid the performance loss caused by the separate casting a model using a Bernoulli mixture prior is further developed in which an end-to-end training is admitted by resorting to the straight-through (ST) discrete gradient estimator. Experimental results on several benchmark datasets demonstrate that the proposed methods especially the one using Bernoulli mixture priors consistently outperform existing ones by a substantial margin. |
2019 | Efficient Discrete Supervised Hashing For Large-scale Cross-modal Retrieval | Yao Tao, Kong Xiangwei, Yan Lianshan, Tang Wenjing, Tian Qi | Arxiv | Supervised cross-modal hashing has gained increasing research interest on large-scale retrieval tasks owing to its satisfactory performance and efficiency. However it still has some challenging issues to be further studied 1) most of them fail to well preserve the semantic correlations in hash codes because of the large heterogeneous gap; 2) most of them relax the discrete constraint on hash codes leading to large quantization error and consequent low performance; 3) most of them suffer from relatively high memory cost and computational complexity during the training procedure which makes them unscalable. In this paper to address the above issues we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore the feature based similarities and semantic correlations can be both preserved in hash codes which makes the learned hash codes more discriminative. Then an efficient discrete optimization algorithm is proposed to handle the scalability issue. Instead of learning hash codes bit-by-bit the hash codes matrix can be obtained directly which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH produces a superior performance in both accuracy and scalability over some existing cross-modal hashing methods. |
2019 | Simultaneous Feature Aggregating And Hashing For Compact Binary Code Learning | Do Thanh-toan, Le Khoa, Hoang Tuan, Le Huu, Nguyen Tam V., Cheung Ngai-man | Arxiv | Representing images by compact hash codes is an attractive approach for large-scale content-based image retrieval. In most state-of-the-art hashing-based image retrieval systems for each image local descriptors are first aggregated as a global representation vector. This global vector is then subjected to a hashing function to generate a binary hash code. In previous works the aggregating and the hashing processes are designed independently. Hence these frameworks may generate suboptimal hash codes. In this paper we first propose a novel unsupervised hashing framework in which feature aggregating and hashing are designed simultaneously and optimized jointly. Specifically our joint optimization generates aggregated representations that can be better reconstructed by some binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition the proposed method is flexible. It can be extended for supervised hashing. When the data label is available the framework can be adapted to learn binary codes which minimize the reconstruction loss w.r.t. label vectors. Furthermore we also propose a fast version of the state-of-the-art hashing method Binary Autoencoder to be used in our proposed frameworks. Extensive experiments on benchmark datasets under various settings show that the proposed methods outperform state-of-the-art unsupervised and supervised hashing methods. |
2019 | Bilinear Supervised Hashing Based On 2D Image Features | Ding Yujuan, Wong Wai Keung, Lai Zhihui, Zhang Zheng | Arxiv | Hashing has been recognized as an efficient representation learning method to effectively handle big data due to its low computational complexity and memory cost. Most of the existing hashing methods focus on learning the low-dimensional vectorized binary features based on the high-dimensional raw vectorized features. However studies on how to obtain preferable binary codes from the original 2D image features for retrieval are very limited. This paper proposes a bilinear supervised discrete hashing (BSDH) method based on 2D image features which utilizes bilinear projections to binarize the image matrix features such that the intrinsic characteristics in the 2D image space are preserved in the learned binary codes. Meanwhile the bilinear projection approximation and vectorization binary codes regression are seamlessly integrated together to formulate the final robust learning framework. Furthermore a discrete optimization strategy is developed to alternatively update each variable for obtaining the high-quality binary codes. In addition two 2D image features traditional SURF-based FVLAD feature and CNN-based AlexConv5 feature are designed for further improving the performance of the proposed BSDH method. Results of extensive experiments conducted on four benchmark datasets show that the proposed BSDH method almost outperforms all competing hashing methods with different input features by different evaluation protocols. |
2019 | Triplet-based Deep Hashing Network For Cross-modal Retrieval | Deng Cheng, Chen Zhaojia, Liu Xianglong, Gao Xinbo, Tao Dacheng | Arxiv | Given the benefits of its low storage requirements and high retrieval efficiency hashing has recently received increasing attention. In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However almost all existing methods employing cross-modal hashing cannot obtain powerful hash codes due to their ignoring the relative similarity between heterogeneous data that contains richer semantic information leading to unsatisfactory retrieval performance. In this paper we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First we utilize the triplet labels which describe the relative relationships among three instances as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative abilities of the hash codes. Finally graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets. |
2019 | Adversarially Trained Deep Neural Semantic Hashing Scheme For Subjective Search In Fashion Inventory | Singh Saket, Sheet Debdoot, Dasgupta Mithun | Arxiv | The simple approach of retrieving a closest match of a query image from one in the gallery compares an image pair using sum of absolute difference in pixel or feature space. The process is computationally expensive ill-posed to illumination background composition pose variation as well as inefficient to be deployed on gallery sets with more than 1000 elements. Hashing is a faster alternative which involves representing images in reduced dimensional simple feature spaces. Encoding images into binary hash codes enables similarity comparison in an image-pair using the Hamming distance measure. The challenge however lies in encoding the images using a semantic hashing scheme that lets subjective neighbors lie within the tolerable Hamming radius. This work presents a solution employing adversarial learning of a deep neural semantic hashing network for fashion inventory retrieval. It consists of a feature extracting convolutional neural network (CNN) learned to (i) minimize error in classifying type of clothing (ii) minimize hamming distance between semantic neighbors and maximize distance between semantically dissimilar images (iii) maximally scramble a discriminators ability to identify the corresponding hash code-image pair when processing a semantically similar query-gallery image pair. Experimental validation for fashion inventory search yields a mean average precision (mAP) of 90.65% in finding the closest match as compared to 53.26% obtained by the prior art of deep Cauchy hashing for Hamming space retrieval. |
2019 | Algorithms For Similarity Search And Pseudorandomness | Christiani Tobias | Arxiv | We study the problem of approximate near neighbor (ANN) search and show the following results - An improved framework for solving the ANN problem using locality-sensitive hashing reducing the number of evaluations of locality-sensitive hash functions and the word-RAM complexity compared to the standard framework. - A framework for solving the ANN problem with space-time tradeoffs as well as tight upper and lower bounds for the space-time tradeoff of framework solutions to the ANN problem under cosine similarity. - A novel approach to solving the ANN problem on sets along with a matching lower bound improving the state of the art. - A self-tuning version of the algorithm is shown through experiments to outperform existing similarity join algorithms. - Tight lower bounds for asymmetric locality-sensitive hashing which has applications to the approximate furthest neighbor problem orthogonal vector search and annulus queries. - A proof of the optimality of a well-known Boolean locality-sensitive hashing scheme. We study the problem of efficient algorithms for producing high-quality pseudorandom numbers and obtain the following results - A deterministic algorithm for generating pseudorandom numbers of arbitrarily high quality in constant time using near-optimal space. - A randomized construction of a family of hash functions that outputs pseudorandom numbers of arbitrarily high quality with space usage and running time nearly matching known cell-probe lower bounds. |
2019 | Central Similarity Quantization For Efficient Image And Video Retrieval | Yuan Li, Wang Tao, Zhang Xiaopeng, Tay Francis Eh, Jie Zequn, Liu Wei, Feng Jiashi | Arxiv | Existing data-dependent hashing methods usually learn hash functions from pairwise or triplet data relationships which only capture the data similarity locally and often suffer from low learning efficiency and low collision rate. In this work we propose a new global similarity metric termed central similarity with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers to improve hash learning efficiency and retrieval accuracy. We principally formulate the computation of the proposed central similarity metric by introducing a new concept i.e. hash center that refers to a set of data points scattered in the Hamming space with a sufficient mutual distance between each other. We then provide an efficient method to construct well separated hash centers by leveraging the Hadamard matrix and Bernoulli distributions. Finally we propose the Central Similarity Quantization (CSQ) that optimizes the central similarity between data points w.r.t. their hash centers instead of optimizing the local similarity. CSQ is generic and applicable to both image and video hashing scenarios. Extensive experiments on large-scale image and video retrieval tasks demonstrate that CSQ can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs achieving a noticeable boost in retrieval performance i.e. 3%-20% in mAP over the previous state-of-the-arts. The code is available at https://github.com/yuanli2333/Hadamard-Matrix-for-hashing |
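The Hadamard construction of hash centers is simple enough to sketch directly: the rows of [H; -H] for a k x k Hadamard matrix H are mutually orthogonal or opposite, so any two distinct bipolar centers differ in at least k/2 bits. A minimal NumPy version with illustrative names:

```python
import numpy as np

def hadamard(k):
    """Sylvester construction; k must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H

def hash_centers(n_classes, k):
    """Up to 2k bipolar centers with pairwise Hamming distance >= k/2."""
    H = hadamard(k)
    C = np.vstack([H, -H])
    assert n_classes <= 2 * k, "need k >= n_classes / 2"
    return C[:n_classes]

centers = hash_centers(10, 32)        # e.g. ten class centers with 32 bits
d = 0.5 * (32 - centers @ centers.T)  # pairwise Hamming distances
print(int(d[d > 0].min()))            # 16, i.e. k/2
```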
2019 | Analysis Of Sparsehash An Efficient Embedding Of Set-similarity Via Sparse Projections | Valsesia Diego, Fosson Sophie Marie, Ravazzi Chiara, Bianchi Tiziano, Magli Enrico | Arxiv | Embeddings provide compact representations of signals in order to perform efficient inference in a wide variety of tasks. In particular random projections are common tools to construct Euclidean distance-preserving embeddings while hashing techniques are extensively used to embed set-similarity metrics such as the Jaccard coefficient. In this letter we theoretically prove that a class of random projections based on sparse matrices called SparseHash can preserve the Jaccard coefficient between the supports of sparse signals which can be used to estimate set similarities. Moreover besides the analysis we provide an efficient implementation and we test the performance in several numerical experiments both on synthetic and real datasets. |
2019 | Using Deep Cross Modal Hashing And Error Correcting Codes For Improving The Efficiency Of Attribute Guided Facial Image Retrieval | Talreja Veeru, Taherkhani Fariborz, Valenti Matthew C., Nasrabadi Nasser M. | Arxiv | With benefits of fast query speed and low storage cost hashing-based image retrieval approaches have garnered considerable attention from the research community. In this paper we propose a novel Error-Corrected Deep Cross Modal Hashing (CMH-ECC) method which uses a bitmap specifying the presence of certain facial attributes as an input query to retrieve relevant face images from the database. In this architecture we generate compact hash codes using an end-to-end deep learning module which effectively captures the inherent relationships between the face and attribute modality. We also integrate our deep learning module with forward error correction codes to further reduce the distance between different modalities of the same subject. Specifically the properties of deep hashing and forward error correction codes are exploited to design a cross modal hashing framework with high retrieval performance. Experimental results using two standard datasets with facial attributes-image modalities indicate that our CMH-ECC face image retrieval model outperforms most of the current attribute-based face image retrieval approaches. |
2019 | Locality-sensitive Hashing For F-divergences Mutual Information Loss And Beyond | Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni | Neural Information Processing Systems | Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions particularly for f-divergences as well as a generalization to capture mutual information loss. First we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance which may be of independent interest. Next we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss due to its several important applications such as model compression. |
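One reason the Hellinger distance is a convenient proxy in this setting is that it is, up to a constant, the Euclidean distance between the elementwise square roots of the distributions, so any Euclidean LSH applies after a square-root map. The sketch below illustrates that reduction with E2LSH-style bucket hashes; it is not the paper's scheme for the generalized Jensen-Shannon divergence, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, w = 8, 50, 1.0
projections = rng.standard_normal((k, d))
offsets = rng.uniform(0.0, w, size=k)

def hellinger_lsh(p):
    """H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2, so hashing
    sqrt(p) with a Euclidean LSH is locality-sensitive for Hellinger."""
    z = np.sqrt(p)                                   # square-root map
    return np.floor((projections @ z + offsets) / w).astype(int)

p = rng.dirichlet(np.ones(d))
q = rng.dirichlet(np.ones(d))
print(hellinger_lsh(p))
print(hellinger_lsh(q))
```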
2019 | Deep Supervised Hashing With Anchor Graph | Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, Wai Keung Wong | ICCV | Recently, a series of deep supervised hashing methods were proposed for binary code learning. However, due to the high computation cost and limited hardware memory, these methods first select a subset from the training set, and then form a mini-batch of data to update the network in each iteration. Therefore, the remaining labeled data cannot be fully utilized and the model cannot directly obtain the binary codes of the entire training set for retrieval. To address these problems, this paper proposes an interesting regularized deep model to seamlessly integrate the advantages of deep hashing and efficient binary code learning by using the anchor graph. As such, the deep features and label matrix can be jointly used to optimize the binary codes, and the network can obtain more discriminative feedback from the linear combinations of the learned bits. Moreover, we also reveal the algorithm mechanism and its computational essence. Experiments on three large-scale datasets indicate that the proposed method achieves better retrieval performance with less training time compared to previous deep hashing methods. |
2019 | Hadamard Codebook Based Deep Hashing | Chen Shen, Cao Liujuan, Lin Mingbao, Wang Yan, Sun Xiaoshuai, Wu Chenglin, Qiu Jingfei, Ji Rongrong | Arxiv | As an approximate nearest neighbor search technique hashing has been widely applied in large-scale image retrieval due to its excellent efficiency. Most supervised deep hashing methods have similar loss designs with embedding learning while quantizing the continuous high-dim feature into compact binary space. We argue that the existing deep hashing schemes are defective in two issues that seriously affect the performance i.e. bit independence and bit balance. The former refers to hash codes of different classes being independent of each other while the latter means each bit should have a balanced distribution of +1s and -1s. In this paper we propose a novel supervised deep hashing method termed Hadamard Codebook based Deep Hashing (HCDH) which solves the above two problems in a unified formulation. Specifically we utilize an off-the-shelf algorithm to generate a binary Hadamard codebook to satisfy the requirements of bit independence and bit balance which subsequently serves as the desired outputs of the hash function learning. We also introduce a projection matrix to solve the inconsistency between the order of the Hadamard matrix and the number of classes. Besides the proposed HCDH further exploits the supervised labels by constructing a classifier on top of the outputs of hash functions. Extensive experiments demonstrate that HCDH can yield discriminative and balanced binary codes which outperform many state-of-the-art methods on three widely-used benchmarks. |
2019 | A Two-step Cross-modal Hashing by Exploiting Label Correlations and Preserving Similarity in Both Steps | Zhen-Duo Chen, Yongxin Wang, Hui-Qiong Li, Xin Luo, Liqiang Nie, Xin-Shun | MM | In this paper, we present a novel Two-stEp Cross-modal Hashing method, TECH for short, for cross-modal retrieval tasks. As a two-step method, it first learns hash codes based on semantic labels, while preserving the similarity in the original space and exploiting the label correlations in the label space. In the light of this, it is able to make better use of label information and generate better binary codes. In addition, different from other two-step methods that mainly focus on the hash codes learning, TECH adopts a new hash function learning strategy in the second step, which also preserves the similarity in the original space. Moreover, with the help of a well-designed objective function and optimization scheme, it is able to generate hash codes discretely and scales to large-scale data. To the best of our knowledge, it is the first cross-modal hashing method exploiting label correlations, and also the first two-step hashing model preserving the similarity while learning the hash function. Extensive experiments demonstrate that the proposed approach outperforms some state-of-the-art cross-modal hashing methods. |
2019 | Deep Semantic Text Hashing with Weak Supervision | Suthee Chaidaroon, Travis Ebesu, Yi Fang | SIGIR | With an ever increasing amount of data available on the web, fast similarity search has become the critical component for large-scale information retrieval systems. One solution is semantic hashing which designs binary codes to accelerate similarity search. Recently, deep learning has been successfully applied to the semantic hashing problem and produces high-quality compact binary codes compared to traditional methods. However, most state-of-the-art semantic hashing approaches require large amounts of hand-labeled training data which are often expensive and time consuming to collect. The cost of getting labeled data is the key bottleneck in deploying these hashing methods. Motivated by the recent success in machine learning that makes use of weak supervision, we employ unsupervised ranking methods such as BM25 to extract weak signals from training data. We further introduce two deep generative semantic hashing models to leverage weak signals for text hashing. The experimental results on four public datasets show that our models can generate high-quality binary codes without using hand-labeled training data and significantly outperform the competitive unsupervised semantic hashing baselines. |
2019 | Online Hashing With Efficient Updating Of Binary Codes | Weng Zhenyu, Zhu Yuesheng | Arxiv | Online hashing methods are efficient in learning the hash functions from the streaming data. However when the hash functions change the binary codes for the database have to be recomputed to guarantee the retrieval accuracy. Recomputing the binary codes by accumulating the whole database brings a timeliness challenge to the online retrieval process. In this paper we propose a novel online hashing framework to update the binary codes efficiently without accumulating the whole database. In our framework the hash functions are fixed and the projection functions are introduced to learn online from the streaming data. Therefore inefficient updating of the binary codes by accumulating the whole database can be transformed to efficient updating of the binary codes by projecting the binary codes into another binary space. The queries and the binary code database are projected asymmetrically to further improve the retrieval accuracy. The experiments on two multi-label image databases demonstrate the effectiveness and the efficiency of our method for multi-label image retrieval. |
2019 | Efficient Querying From Weighted Binary Codes | Weng Zhenyu, Zhu Yuesheng | Arxiv | Binary codes are widely used to represent the data due to their small storage and efficient computation. However there exists an ambiguity problem that lots of binary codes share the same Hamming distance to a query. To alleviate the ambiguity problem weighted binary codes assign different weights to each bit of binary codes and compare the binary codes by the weighted Hamming distance. Until now performing queries over weighted binary codes efficiently has remained an open issue. In this paper we propose a new method to rank the weighted binary codes and return the nearest weighted binary codes of the query efficiently. In our method based on the multi-index hash tables two algorithms, the table bucket finding algorithm and the table merging algorithm, are proposed to select the nearest weighted binary codes of the query in a non-exhaustive and accurate way. The proposed algorithms are justified by proving their theoretic properties. The experiments on three large-scale datasets validate both the search efficiency and the search accuracy of our method. Especially for the number of weighted binary codes up to one billion our method is more than 1000 times faster than a linear scan. |
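To make the ambiguity problem concrete: all codes at the same Hamming distance from a query are tied, and per-bit weights break those ties. A brute-force weighted Hamming ranking is trivial to write, as sketched below with made-up data; the paper's contribution is precisely avoiding this exhaustive scan via multi-index hash tables.

```python
import numpy as np

def weighted_hamming(query, db, bit_weights):
    """Sum the weights of the bits on which each database code differs
    from the query; plain Hamming distance is the all-ones special case."""
    return (db != query) @ bit_weights

rng = np.random.default_rng(1)
n, b = 100_000, 32
db = rng.integers(0, 2, size=(n, b), dtype=np.uint8)
query = rng.integers(0, 2, size=b, dtype=np.uint8)
weights = rng.random(b)               # per-bit weights break Hamming ties

dists = weighted_hamming(query, db, weights)
top10 = np.argsort(dists)[:10]        # exhaustive linear scan, O(n * b)
print(top10, dists[top10])
```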
2019 | Drill-down Interactive Retrieval Of Complex Scenes Using Natural Language Queries | Fuwen Tan, Paola Cascante-bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez | Neural Information Processing Systems | This paper explores the task of interactive image retrieval using natural language queries where a user progressively provides input queries to refine a set of retrieval results. Moreover our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective to find arbitrarily specific images of complex scenes. Furthermore we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks demonstrating superior performance on two proposed benchmarks automatic image retrieval on a simulated scenario that uses region captions as queries and interactive image retrieval using real queries from human evaluators. |
2019 | Hashing with Mutual Information | F. Cakir, K. He, S. Bargal, S. Sclaroff | TPAMI | Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an information-theoretic quantity: mutual information. We show that optimizing mutual information can reduce ambiguity in the induced neighborhood structure in the learned Hamming space, which is essential in obtaining high retrieval performance. To this end, we optimize mutual information in deep neural networks with minibatch stochastic gradient descent, with a formulation that maximally and efficiently utilizes available supervision. Experiments on four image retrieval benchmarks, including ImageNet, confirm the effectiveness of our method in learning high-quality binary embeddings for nearest neighbor retrieval. |
2019 | Accurate And Fast Retrieval For Complex Non-metric Data Via Neighborhood Graphs | Boytsov Leonid, Nyberg Eric | Arxiv | We demonstrate that a graph-based search algorithm-relying on the construction of an approximate neighborhood graph-can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization which in turn lead to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective we find that constructing an index using a modified (e.g. symmetrized) distance can improve performance. This observation paves the way for a new line of research: designing index-specific graph-construction distance functions. |
2019 | Transductive Zero-shot Hashing For Multilabel Image Retrieval | Zou Qin, Zhang Zheng, Cao Ling, Chen Long, Wang Song | IEEE Transactions on Neural Networks and Learning Systems | Hash coding has been widely used in approximate nearest neighbor search for large-scale image retrieval. Given semantic annotations such as class labels and pairwise similarities of the training data hashing methods can learn and generate effective and compact binary codes. Since some newly introduced images may carry semantic labels undefined at training time (we call these unseen images) zero-shot hashing techniques have been studied. However existing zero-shot hashing methods focus on the retrieval of single-label images and cannot handle multi-label images. In this paper for the first time a novel transductive zero-shot hashing method is proposed for multi-label unseen image retrieval. In order to predict the labels of the unseen/target data a visual-semantic bridge is built via instance-concept coherence ranking on the seen/source data. Then pairwise similarity loss and focal quantization loss are constructed for training a hashing model using both the seen/source and unseen/target data. Extensive evaluations on three popular multi-label datasets demonstrate that the proposed hashing method achieves significantly better results than the competing methods. |
2019 | Cluster-wise Unsupervised Hashing For Cross-modal Similarity Search | Wang Lu, Yang Jie | Arxiv | Large-scale cross-modal hashing similarity retrieval has attracted more and more attention in modern search applications such as search engines and autopilot showing great superiority in computation and storage. However current unsupervised cross-modal hashing methods still have some limitations: (1) many methods relax the discrete constraints to solve the optimization objective which may significantly degrade the retrieval performance; (2) most existing hashing models project heterogeneous data into a common latent space which may lose sight of the diversity in heterogeneous data; (3) transforming real-valued data points to binary codes always results in an abundant loss of information producing a suboptimal continuous latent space. To overcome the above problems in this paper a novel Cluster-wise Unsupervised Hashing (CUH) method is proposed. Specifically CUH jointly performs multi-view clustering that projects the original data points from different modalities into its own low-dimensional latent semantic space and finds the cluster centroid points and the common clustering indicators in its own low-dimensional space and learns the compact hash codes and the corresponding linear hash functions. A discrete optimization framework is developed to learn the unified binary codes across modalities under the guidance of cluster-wise code-prototypes. The reasonableness and effectiveness of CUH are well demonstrated by comprehensive experiments on diverse benchmark datasets. |
2019 | SHREWD Semantic Hierarchy-based Relational Embeddings For Weakly-supervised Deep Hashing | Arponen Heikki, Bishop Tom E | Arxiv | Using class labels to represent class similarity is a typical approach to training deep hashing systems for retrieval; samples from the same or different classes take binary 1 or 0 similarity values. This similarity does not model the full rich knowledge of semantic relations that may be present between data points. In this work we build upon the idea of using semantic hierarchies to form distance metrics between all available sample labels; for example cat to dog has a smaller distance than cat to guitar. We combine this type of semantic distance into a loss function to promote similar distances between the deep neural network embeddings. We also introduce an empirical Kullback-Leibler divergence loss term to promote binarization and uniformity of the embeddings. We test the resulting SHREWD method and demonstrate improvements in hierarchical retrieval scores using compact binary hash codes instead of real valued ones and show that in a weakly supervised hashing setting we are able to learn competitively without explicitly relying on class labels but instead on similarities between labels. |
2019 | Unsupervised Multi-modal Hashing For Cross-modal Retrieval | Yu Jun, Wu Xiao-jun | Arxiv | With the advantages of low storage cost and high efficiency hashing learning has received much attention in the domain of Big Data. In this paper we propose a novel unsupervised hashing learning method to cope with the open problem of directly preserving the manifold structure by hashing. To address this problem both the semantic correlation in the textual space and the locally geometric structure in the visual space are explored simultaneously in our framework. Besides the ℓ2,1-norm constraint is imposed on the projection matrices to learn the discriminative hash function for each modality. Extensive experiments are performed to evaluate the proposed method on the three publicly available datasets and the experimental results show that our method can achieve superior performance over the state-of-the-art methods. |
2019 | Exploring Auxiliary Context Discrete Semantic Transfer Hashing For Scalable Image Retrieval | Zhu Lei, Huang Zi, Li Zhihui, Xie Liang, Shen Heng Tao | Arxiv | Unsupervised hashing can desirably support scalable content-based image retrieval (SCBIR) for its appealing advantages of semantic label independence memory and search efficiency. However the learned hash codes are embedded with limited discriminative semantics due to the intrinsic limitation of image representation. To address the problem in this paper we propose a novel hashing approach dubbed Discrete Semantic Transfer Hashing (DSTH). The key idea is to directly augment the semantics of discrete image hash codes by exploring auxiliary contextual modalities. To this end a unified hashing framework is formulated to simultaneously preserve visual similarities of images and perform semantic transfer from contextual modalities. Further to guarantee direct semantic transfer and avoid information loss we explicitly impose the discrete constraint bit-uncorrelation constraint and bit-balance constraint on hash codes. A novel and effective discrete optimization method based on augmented Lagrangian multipliers is developed to iteratively solve the optimization problem. The whole learning process has linear computation complexity and desirable scalability. Experiments on three benchmark datasets demonstrate the superiority of DSTH compared with several state-of-the-art approaches. |
2019 | DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs | Erkun Yang, Tongliang Liu, Cheng Deng, Wei Liu, Dacheng Tao | CVPR | Due to the high storage and search efficiency, hashing has become prevalent for large-scale similarity search. Particularly, deep hashing methods have greatly improved the search performance under supervised scenarios. In contrast, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable supervisory similarity signals. To address this issue, we propose a novel deep unsupervised hashing model, dubbed DistillHash, which can learn a distilled data set consisting of data pairs that have confident similarity signals. Specifically, we investigate the relationship between the initial noisy similarity signals learned from local structures and the semantic similarity labels assigned by a Bayes optimal classifier. We show that under a mild assumption, some data pairs, whose labels are consistent with those assigned by the Bayes optimal classifier, can be potentially distilled. Inspired by this fact, we design a simple yet effective strategy to distill data pairs automatically and further adopt a Bayesian learning framework to learn hash functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets show that the proposed DistillHash consistently accomplishes the state-of-the-art search performance. |
2019 | Global Hashing System For Fast Image Search | Tian Dayong, Tao Dacheng | Arxiv | Hashing methods have been widely investigated for fast approximate nearest neighbor searching in large data sets. Most existing methods use binary vectors in lower dimensional spaces to represent data points that are usually real vectors of higher dimensionality. We divide the hashing process into two steps. Data points are first embedded in a low-dimensional space and the global positioning system method is subsequently introduced but modified for binary embedding. We devise data-independent and data-dependent methods to distribute the satellites at appropriate locations. Our methods are based on finding the tradeoff between the information losses in these two steps. Experiments show that our data-dependent method outperforms other methods on different-sized data sets from 100k to 10M. By incorporating the orthogonality of the code matrix both our data-independent and data-dependent methods are particularly impressive in experiments on longer bits. |
2019 | Deep Heterogeneous Hashing For Face Video Retrieval | Qiao Shishi, Wang Ruiping, Shan Shiguang, Chen Xilin | IEEE Transactions on Image Processing | Retrieving videos of a particular person with face image as a query via hashing technique has many important applications. While face images are typically represented as vectors in Euclidean space characterizing face videos with some robust set modeling techniques (e.g. covariance matrices as exploited in this study which reside on Riemannian manifold) has recently shown appealing advantages. This hence results in a thorny heterogeneous spaces matching problem. Moreover hashing with handcrafted features as done in many existing works is clearly inadequate to achieve desirable performance for this task. To address such problems we present an end-to-end Deep Heterogeneous Hashing (DHH) method that integrates three stages including image feature learning video modeling and heterogeneous hashing in a single framework to learn unified binary codes for both face images and videos. To tackle the key challenge of hashing on the manifold a well-studied Riemannian kernel mapping is employed to project data (i.e. covariance matrices) into Euclidean space and thus enables to embed the two heterogeneous representations into a common Hamming space where both intra-space discriminability and inter-space compatibility are considered. To perform network optimization the gradient of the kernel mapping is innovatively derived via structured matrix backpropagation in a theoretically principled way. Experiments on three challenging datasets show that our method achieves quite competitive performance compared with existing hashing methods. |
2019 | Deep Supervised Hashing Leveraging Quadratic Spherical Mutual Information For Content-based Image Retrieval | Passalis Nikolaos, Tefas Anastasios | Arxiv | Several deep supervised hashing techniques have been proposed to allow for efficiently querying large image databases. However deep supervised image hashing techniques are developed to a great extent heuristically often leading to suboptimal results. Contrary to this we propose an efficient deep supervised hashing algorithm that optimizes the learned codes using an information-theoretic measure the Quadratic Mutual Information (QMI). The proposed method is adapted to the needs of large-scale hashing and information retrieval leading to a novel information-theoretic measure the Quadratic Spherical Mutual Information (QSMI). Apart from demonstrating the effectiveness of the proposed method under different scenarios and outperforming existing state-of-the-art image hashing techniques this paper provides a structured way to model the process of information retrieval and develop novel methods adapted to the needs of each application. |
2019 | Unsupervised Neural Quantization For Compressed-domain Similarity Search | Morozov Stanislav, Babenko Artem | Arxiv | We tackle the problem of unsupervised visual descriptors compression which is a key ingredient of large-scale image retrieval systems. While the deep learning machinery has benefited literally all computer vision pipelines the existing state-of-the-art compression methods employ shallow architectures and we aim to close this gap by our paper. In more detail we introduce a DNN architecture for the unsupervised compressed-domain retrieval based on multi-codebook quantization. The proposed architecture is designed to incorporate both fast data encoding and efficient distances computation via lookup tables. We demonstrate the exceptional advantage of our scheme over existing quantization approaches on several datasets of visual descriptors via outperforming the previous state-of-the-art by a large margin. |
2019 | Deep Cross-modal Hashing With Hashing Functions And Unified Hash Codes Jointly Learning | Tu Rong-cheng, Mao Xian-ling, Ma Bing, Hu Yong, Yan Tan, Wei Wei, Huang Heyan | Arxiv | Due to their high retrieval efficiency and low storage cost cross-modal hashing methods have attracted considerable attention. Generally compared with shallow cross-modal hashing methods deep cross-modal hashing methods can achieve a more satisfactory performance by integrating feature learning and hash code optimization into the same framework. However most existing deep cross-modal hashing methods either cannot learn a unified hash code for the two correlated data-points of different modalities in a database instance or cannot guide the learning of unified hash codes by the feedback of the hashing function learning procedure to enhance the retrieval accuracy. To address the issues above in this paper we propose a novel end-to-end Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning (DCHUC). Specifically by an iterative optimization algorithm DCHUC jointly learns unified hash codes for image-text pairs in a database and a pair of hash functions for unseen query image-text pairs. With the iterative optimization algorithm the learned unified hash codes can be used to guide the hashing function learning procedure; meanwhile the learned hashing functions can feed back to guide the unified hash code optimization procedure. Extensive experiments on three public datasets demonstrate that the proposed method outperforms the state-of-the-art cross-modal hashing methods. |
2019 | Deep Hashing Using Entropy Regularised Product Quantisation Network | Schlemper Jo, Caballero Jose, Aitken Andy, Van Amersfoort Joost | Arxiv | In large scale systems approximate nearest neighbour search is a crucial algorithm to enable efficient data retrievals. Recently deep learning-based hashing algorithms have been proposed as a promising paradigm to enable data dependent schemes. Often their efficacy is only demonstrated on data sets with fixed limited numbers of classes. In practical scenarios those labels are not always available or one requires a method that can handle a higher input variability as well as a higher granularity. To fulfil those requirements we look at more flexible similarity measures. In this work we present a novel flexible end-to-end trainable network for large-scale data hashing. Our method works by transforming the data distribution to behave as a uniform distribution on a product of spheres. The transformed data is subsequently hashed to a binary form in a way that maximises entropy of the output (i.e. to fully utilise the available bit-rate capacity) while maintaining the correctness (i.e. close items hash to the same key in the map). We show that the method outperforms baseline approaches such as locality-sensitive hashing and product quantisation in the limited capacity regime. |
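Since this entry and the neural quantization entry above both build on product quantisation, a compact sketch of the classic version (Jégou et al.) may be useful for contrast: split each vector into blocks, learn a small k-means codebook per block, and answer queries with per-block lookup tables. This is the standard baseline, not the entropy-regularised network described here; all sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, n_blocks=4, n_centroids=16, seed=0):
    """One k-means codebook per subvector block."""
    return [KMeans(n_clusters=n_centroids, n_init=4, random_state=seed).fit(B)
            for B in np.split(X, n_blocks, axis=1)]

def pq_encode(X, codebooks):
    """Each vector becomes one centroid index per block (here 4 x 4 bits)."""
    return np.stack([cb.predict(B) for cb, B in
                     zip(codebooks, np.split(X, len(codebooks), axis=1))], axis=1)

def adc_distances(query, codebooks, codes):
    """Asymmetric distance computation: per-block lookup tables of squared
    distances from the query block to every centroid, then table sums."""
    q_blocks = np.split(query, len(codebooks))
    tables = [((cb.cluster_centers_ - qb) ** 2).sum(axis=1)
              for cb, qb in zip(codebooks, q_blocks)]
    return sum(t[codes[:, m]] for m, t in enumerate(tables))

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 64)).astype(np.float32)
codebooks = train_pq(X)
codes = pq_encode(X, codebooks)            # (2000, 4) small integer codes
print(np.argsort(adc_distances(X[0], codebooks, codes))[:5])  # 0 ranks first
```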
2018 | Unsupervised Semantic Deep Hashing | Jin Sheng | Arxiv | In recent years deep hashing methods have been proved to be efficient since they employ convolutional neural networks to learn features and hash codes simultaneously. However these methods are mostly supervised. In real-world applications annotating a large number of images is a time-consuming and onerous task. In this paper we propose a novel unsupervised deep hashing method for large-scale image retrieval. Our method namely unsupervised semantic deep hashing (USDH) uses semantic information preserved in the CNN feature layer to guide the training of the network. We enforce four criteria on hashing code learning based on the VGG-19 model: 1) preserving relevant information of the feature space in the hashing space; 2) minimizing quantization loss between binary-like codes and hashing codes; 3) improving the usage of each bit in hashing codes by using maximum information entropy; and 4) invariance to image rotation. Extensive experiments on CIFAR-10 and NUS-WIDE have demonstrated that USDH outperforms several state-of-the-art unsupervised hashing methods for image retrieval. We also conduct experiments on the Oxford 17 dataset for fine-grained classification to verify its efficiency for other computer vision tasks. |
2018 | Deep Saliency Hashing | Jin Sheng, Yao Hongxun, Sun Xiaoshuai, Zhou Shangchen, Zhang Lei, Hua Xiansheng | Arxiv | In recent years hashing methods have been proved to be effective and efficient for the large-scale Web media search. However the existing general hashing methods have limited discriminative power for describing fine-grained objects that share similar overall appearance but have subtle difference. To solve this problem we for the first time introduce the attention mechanism to the learning of fine-grained hashing codes. Specifically we propose a novel deep hashing model named deep saliency hashing (DSaH) which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components including the semantic loss the saliency loss and the quantization loss. As the core of DSaH the saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on fine-grained datasets including Oxford Flowers-17 Stanford Dogs-120 and CUB Bird demonstrate that our DSaH performs the best for the fine-grained retrieval task and beats the strongest competitor (DTQ) by approximately 10% on both Stanford Dogs-120 and CUB Bird. DSaH is also comparable to several state-of-the-art hashing methods on general datasets including CIFAR-10 and NUS-WIDE. |
2018 | Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder | Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong | TIP | Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed Self-Supervised Video Hashing (SSVH), that is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the currently best performance on the task of unsupervised video retrieval. |
2018 | Efficient End-to-end Learning For Quantizable Representations | Jeong Yeonwoo, Song Hyun Oh | Arxiv | Embedding representation learning via neural networks is at the core foundation of modern similarity based search. While much effort has been put into developing algorithms for learning binary hamming code representations for search efficiency this still requires a linear scan of the entire dataset for each query and trades off the search accuracy through binarization. To this end we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end which can be used to construct an efficient hash table not only providing significant search reduction in the number of data but also achieving the state of the art search accuracy outperforming previous state of the art deep metric learning methods. We also show that finding the optimal sparse binary hash code in a mini-batch can be computed exactly in polynomial time by solving a minimum cost flow problem. Our results on Cifar-100 and on ImageNet datasets show the state of the art search accuracy in precision@k and NMI metrics while providing up to 98X and 478X search speedup respectively over exhaustive linear search. The source code is available at https://github.com/maestrojeong/Deep-Hash-Table-ICML18 |
2018 | Deep Attention-guided Hashing | Yang Zhan, Raymond Osolo Ian, Sun Wuqing, Long Jun | Arxiv | With the rapid growth of multimedia data (e.g. image audio and video etc.) on the web learning-based hashing techniques such as Deep Supervised Hashing (DSH) have proven to be very efficient for large-scale multimedia search. The recent successes seen in Learning-based hashing methods are largely due to the success of deep learning-based hashing methods. However there are some limitations to previous learning-based hashing methods (e.g. the learned hash codes containing repetitive and highly correlated information). In this paper we propose a novel learning-based hashing method named Deep Attention-guided Hashing (DAgH). DAgH is implemented using two stream frameworks. The core idea is to use guided hash codes which are generated by the hashing network of the first stream framework (called first hashing network) to guide the training of the hashing network of the second stream framework (called second hashing network). Specifically in the first network it leverages an attention network and hashing network to generate the attention-guided hash codes from the original images. The loss function we propose contains two components the semantic loss and the attention loss. The attention loss is used to punish the attention network to obtain the salient region from pairs of images; in the second network these attention-guided hash codes are used to guide the training of the second hashing network (i.e. these codes are treated as supervised labels to train the second network). By doing this DAgH can make full use of the most critical information contained in images to guide the second hashing network in order to learn efficient hash codes in a true end-to-end fashion. Results from our experiments demonstrate that DAgH can generate high quality hash codes and it outperforms current state-of-the-art methods on three benchmark datasets CIFAR-10 NUS-WIDE and ImageNet. |
2018 | Attribute-guided Network For Cross-modal Zero-shot Hashing | Ji Zhong, Sun Yuxin, Yu Yunlong, Pang Yanwei, Han Jungong | Arxiv | Zero-Shot Hashing aims at learning a hashing model that is trained only by instances from seen categories but can generalize well to those of unseen categories. Typically it is achieved by utilizing a semantic embedding space to transfer knowledge from the seen domain to the unseen domain. Existing efforts mainly focus on the single-modal retrieval task especially Image-Based Image Retrieval (IBIR). However as a highlighted research topic in the field of hashing cross-modal retrieval is more common in real world applications. To address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task we propose a novel Attribute-Guided Network (AgNet) which can perform not only IBIR but also Text-Based Image Retrieval (TBIR). In particular AgNet aligns different modal data into a semantically rich attribute space which bridges the gap caused by modality heterogeneity and the zero-shot setting. We also design an effective strategy that exploits the attribute to guide the generation of hash codes for image and text within the same network. Extensive experimental results on three benchmark datasets (AwA SUN and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks. |
2018 | Error Correction Maximization For Deep Image Hashing | Xu Xiang, Wang Xiaofang, Kitani Kris M. | Arxiv | We propose to use the concept of the Hamming bound to derive the optimal criteria for learning hash codes with a deep network. In particular when the number of binary hash codes (typically the number of image categories) and code length are known it is possible to derive an upper bound on the minimum Hamming distance between the hash codes. This upper bound can then be used to define the loss function for learning hash codes. By encouraging the margin (minimum Hamming distance) between the hash codes of different image categories to match the upper bound we are able to learn theoretically optimal hash codes. Our experiments show that our method significantly outperforms competing deep learning-based approaches and obtains top performance on benchmark datasets. |
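The Hamming-bound computation the abstract describes is simple to reproduce: with K codewords of length b and minimum distance d, the balls of radius t = (d-1)/2 (rounded down) around codewords must be disjoint, so K times the ball volume cannot exceed 2^b. Below is a sketch of the resulting upper bound on the achievable margin; the loss design built on top of it is not reproduced here, and the example sizes are illustrative.

```python
from math import comb

def ball_volume(b, t):
    """Number of length-b binary strings within Hamming radius t."""
    return sum(comb(b, i) for i in range(t + 1))

def min_distance_upper_bound(num_codes, b):
    """Largest minimum distance d not excluded by the Hamming bound
    num_codes * V(b, (d - 1) // 2) <= 2**b."""
    best = 0
    for d in range(1, b + 1):
        if num_codes * ball_volume(b, (d - 1) // 2) <= 2 ** b:
            best = d
    return best

# e.g. 100 categories encoded with 48-bit hash codes
print(min_distance_upper_bound(100, 48))
```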
2018 | Deep LDA Hashing | Hu Di, Nie Feiping, Li Xuelong | Arxiv | The conventional supervised hashing methods based on classification do not entirely meet the requirements of hashing techniques but Linear Discriminant Analysis (LDA) does. In this paper we propose to optimize a revised LDA objective over deep networks to learn efficient hashing codes in a truly end-to-end fashion. However the complicated eigenvalue decomposition within each mini-batch in every epoch has to be faced when simply optimizing the deep network w.r.t. the LDA objective. In this work the revised LDA objective is transformed into a simple least squares problem which naturally overcomes the intractable problems and can be easily solved by an off-the-shelf optimizer. Such a deep extension can also overcome the weakness of LDA Hashing in limited linear projection and feature learning. Extensive experiments are conducted on three benchmark datasets. The proposed Deep LDA Hashing shows a nearly 70-point improvement over the conventional one on the CIFAR-10 dataset. It also beats several state-of-the-art methods on various metrics. |
2018 | From Hashing To Cnns Training Binaryweight Networks Via Hashing | Hu Qinghao, Wang Peisong, Cheng Jian | Arxiv | Deep convolutional neural networks (CNNs) have shown appealing performance on various computer vision tasks in recent years. This motivates people to deploy CNNs in real-world applications. However most state-of-the-art CNNs require large memory and computational resources which hinders their deployment on mobile devices. Recent studies show that low-bit weight representation can reduce much storage and memory demand and also can achieve efficient network inference. To achieve this goal we propose a novel approach named BWNH to train Binary Weight Networks via Hashing. In this paper we first reveal the strong connection between inner-product preserving hashing and binary weight networks and show that training binary weight networks can be intrinsically regarded as a hashing problem. Based on this perspective we propose an alternating optimization method to learn the hash codes instead of directly learning binary weights. Extensive experiments on CIFAR10 CIFAR100 and ImageNet demonstrate that our proposed BWNH outperforms the current state of the art by a large margin. |
2018 | On The Needs For Rotations In Hypercubic Quantization Hashing | Morvan Anne, Souloumiac Antoine, Choromanski Krzysztof, Gouy-pailler Cédric, Atif Jamal | Arxiv | The aim of this paper is to endow the well-known family of hypercubic quantization hashing methods with theoretical guarantees. In hypercubic quantization applying a suitable (random or learned) rotation after dimensionality reduction has been experimentally shown to improve the results accuracy in the nearest neighbors search problem. We prove in this paper that the use of these rotations is optimal under some mild assumptions: getting optimal binary sketches is equivalent to applying a rotation uniformizing the diagonal of the covariance matrix between data points. Moreover for two close points the probability of having dissimilar binary sketches is upper bounded by a factor of the initial distance between the data points. Relaxing these assumptions we obtain a general concentration result for random matrices. We also provide some experiments illustrating these theoretical points and compare a set of algorithms in both the batch and online settings. |
2018 | Simultaneous Compression And Quantization A Joint Approach For Efficient Unsupervised Hashing | Hoang Tuan, Do Thanh-toan, Le Huu, Le-tan Dang-khoa, Cheung Ngai-man | Arxiv | For unsupervised data-dependent hashing the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. A well-established hashing approach is Iterative Quantization (ITQ) which addresses these two requirements in separate steps. In this paper we revisit the ITQ approach and propose novel formulations and algorithms to the problem. Specifically we propose a novel approach named Simultaneous Compression and Quantization (SCQ) to jointly learn to compress (reduce dimensionality) and binarize input data in a single formulation under strict orthogonal constraint. With this approach we introduce a loss function and its relaxed version termed Orthonormal Encoder (OnE) and Orthogonal Encoder (OgE) respectively which involve challenging binary and orthogonal constraints. We propose to attack the optimization using novel algorithms based on recent advances in cyclic coordinate descent approach. Comprehensive experiments on unsupervised image retrieval demonstrate that our proposed methods consistently outperform other state-of-the-art hashing methods. Notably our proposed methods outperform recent deep neural networks and GAN based hashing in accuracy while being very computationally-efficient. |
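The ITQ baseline that this entry revisits is short enough to sketch in full: after PCA, alternate between binarizing the rotated data and solving an orthogonal Procrustes problem for the rotation. This is the classic two-step method of Gong and Lazebnik, shown here for contrast with the joint SCQ formulation; the data sizes are illustrative.

```python
import numpy as np

def itq(X, n_bits=32, n_iter=50, seed=0):
    """Classic Iterative Quantization: PCA projection, then alternate
    between B = sign(V R) and the Procrustes-optimal rotation R."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T                    # PCA-reduced data, (n, n_bits)
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.sign(V @ R)                   # fix R, update binary codes
        U, _, Wt = np.linalg.svd(B.T @ V)    # fix B, update rotation
        R = (U @ Wt).T
    return np.sign(V @ R)

codes = itq(np.random.default_rng(0).standard_normal((5000, 128)))
print(codes.shape)                           # (5000, 32), bipolar codes
```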
2018 | Greedy Hash Towards Fast Optimization For Accurate Hash Coding In CNN | Shupeng Su, Chao Zhang, Kai Han, Yonghong Tian | Neural Information Processing Systems | To convert inputs into binary codes hashing algorithms have been widely used for approximate nearest neighbor search on large-scale image sets due to their computation and storage efficiency. Deep hashing further improves the retrieval quality by combining the hash coding with deep neural networks. However a major difficulty in deep hashing lies in the discrete constraints imposed on the network output which generally make the optimization NP hard. In this work we adopt the greedy principle to tackle this NP hard problem by iteratively updating the network toward the probable optimal discrete solution in each iteration. A hash coding layer is designed to implement our approach which strictly uses the sign function in forward propagation to maintain the discrete constraints while in back propagation the gradients are transmitted intact to the front layer to avoid the vanishing gradients. In addition to the theoretical derivation we provide a new perspective to visualize and understand the effectiveness and efficiency of our algorithm. Experiments on benchmark datasets show that our scheme outperforms state-of-the-art hashing methods in both supervised and unsupervised tasks. |
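The coding layer the abstract describes is compact enough to sketch in PyTorch: the forward pass applies the strict sign function, while the backward pass transmits the incoming gradient unchanged to the preceding layer. A minimal sketch follows; the surrounding hashing network and loss are omitted.

```python
import torch

class GreedySign(torch.autograd.Function):
    """Forward: strict sign, keeping the codes discrete.
    Backward: pass the gradient through intact so it does not vanish
    at the non-differentiable sign."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(8, 32, requires_grad=True)
codes = GreedySign.apply(x)            # binary codes in the forward pass
codes.sum().backward()                 # gradients flow as if sign were identity
print(torch.equal(x.grad, torch.ones_like(x)))  # True
```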
2018 | Hashing as Tie-Aware Learning to Rank | K. He, F. Cakir, S. Bargal, S. Sclaroff | CVPR | Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval. In this paper, we develop learning to rank formulations for hashing, aimed at directly optimizing ranking-based evaluation metrics such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We first observe that the integer-valued Hamming distance often leads to tied rankings, and propose to use tie-aware versions of AP and NDCG to evaluate hashing for retrieval. Then, to optimize tie-aware ranking metrics, we derive their continuous relaxations, and perform gradient-based optimization with deep neural networks. Our results establish the new state-of-the-art for image retrieval by Hamming ranking in common benchmarks. |
2018 | Bingan Learning Compact Binary Descriptors With A Regularized GAN | Maciej Zieba, Piotr Semberecki, Tarek El-gaaly, Tomasz Trzcinski | Neural Information Processing Systems | In this paper we propose a novel regularization method for Generative Adversarial Networks that allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We exploit the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train the binarized penultimate layer's low-dimensional representation to mimic the distribution of the higher-dimensional preceding layers. To achieve this we introduce two loss terms that aim at (i) reducing the correlation between the dimensions of the binarized penultimate layer's low-dimensional representation (i.e. maximizing joint entropy) and (ii) propagating the relations between the dimensions in the high-dimensional space to the low-dimensional space. We evaluate the resulting binary image descriptors on two challenging applications image matching and retrieval where they achieve state-of-the-art results. |
2018 | Regularizing Deep Hashing Networks Using GAN Generated Fake Images | Geng Libing, Pan Yan, Chen Jikai, Lai Hanjiang | Arxiv | Recently deep-networks-based hashing (deep hashing) has become a leading approach for large-scale image retrieval. It aims to learn a compact bitwise representation for images via deep networks so that similar images are mapped to nearby hash codes. Since a deep network model usually has a large number of parameters it may be too complicated for the training data we have leading to model over-fitting. To address this issue in this paper we propose a simple two-stage pipeline to learn deep hashing models by regularizing the deep hashing networks using fake images. The first stage is to generate fake images from the original training set without extra data via a generative adversarial network (GAN). In the second stage we propose a deep architecture to learn hash functions in which we use a maximum-entropy based loss to incorporate the newly created fake images by the GAN. We show that this loss acts as a strong regularizer of the deep architecture by penalizing low-entropy output hash codes. This loss can also be interpreted as a model ensemble by simultaneously training many network models with massive weight sharing but over different training sets. Empirical evaluation results on several benchmark datasets show that the proposed method has superior performance gains over state-of-the-art hashing methods. |
2018 | Weakly Supervised Deep Image Hashing Through Tag Embeddings | Gattupalli Vijetha, Zhuo Yaoxin, Li Baoxin | Arxiv | Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However large-scale labeled image data is expensive to obtain thus imposing a restriction on the usage of such algorithms. On the other hand unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with image tags that contain useful information although raw tags in general do not readily lead to semantic labels. Motivated by this scenario we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested for the task of semantic image retrieval and is compared against several state-of-the-art models. Results show that our approach sets a new state-of-the-art in the area of weakly supervised image hashing. |
2018 | Neurons Merging Layer Towards Progressive Redundancy Reduction For Deep Supervised Hashing | Fu Chaoyou, Song Liangchen, Wu Xiang, Wang Guoli, He Ran | Arxiv | Deep supervised hashing has become an active topic in information retrieval. It generates hashing bits by the output neurons of a deep hashing network. During binary discretization there often exists much redundancy between hashing bits that degenerates retrieval performance in terms of both storage and accuracy. This paper proposes a simple yet effective Neurons Merging Layer (NMLayer) for deep supervised hashing. A graph is constructed to represent the redundancy relationship between hashing bits that is used to guide the learning of a hashing network. Specifically it is dynamically learned by a novel mechanism defined in our active and frozen phases. According to the learned relationship the NMLayer merges the redundant neurons together to balance the importance of each output neuron. Moreover multiple NMLayers are progressively trained for a deep hashing network to learn a more compact hashing code from a long redundant code. Extensive experiments on four datasets demonstrate that our proposed method outperforms state-of-the-art hashing methods. |
2018 | Bagminhash - Minwise Hashing Algorithm For Weighted Sets | Ertl Otmar | Arxiv | Minwise hashing has become a standard tool to calculate signatures which allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the unweighted case the calculation of signatures for weighted sets is still a time consuming task. BagMinHash is a new algorithm that can be orders of magnitude faster than current state of the art without any particular restrictions or assumptions on weights or data dimensionality. Applied to the special case of unweighted sets it represents the first efficient algorithm producing independent signature components. A series of tests finally verifies the new algorithm and also reveals limitations of other approaches published in the recent past. |
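For reference, the unweighted minwise hashing that BagMinHash generalizes fits in a few lines: hash every element of a set with k independent hash functions and keep the k minima; the fraction of matching signature components is an unbiased estimate of the Jaccard similarity. A small sketch follows; the universal-hash parameters are illustrative, and this is the unweighted special case, not BagMinHash itself.

```python
import numpy as np

P = 2**31 - 1                      # a Mersenne prime for universal hashing
rng = np.random.default_rng(0)
K = 256
A = rng.integers(1, P, size=K, dtype=np.uint64)
B = rng.integers(0, P, size=K, dtype=np.uint64)

def minhash_signature(items):
    """Keep, for each of the K hash functions, the minimum over the set."""
    # Python's str hash is salted per process, but both signatures in one
    # run share the salt, so the estimate is still valid within a run.
    x = np.array([hash(i) % P for i in items], dtype=np.uint64)
    return ((np.outer(A, x) + B[:, None]) % P).min(axis=1)

s1 = minhash_signature({"cat", "dog", "fish", "bird"})
s2 = minhash_signature({"cat", "dog", "fish", "lizard"})
print((s1 == s2).mean())           # estimates Jaccard = 3 / 5
```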
2018 | Improving Similarity Search With High-dimensional Locality-sensitive Hashing | Sharma Jaiyam, Navlakha Saket | Arxiv | We propose a new class of data-independent locality-sensitive hashing (LSH) algorithms based on the fruit fly olfactory circuit. The fundamental difference of this approach is that instead of assigning hashes as dense points in a low dimensional space hashes are assigned in a high dimensional space which enhances their separability. We show theoretically and empirically that this new family of hash functions is locality-sensitive and preserves rank similarity for inputs in any ℓp space. We then analyze different variations on this strategy and show empirically that they outperform existing LSH methods for nearest-neighbors search on six benchmark datasets. Finally we propose a multi-probe version of our algorithm that achieves higher performance for the same query time or conversely that maintains performance of prior approaches while taking significantly less indexing time and memory. Overall our approach leverages the advantages of separability provided by high-dimensional spaces while still remaining computationally efficient. |
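The fruit-fly construction this entry builds on inverts the usual LSH recipe: instead of dense projections to a low dimension, it uses a sparse binary projection to a much higher dimension followed by a winner-take-all step that keeps only the top activations. Below is a minimal sketch in that spirit, following the original fly-circuit scheme (Dasgupta et al.) rather than the multi-probe variant proposed here; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_out, n_sample, k = 64, 2000, 12, 32
# each output unit sums a small random subset of the input coordinates
conn = rng.integers(0, d, size=(n_out, n_sample))

def fly_hash(x):
    """Sparse binary expansion + winner-take-all: a high-dimensional
    sparse tag whose active set is stable under small input changes."""
    activations = x[conn].sum(axis=1)
    tag = np.zeros(n_out, dtype=np.uint8)
    tag[np.argsort(activations)[-k:]] = 1
    return tag

x = rng.random(d)
y = x + 0.05 * rng.standard_normal(d)     # a near neighbor of x
print((fly_hash(x) & fly_hash(y)).sum(), "of", k, "active bits shared")
```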
2018 | Towards Optimal Deep Hashing via Policy Gradient | Xin Yuan, Liangliang Ren, Jiwen Lu, and Jie Zhou | ECCV | In this paper, we propose a simple yet effective relaxation free method to learn more effective binary codes via policy gradient for scalable image search. While a variety of deep hashing methods have been proposed in recent years, most of them are confronted by the dilemma to obtain optimal binary codes in a truly end-to-end manner with nonsmooth sign activations. Unlike existing methods which usually employ a general relaxation framework to adapt to the gradient-based algorithms, our approach formulates the non-smooth part of the hashing network as sampling with a stochastic policy, so that the retrieval performance degradation caused by the relaxation can be avoided. Specifically, our method directly generates the binary codes and maximizes the expectation of rewards for similarity preservation, where the network can be trained directly via policy gradient. Hence, the differentiation challenge for discrete optimization can be naturally addressed, which leads to effective gradients and binary codes. Extensive experimental results on three benchmark datasets validate the effectiveness of the proposed method. |
2018 | Compressing Deep Neural Networks A New Hashing Pipeline Using Kacs Random Walk Matrices | Parker-holder Jack, Gass Sam | Arxiv | The popularity of deep learning is increasing by the day. However despite the recent advancements in hardware deep neural networks remain computationally intensive. Recent work has shown that by preserving the angular distance between vectors random feature maps are able to reduce dimensionality without introducing bias to the estimator. We test a variety of established hashing pipelines as well as a new approach using Kac's random walk matrices. We demonstrate that this method achieves similar accuracy to existing pipelines. |
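A Kac random walk applies a sequence of random two-coordinate (Givens) rotations, each in constant time, so an approximate random rotation costs far less than a dense matrix multiply. Below is a sketch of that primitive feeding a sign-based sketch; it shows the basic walk only, under the common assumption of on the order of d log d steps, and is not the paper's full pipeline.

```python
import numpy as np

def kac_walk(x, n_steps, seed=0):
    """Rotate random coordinate pairs by random angles; the product of
    these Givens rotations approximates a uniform random rotation."""
    rng = np.random.default_rng(seed)
    x = x.astype(float).copy()
    d = len(x)
    for _ in range(n_steps):
        i, j = rng.choice(d, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        x[i], x[j] = c * x[i] - s * x[j], s * x[i] + c * x[j]
    return x

d = 128
x = np.random.default_rng(1).standard_normal(d)
rotated = kac_walk(x, n_steps=int(d * np.log2(d)))
bits = (rotated >= 0).astype(np.uint8)      # random-rotation sign sketch
print(bits[:16])
```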
2018 | Learning Effective Binary Visual Representations With Deep Networks | Wu Jianxin, Luo Jian-hao | Arxiv | Although traditionally binary visual representations are mainly designed to reduce computational and storage costs in the image retrieval research this paper argues that binary visual representations can be applied to large scale recognition and detection problems in addition to hashing in retrieval. Furthermore the binary nature may make them generalize better than their real-valued counterparts. Existing binary hashing methods are either two-stage or hinge on loss term regularization or saturated functions and hence converge slowly and only emit soft binary values. This paper proposes Approximately Binary Clamping (ABC) which is non-saturating end-to-end trainable with fast convergence and can output true binary visual representations. ABC achieves comparable accuracy in ImageNet classification to its real-valued counterpart and even generalizes better in object detection. On benchmark image retrieval datasets ABC also outperforms existing hashing methods. |
2018 | From Selective Deep Convolutional Features To Compact Binary Representations For Image Retrieval | Do Thanh-toan, Hoang Tuan, Tan Dang-khoa Le, Le Huu, Nguyen Tam V., Cheung Ngai-man | Arxiv | In the large-scale image retrieval task the two most important requirements are the discriminability of image representations and the efficiency in computation and storage of representations. Regarding the former requirement Convolutional Neural Network (CNN) is proven to be a very powerful tool to extract highly discriminative local descriptors for effective image search. Additionally in order to further improve the discriminative power of the descriptors recent works adopt fine-tuned strategies. In this paper taking a different approach we propose a novel computationally efficient and competitive framework. Specifically we firstly propose various strategies to compute masks namely SIFT-mask SUM-mask and MAX-mask to select a representative subset of local convolutional features and eliminate redundant features. Our in-depth analyses demonstrate that proposed masking schemes are effective to address the burstiness drawback and improve retrieval accuracy. Secondly we propose to employ recent embedding and aggregating methods which can significantly boost the feature discriminability. Regarding the computation and storage efficiency we include a hashing module to produce very compact binary image representations. Extensive experiments on six image retrieval benchmarks demonstrate that our proposed framework achieves the state-of-the-art retrieval performances. |
2018 | Binary Constrained Deep Hashing Network For Image Retrieval Without Manual Annotation | Do Thanh-toan, Hoang Tuan, Tan Dang-khoa Le, Pham Trung, Le Huu, Cheung Ngai-man, Reid Ian | Arxiv | Learning compact binary codes for image retrieval task using deep neural networks has attracted increasing attention recently. However training deep hashing networks for the task is challenging due to the binary constraints on the hash codes the similarity preserving property and the requirement for a vast amount of labelled images. To the best of our knowledge none of the existing methods has tackled all of these challenges completely in a unified framework. In this work we propose a novel end-to-end deep learning approach for the task in which the network is trained to produce binary codes directly from image pixels without the need of manual annotation. In particular to deal with the non-smoothness of binary constraints we propose a novel pairwise constrained loss function which simultaneously encodes the distances between pairs of hash codes and the binary quantization error. In order to train the network with the proposed loss function we propose an efficient parameter learning algorithm. In addition to provide similar / dissimilar training images to train the network we exploit 3D models reconstructed from unlabelled images for automatic generation of enormous training image pairs. The extensive experiments on image retrieval benchmark datasets demonstrate the improvements of the proposed method over the state-of-the-art compact representation methods on the image retrieval problem. |
2018 | Mean Local Group Average Precision (mlgap) A New Performance Metric For Hashing-based Retrieval | Ding Pak Lun Kevin, Li Yikang, Li Baoxin | Arxiv | The research on hashing techniques for visual data is gaining increased attention in recent years due to the need for compact representations supporting efficient search/retrieval in large-scale databases such as online images. Among many possibilities Mean Average Precision (mAP) has emerged as the dominant performance metric for hashing-based retrieval. One glaring shortcoming of mAP is its inability to balance retrieval accuracy and utilization of hash codes: pushing a system to attain higher mAP will inevitably lead to poorer utilization of the hash codes. Poor utilization of the hash codes hinders good retrieval because of increased collision of samples in the hash space. This means that a model giving higher mAP values does not necessarily do a better job in retrieval. In this paper we introduce a new metric named Mean Local Group Average Precision (mLGAP) for better evaluation of the performance of hashing-based retrieval. The new metric provides a retrieval performance measure that also reconciles the utilization of hash codes leading to a more practically meaningful performance metric than conventional ones like mAP. To this end we start by mathematical analysis of the deficiencies of mAP for hashing-based retrieval. We then propose mLGAP and show why it is more appropriate for hashing-based retrieval. Experiments on image retrieval are used to demonstrate the effectiveness of the proposed metric. |
2018 | Learning Decorrelated Hashing Codes For Multimodal Retrieval | Tian Dayong | Arxiv | In social networks heterogeneous multimedia data correlate to each other such as videos and their corresponding tags in YouTube and image-text pairs in Facebook. Nearest neighbor retrieval across multiple modalities on large data sets becomes a hot yet challenging problem. Hashing is expected to be an efficient solution since it represents data as binary codes. As bit-wise XOR operations can be handled fast, the retrieval time is greatly reduced. Few existing multimodal hashing methods consider the correlation among hashing bits. The correlation has a negative impact on hashing codes. When the hashing code length becomes longer the retrieval performance improvement becomes slower. In this paper we propose a minimum correlation regularization (MCR) for multimodal hashing. First the sigmoid function is used to embed the data matrices. Then the MCR is applied on the output of sigmoid function. As the output of sigmoid function approximates a binary code matrix the proposed MCR can efficiently decorrelate the hashing codes. Experiments show the superiority of the proposed method becomes greater as the code length increases. |
2018 | Deep Semantic Hashing With Generative Adversarial Networks | Qiu Zhaofan, Pan Yingwei, Yao Ting, Mei Tao | Arxiv | Hashing has been a widely-adopted technique for nearest neighbor search in large-scale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover the results can suffer from the robustness problem as the data at training and test stage could come from similar but different distributions. This paper studies the exploration of generating synthetic data through semi-supervised generative adversarial networks (GANs) which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data for hashing. Specifically a novel deep semantic hashing with GANs (DSH-GANs) is presented which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses i.e. adversarial loss to predict the correct synthetic/real label for each sample, triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets and classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to state-of-the-art deep hash models. |
2018 | Cycle-consistent Deep Generative Hashing For Cross-modal Retrieval | Wu Lin, Wang Yang, Shao Ling | Arxiv | In this paper we propose a novel deep generative approach to cross-modal retrieval to learn hash functions in the absence of paired training samples through the cycle consistency loss. Our proposed approach employs adversarial training scheme to learn a couple of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To induce the hash codes with semantics to the input-output pair cycle consistency loss is further proposed upon the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach is generative to learn hash functions such that the learned hash codes can maximally correlate each input-output correspondence meanwhile can also regenerate the inputs so as to minimize the information loss. The learning to hash embedding is thus performed to jointly optimize the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state-of-the-arts. |
2018 | Bernoulli Embeddings For Graphs | Misra Vinith, Bhatia Sumit | Arxiv | Just as semantic hashing can accelerate information retrieval binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By imagining the embeddings as independent coin flips of varying bias continuous optimization techniques can be applied to the approximate expected loss. Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings on both ranking and pre-ranking tasks for a variety of datasets. |
2018 | Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science | Kexin Rong, Clara E. Yoon, Karianne J. Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, Gregory C. Beroza | VLDB | In this work, we report on a novel application of Locality Sensitive Hashing (LSH) to seismic data at scale. Based on the high waveform similarity between reoccurring earthquakes, our application identifies potential earthquakes by searching for similar time series segments via LSH. However, a straightforward implementation of this LSH-enabled application has difficulty scaling beyond 3 months of continuous time series data measured at a single seismic station. As a case study of a data-driven science workflow, we illustrate how domain knowledge can be incorporated into the workload to improve both the efficiency and result quality. We describe several end-to-end optimizations of the analysis pipeline from pre-processing to post-processing, which allow the application to scale to time series data measured at multiple seismic stations. Our optimizations enable an over 100× speedup in the end-to-end analysis pipeline. This improved scalability enabled seismologists to perform seismic analysis on more than ten years of continuous time series data from over ten seismic stations, and has directly enabled the discovery of 597 new earthquakes near the Diablo Canyon nuclear power plant in California and 6123 new earthquakes in New Zealand. |
2018 | Deep Class-wise Hashing Semantics-preserving Hashing Via Class-wise Loss | Zhe Xuefei, Chen Shifeng, Yan Hong | Arxiv | Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In the light of recent progress convolutional neural network based hashing methods typically seek pair-wise or triplet labels to conduct the similarity preserving learning. However complex semantic concepts of visual contents are hard to capture by similar/dissimilar labels which limits the retrieval performance. Generally pair-wise or triplet losses not only suffer from expensive training costs but also lack in extracting sufficient semantic information. In this regard we propose a novel deep supervised hashing model to learn more compact class-level similarity preserving binary codes. Our deep learning based model is motivated by deep metric learning that directly takes semantic labels as supervised information in training and generates corresponding discriminant hashing code. Specifically a novel cubic constraint loss function based on Gaussian distribution is proposed which preserves semantic variations while penalizing the overlap of different classes in the embedding space. To address the discrete optimization problem introduced by binary codes a two-step optimization strategy is proposed to provide efficient training and avoid the problem of gradient vanishing. Extensive experiments on four large-scale benchmark databases show that our model can achieve the state-of-the-art retrieval performance. Moreover when training samples are limited our method surpasses other supervised deep hashing methods with non-negligible margins. |
2018 | Deep Hashing via Discrepancy Minimization | Zhixiang Chen, Xin Yuan, Jiwen Lu, Qi Tian, and Jie Zhou | CVPR | This paper presents a discrepancy minimizing model to address the discrete optimization problem in hashing learning. The discrete optimization introduced by binary constraint is an NP-hard mixed integer programming problem. It is usually addressed by relaxing the binary variables into continuous variables to adapt to the gradient based learning of hashing functions, especially the training of deep neural networks. To deal with the objective discrepancy caused by relaxation, we transform the original binary optimization into differentiable optimization problem over hash functions through series expansion. This transformation decouples the binary constraint and the similarity preserving hashing function optimization. The transformed objective is optimized in a tractable alternating optimization framework with gradual discrepancy minimization. Extensive experimental results on three benchmark datasets validate the efficacy of the proposed discrepancy minimizing hashing. |
2018 | Multi-resolution Hashing For Fast Pairwise Summations | Charikar Moses, Siminelakis Paris | Arxiv | A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points $X \subset \mathbb{R}^d$ and a pairwise function $w : \mathbb{R}^d \times \mathbb{R}^d \to [0,1]$ we study the problem of designing a data structure that enables sublinear-time approximation of the summation $Z_w(y) = \frac{1}{|X|}\sum_{x \in X} w(x,y)$ for any query $y \in \mathbb{R}^d$. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17] we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of $T \geq 1$ hashing schemes with collision probabilities $p_1, \ldots, p_T$ such that $\sup_{t \in [T]} p_t(x,y) = \Theta(\sqrt{w(x,y)})$. This leads to a data structure that approximates $Z_w(y)$ using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18] we show that such a collection can be constructed and evaluated efficiently for any log-convex function $w(x,y) = e^{\phi(\langle x,y \rangle)}$ of the inner product on the unit sphere $x,y \in \mathcal{S}^{d-1}$. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to $\mathbb{R}^d$ and from scalar functions to vector functions. |
2018 | Hashing-based-estimators For Kernel Density In High Dimensions | Charikar Moses, Siminelakis Paris | Arxiv | Given a set of points $P \subset \mathbb{R}^d$ and a kernel $k$ the Kernel Density Estimate at a point $x \in \mathbb{R}^d$ is defined as $\mathrm{KDE}_P(x) = \frac{1}{|P|}\sum_{y \in P} k(x,y)$. We study the problem of designing a data structure that given a data set P and a kernel function returns approximations to the kernel density of a query point in sublinear time. We introduce a class of unbiased estimators for kernel density implemented through locality-sensitive hashing and give general theorems bounding the variance of such estimators. These estimators give rise to efficient data structures for estimating the kernel density in high dimensions for a variety of commonly used kernels. Our work is the first to provide data-structures with theoretical guarantees that improve upon simple random sampling in high dimensions. |
2018 | NASH Toward End-to-end Neural Architecture For Generative Semantic Hashing | Shen Dinghan, Su Qinliang, Chapfuwa Paidamoyo, Wang Wenlin, Wang Guoyin, Carin Lawrence, Henao Ricardo | Arxiv | Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems. While fairly successful previous techniques generally require two-stage training and the binary constraints are handled ad-hoc. In this paper we present an end-to-end Neural Architecture for Semantic Hashing (NASH) where the binary hashing codes are treated as Bernoulli latent variables. A neural variational inference framework is proposed for training where gradients are directly back-propagated through the discrete latent variable to optimize the hash function. We also draw connections between the proposed method and rate-distortion theory which provides a theoretical foundation for the effectiveness of the proposed framework. Experimental results on three public datasets demonstrate that our method significantly outperforms several state-of-the-art models on both unsupervised and supervised scenarios. |
2018 | Deep Priority Hashing | Cao Zhangjie, Sun Ziping, Long Mingsheng, Wang Jianmin, Yu Philip S. | Arxiv | Deep hashing enables image retrieval by end-to-end learning of deep representations and hash codes from training data with pairwise similarity information. Subject to the distribution skewness underlying the similarity information most existing deep hashing methods may underperform for imbalanced data due to misspecified loss functions. This paper presents Deep Priority Hashing (DPH) an end-to-end architecture that generates compact and balanced hash codes in a Bayesian learning framework. The main idea is to reshape the standard cross-entropy loss for similarity-preserving learning such that it down-weighs the loss associated to highly-confident pairs. This idea leads to a novel priority cross-entropy loss which prioritizes the training on uncertain pairs over confident pairs. Also we propose another priority quantization loss which prioritizes hard-to-quantize examples for generation of nearly lossless hash codes. Extensive experiments demonstrate that DPH can generate high-quality hash codes and yield state-of-the-art image retrieval results on three datasets ImageNet NUS-WIDE and MS-COCO. |
2018 | Deep Cauchy Hashing for Hamming Space Retrieval | Yue Cao, Mingsheng Long, Bin Liu, Jianmin Wang | CVPR | Due to its computation efficiency and retrieval quality, hashing has been widely applied to approximate nearest neighbor search for large-scale image retrieval, while deep hashing further improves the retrieval quality by end-to-end representation learning and hash coding. With compact hash codes, Hamming space retrieval enables the most efficient constant-time search that returns data points within a given Hamming radius to each query, by hash table lookups instead of linear scan. However, subject to the weak capability of concentrating relevant images to be within a small Hamming ball due to mis-specified loss functions, existing deep hashing methods may underperform for Hamming space retrieval. This work presents Deep Cauchy Hashing (DCH), a novel deep hashing model that generates compact and concentrated binary hash codes to enable efficient and effective Hamming space retrieval. The main idea is to design a pairwise cross-entropy loss based on Cauchy distribution, which penalizes significantly on similar image pairs with Hamming distance larger than the given Hamming radius threshold. Comprehensive experiments demonstrate that DCH can generate highly concentrated hash codes and yield state-of-the-art Hamming space retrieval performance on three datasets, NUS-WIDE, CIFAR-10, and MS-COCO. |
2018 | HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN | Yue Cao, Mingsheng Long, Bin Liu, Jianmin Wang | CVPR | Deep learning to hash improves image retrieval performance by end-to-end representation learning and hash coding from training data with pairwise similarity information. Subject to the scarcity of similarity information that is often expensive to collect for many application domains, existing deep learning to hash methods may overfit the training data and result in substantial loss of retrieval quality. This paper presents HashGAN, a novel architecture for deep learning to hash, which learns compact binary hash codes from both real images and diverse images synthesized by generative models. The main idea is to augment the training data with nearly real images synthesized from a new Pair Conditional Wasserstein GAN (PC-WGAN) conditioned on the pairwise similarity information. Extensive experiments demonstrate that HashGAN can generate high-quality binary hash codes and yield state-of-the-art image retrieval performance on three benchmarks, NUS-WIDE, CIFAR-10, and MS-COCO. |
2018 | Hashing With Binary Matrix Pursuit | Cakir Fatih, He Kun, Sclaroff Stan | Arxiv | We propose theoretical and empirical improvements for two-stage hashing methods. We first provide a theoretical analysis on the quality of the binary codes and show that under mild assumptions a residual learning scheme can construct binary codes that fit any neighborhood structure with arbitrary accuracy. Secondly we show that with high-capacity hash functions such as CNNs binary code inference can be greatly simplified for many standard neighborhood definitions yielding smaller optimization problems and more robust codes. Incorporating our findings we propose a novel two-stage hashing method that significantly outperforms previous hashing studies on widely used image retrieval benchmarks. |
2018 | Zero-shot Sketch-image Hashing | Shen Yuming, Liu Li, Shen Fumin, Shao Ling | Arxiv | Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning methods where Hamming distance matching significantly speeds up the process of similarity search. Providing training and test data subjected to a fixed set of pre-defined categories the cutting-edge SBIR and cross-modal hashing works obtain acceptable retrieval performance. However most of the existing methods fail when the categories of query sketches have never been seen during training. In this paper the above problem is briefed as a novel but realistic zero-shot SBIR hashing task. We elaborate the challenges of this special task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An end-to-end three-network architecture is built two of which are treated as the binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by utilizing the Kronecker fusion layer and graph convolution respectively. As an important part of ZSIH we formulate a generative hashing scheme in reconstructing semantic knowledge representations for zero-shot retrieval. To the best of our knowledge ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets i.e. Sketchy and TU-Berlin with a novel zero-shot train-test split. The proposed model remarkably outperforms related works. |
2018 | Texture Synthesis Guided Deep Hashing For Texture Image Retrieval | Bhunia Ayan Kumar, Kishore Perla Sai Raj, Mukherjee Pranay, Das Abhirup, Roy Partha Pratim | Arxiv | With the large-scale explosion of images and videos over the internet efficient hashing methods have been developed to facilitate memory and time efficient retrieval of similar images. However none of the existing works uses hashing to address texture image retrieval mostly because of the lack of sufficiently large texture image databases. Our work addresses this problem by developing a novel deep learning architecture that generates binary hash codes for input texture images. For this we first pre-train a Texture Synthesis Network (TSN) which takes a texture patch as input and outputs an enlarged view of the texture by injecting newer texture content. Thus it signifies that the TSN encodes the learnt texture specific information in its intermediate layers. In the next stage a second network gathers the multi-scale feature representations from the TSNs intermediate layers using channel-wise attention combines them in a progressive manner to a dense continuous representation which is finally converted into a binary hash code with the help of individual and pairwise label information. The new enlarged texture patches also help in data augmentation to alleviate the problem of insufficient texture data and are used to train the second stage of the network. Experiments on three public texture image retrieval datasets indicate the superiority of our texture synthesis guided hashing approach over current state-of-the-art methods. |
2018 | Adding Cues To Binary Feature Descriptors For Visual Place Recognition | Schlegel Dominik, Grisetti Giorgio | Arxiv | In this paper we propose an approach to embed continuous and selector cues in binary feature descriptors used for visual place recognition. The embedding is achieved by extending each feature descriptor with a binary string that encodes a cue and supports the Hamming distance metric. Augmenting the descriptors in such a way has the advantage of being transparent to the procedure used to compare them. We present two concrete applications of our methodology demonstrating the two considered types of cues. In addition to that we conducted on these applications a broad quantitative and comparative evaluation covering five benchmark datasets and several state-of-the-art image retrieval approaches in combination with various binary descriptor types. |
2018 | Fully Understanding The Hashing Trick | Casper B. Freksen, Lior Kamma, Kasper Green Larsen | Neural Information Processing Systems | Feature hashing also known as the hashing trick introduced by Weinberger et al. (2009) is one of the key techniques used in scaling-up machine learning algorithms. Loosely speaking feature hashing uses a random sparse projection matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ (where $m \ll n$) in order to reduce the dimension of the data from $n$ to $m$ while approximately preserving the Euclidean norm. Every column of $A$ contains exactly one non-zero entry equal to either $-1$ or $1$. Weinberger et al. showed tail bounds on $\|Ax\|_2^2$. Specifically they showed that for every $\varepsilon, \delta$ if $\|x\|_\infty / \|x\|_2$ is sufficiently small and $m$ is sufficiently large then $\Pr\left[\,\bigl|\|Ax\|_2^2 - \|x\|_2^2\bigr| < \varepsilon \|x\|_2^2\,\right] \ge 1 - \delta$. These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017) however the true nature of the performance of this key technique and specifically the correct tradeoff between the pivotal parameters $\|x\|_\infty / \|x\|_2$, $m$, $\varepsilon$, $\delta$ remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data which shows that the constants hiding in the asymptotic notation are in fact very close to 1 thus further illustrating the tightness of the presented bounds in practice. (A minimal sketch of this projection appears after the table.) |
2018 | Fuzzy Hashing As Perturbation-consistent Adversarial Kernel Embedding | Azarafrooz Ari, Brock John | Arxiv | Measuring the similarity of two files is an important task in malware analysis with fuzzy hash functions being a popular approach. Traditional fuzzy hash functions are data agnostic they do not learn from a particular dataset how to determine similarity; their behavior is fixed across all datasets. In this paper we demonstrate that fuzzy hash functions can be learned in a novel minimax training framework and that these learned fuzzy hash functions outperform traditional fuzzy hash functions at the file similarity task for Portable Executable files. In our approach hash digests can be extracted from the kernel embeddings of two kernel networks trained in a minimax framework where the roles of players during training (i.e adversary versus generator) alternate along with the input data. We refer to this new minimax architecture as perturbation-consistent. The similarity score for a pair of files is the utility of the minimax game in equilibrium. Our experiments show that learned fuzzy hash functions generalize well capable of determining that two files are similar even when one of those files was generated using insertion and deletion operations. |
2018 | FRESH Frechet Similarity With Hashing | Ceccarello Matteo, Driemel Anne, Silvestri Francesco | Proc. of Algorithms and Data Structures Symposium | This paper studies the r-range search problem for curves under the continuous Frechet distance: given a dataset S of n polygonal curves and a threshold $r > 0$, construct a data structure that for any query curve q efficiently returns all entries in S with distance at most r from q. We propose FRESH an approximate and randomized approach for r-range search that leverages on a locality sensitive hashing scheme for detecting candidate near neighbors of the query curve and on a subsequent pruning step based on a cascade of curve simplifications. We experimentally compare FRESH to exact and deterministic solutions and we show that high performance can be reached by suitably relaxing precision and recall. |
2018 | Non-empty Bins With Simple Tabulation Hashing | Aamand Anders, Thorup Mikkel | Arxiv | We consider the hashing of a set $X \subseteq U$ with $|X| = m$ using a simple tabulation hash function $h : U \to [n] = \{0, \dots, n-1\}$ and analyse the number of non-empty bins that is the size of $h(X)$. We show that the expected size of $h(X)$ matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm and it is critical in applications such as Bloom filters and Filter hashing. For example normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing (see https://en.wikipedia.org/wiki/Bloom_filter). Our results imply that if we implement the hashing with simple tabulation we obtain the same low false-positive probability for any possible input. |
2018 | H-CNN Spatial Hashing Based CNN For 3D Shape Analysis | Shao Tianjia, Yang Yin, Weng Yanlin, Hou Qiming, Zhou Kun | Arxiv | We present a novel spatial hashing based data structure to facilitate 3D shape analysis using convolutional neural networks (CNNs). Our method well utilizes the sparse occupancy of 3D shape boundary and builds hierarchical hash tables for an input model under different resolutions. Based on this data structure we design two efficient GPU algorithms namely hash2col and col2hash so that the CNN operations like convolution and pooling can be efficiently parallelized. The spatial hashing is nearly minimal and our data structure is almost of the same size as the raw input. Compared with state-of-the-art octree-based methods our data structure significantly reduces the memory footprint during the CNN training. As the input geometry features are more compactly packed CNN operations also run faster with our data structure. The experiment shows that under the same network structure our method yields comparable or better benchmarks compared to the state-of-the-art while it has only one-third memory consumption. Such superior memory performance allows the CNN to handle high-resolution shape analysis. |
2018 | GPU Accelerated Cascade Hashing Image Matching For Large Scale 3D Reconstruction | Xu Tao, Sun Kun, Tao Wenbing | Arxiv | Image feature point matching is a key step in Structure from Motion (SFM). However it is becoming more and more time consuming because the number of images is getting larger and larger. In this paper we propose a GPU accelerated image matching method with improved Cascade Hashing. Firstly we propose a Disk-Memory-GPU data exchange strategy and optimize the load order of data so that the proposed method can deal with big data. Next we parallelize the Cascade Hashing method on GPU. An improved parallel reduction and an improved parallel hashing ranking are proposed to fulfill this task. Finally extensive experiments show that our image matching is about 20 times faster than SiftGPU on the same graphics card nearly 100 times faster than the CPU CasHash method and hundreds of times faster than the CPU Kd-Tree based matching method. Furthermore we introduce the epipolar constraint to the proposed method and use the epipolar geometry to guide the feature matching procedure which further reduces the matching cost. |
2018 | Discriminative Supervised Hashing For Cross-modal Similarity Search | Yu Jun, Wu Xiao-jun, Kittler Josef | Arxiv | With the advantage of low storage cost and high retrieval efficiency hashing techniques have recently been an emerging topic in cross-modal similarity search. As multiple modal data reflect similar semantic content many researches aim at learning unified binary codes. However discriminative hashing features learned by these methods are not adequate. This results in lower accuracy and robustness. We propose a novel hashing learning framework which jointly performs classifier learning subspace learning and matrix factorization to preserve class-specific semantic content termed Discriminative Supervised Hashing (DSH) to learn the discriminative unified binary codes for multi-modal data. Besides reducing the loss of information and preserving the non-linear structure of data DSH non-linearly projects different modalities into the common space in which the similarity among heterogeneous data points can be measured. Extensive experiments conducted on the three publicly available datasets demonstrate that the framework proposed in this paper outperforms several state-of-the-art methods. |
2018 | SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval | Chuan-Xiang Li , Zhen-Duo Chen, Peng-Fei Zhang, Xin Luo, Liqiang Nie, Wei Zhang, Xin-Shun Xu | MM | In recent years, many hashing methods have been proposed for the cross-modal retrieval task. However, there are still some issues that need to be further explored. For example, some of them relax the binary constraints to generate the hash codes, which may generate large quantization error. Although some discrete schemes have been proposed, most of them are time-consuming. In addition, most of the existing supervised hashing methods use an n x n similarity matrix during the optimization, making them unscalable. To address these issues, in this paper, we present a novel supervised cross-modal hashing method—Scalable disCRete mATrix faCtorization Hashing, SCRATCH for short. It leverages the collective matrix factorization on the kernelized features and the semantic embedding with labels to find a latent semantic space to preserve the intra- and inter-modality similarities. In addition, it incorporates the label matrix instead of the similarity matrix into the loss function. Based on the proposed loss function and the iterative optimization algorithm, it can learn the hash functions and binary codes simultaneously. Moreover, the binary codes can be generated discretely, reducing the quantization error generated by the relaxation scheme. Its time complexity is linear to the size of the dataset, making it scalable to large-scale datasets. Extensive experiments on three benchmark datasets, namely, Wiki, MIRFlickr-25K, and NUS-WIDE, have verified that our proposed SCRATCH model outperforms several state-of-the-art unsupervised and supervised hashing methods for cross-modal retrieval. |
2018 | Dual Asymmetric Deep Hashing Learning | Li Jinxing, Zhang Bob, Lu Guangming, Zhang David | Arxiv | Due to the impressive learning power deep learning has achieved a remarkable performance in supervised hash function learning. In this paper we propose a novel asymmetric supervised deep hashing method to preserve the semantic structure among different categories and generate the binary codes simultaneously. Specifically two asymmetric deep networks are constructed to reveal the similarity between each pair of images according to their semantic labels. The deep hash functions are then learned through two networks by minimizing the gap between the learned features and discrete codes. Furthermore since the binary codes in the Hamming space also should keep the semantic affinity existing in the original space another asymmetric pairwise loss is introduced to capture the similarity between the binary codes and real-value features. This asymmetric loss not only improves the retrieval performance but also contributes to a quick convergence at the training phase. By taking advantage of the two-stream deep structures and two types of asymmetric pairwise functions an alternating algorithm is designed to optimize the deep features and high-quality binary codes efficiently. Experimental results on three real-world datasets substantiate the effectiveness and superiority of our approach as compared with state-of-the-art. |
2018 | Self-supervised Adversarial Hashing Networks For Cross-modal Retrieval | Li Chao, Deng Cheng, Li Ning, Liu Wei, Gao Xinbo, Tao Dacheng | Arxiv | Thanks to the success of deep learning cross-modal retrieval has made significant progress recently. However there still remains a crucial bottleneck how to bridge the modality gap to further enhance the retrieval accuracy. In this paper we propose a self-supervised adversarial hashing (SSAH) approach which lies among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are leveraged to maximize the semantic correlation and consistency of the representations between different modalities. In addition we harness a self-supervised semantic network to discover high-level semantic information in the form of multi-label annotations. Such information guides the feature learning process and preserves the modality relationships in both the common semantic space and the Hamming space. Extensive experiments carried out on three benchmark datasets validate that the proposed SSAH surpasses the state-of-the-art methods. |
2018 | Data-parallel Hashing Techniques For GPU Architectures | Lessley Brenton | Arxiv | Hash tables are one of the most fundamental data structures for effectively storing and accessing sparse data with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively-parallel many-core GPU architectures. Key factors affecting the performance of different hashing schemes are discovered and used to suggest best practices and pinpoint areas for further research. |
2018 | Anchorhash A Scalable Consistent Hash | Mendelson Gal, Vargaftik Shay, Barabash Katherine, Lorenz Dean, Keslassy Isaac, Orda Ariel | Arxiv | Consistent hashing (CH) is a central building block in many networking applications from datacenter load-balancing to distributed storage. Unfortunately state-of-the-art CH solutions cannot ensure full consistency under arbitrary changes and/or cannot scale while maintaining reasonable memory footprints and update times. We present AnchorHash a scalable and fully-consistent hashing algorithm. AnchorHash achieves high key lookup rates a low memory footprint and low update times. We formally establish its strong theoretical guarantees and present advanced implementations with a memory footprint of only a few bytes per resource. Moreover extensive evaluations indicate that it outperforms state-of-the-art algorithms and that it can scale on a single core to 100 million resources while still achieving a key lookup rate of more than 15 million keys per second. |
2018 | Multichannel Distributed Local Pattern For Content Based Indexing And Retrieval | Mathur Sonakshi, Chaudhary Mallika, Verma Hemant, Mandal Murari, Vipparthi S. K., Murala Subrahmanyam | Arxiv | A novel color feature descriptor Multichannel Distributed Local Pattern (MDLP) is proposed in this manuscript. The MDLP combines the salient features of both local binary and local mesh patterns in the neighborhood. The multi-distance information computed by the MDLP aids in robust extraction of the texture arrangement. Further MDLP features are extracted for each color channel of an image. The retrieval performance of the MDLP is evaluated on the three benchmark datasets for CBIR namely Corel-5000 Corel-10000 and MIT-Color Vistex respectively. The proposed technique attains substantial improvement as compared to other state-of-the-art feature descriptors in terms of various evaluation parameters such as ARP and ARR on the respective databases. |
2018 | Fast Scalable Supervised Hashing | Xin Luo, Liqiang Nie, Xiangnan He, Ye Wu, Zhen-Duo Chen, Xin-Shun Xu | SIGIR | Despite significant progress in supervised hashing, there are three common limitations of existing methods. First, most pioneer methods discretely learn hash codes bit by bit, making the learning procedure rather time-consuming. Second, to reduce the large complexity of the n by n pairwise similarity matrix, most methods apply sampling strategies during training, which inevitably results in information loss and suboptimal performance; some recent methods try to replace the large matrix with a smaller one, but the size is still large. Third, among the methods that leverage the pairwise similarity matrix, most of them only encode the semantic label information in learning the hash codes, failing to fully capture the characteristics of data. In this paper, we present a novel supervised hashing method, called Fast Scalable Supervised Hashing (FSSH), which circumvents the use of the large similarity matrix by introducing a pre-computed intermediate term whose size is independent with the size of training data. Moreover, FSSH can learn the hash codes with not only the semantic information but also the features of data. Extensive experiments on three widely used datasets demonstrate its superiority over several state-of-the-art methods in both accuracy and scalability. Our experiment codes are available at: https://lcbwlx.wixsite.com/fssh. |
2018 | A Filter Of Minhash For Image Similarity Measures | Long Jun, Liu Qunfeng, Yuan Xinpan, Zhang Chengyuan, Liu Junfeng | Arxiv | Image similarity measures play an important role in nearest neighbor search and duplicate detection for large-scale image datasets. Recently Minwise Hashing (or Minhash) and its related hashing algorithms have achieved great performances in large-scale image retrieval systems. However there are a large number of comparisons for image pairs in these applications which may spend a lot of computation time and affect the performance. In order to quickly obtain the pairwise images whose similarities are higher than a specific threshold T (e.g. 0.5) we propose a dynamic threshold filter of Minwise Hashing for image similarity measures. It greatly reduces the calculation time by terminating the unnecessary comparisons in advance. We also find that the filter can be extended to other hashing algorithms when the estimator satisfies the binomial distribution such as b-Bit Minwise Hashing One Permutation Hashing etc. In this paper we use the Bag-of-Visual-Words (BoVW) model based on the Scale Invariant Feature Transform (SIFT) to represent the image features. We have proved that the filter is correct and effective through experiments on real image datasets. (A hedged sketch of this early-termination idea appears after the table.) |
2018 | Convolutional Hashing For Automated Scene Matching | Loncaric Martin, Liu Bowei, Weber Ryan | Arxiv | We present a powerful new loss function and training scheme for learning binary hash functions. In particular we demonstrate our method by creating for the first time a neural network that outperforms state-of-the-art Haar wavelets and color layout descriptors at the task of automated scene matching. By accurately relating distance on the manifold of network outputs to distance in Hamming space we achieve a 100-fold reduction in nontrivial false positive rate and significantly higher true positive rate. We expect our insights to provide large wins for hashing models applied to other information retrieval hashing tasks as well. |
2018 | Learning Hash Codes Via Hamming Distance Targets | Loncaric Martin, Liu Bowei, Weber Ryan | Arxiv | We present a powerful new loss function and training scheme for learning binary hash codes with any differentiable model and similarity function. Our loss function improves over prior methods by using log likelihood loss on top of an accurate approximation for the probability that two inputs fall within a Hamming distance target. Our novel training scheme obtains a good estimate of the true gradient by better sampling inputs and evaluating loss terms between all pairs of inputs in each minibatch. To fully leverage the resulting hashes we use multi-indexing. We demonstrate that these techniques provide large improvements on similarity search tasks. We report the best results to date on competitive information retrieval tasks for ImageNet and SIFT 1M improving MAP from 73% to 84% and reducing query cost by a factor of 2-8 respectively. |
2018 | Collaborative Learning For Extremely Low Bit Asymmetric Hashing | Luo Yadan, Huang Zi, Li Yang, Shen Fumin, Yang Yang, Cui Peng | Arxiv | Hashing techniques are in great demand for a wide range of real-world applications such as image retrieval and network compression. Nevertheless existing approaches could hardly guarantee a satisfactory performance with the extremely low-bit (e.g. 4-bit) hash codes due to the severe information loss and the shrink of the discrete solution space. In this paper we propose a novel Collaborative Learning strategy that is tailored for generating high-quality low-bit hash codes. The core idea is to jointly distill bit-specific and informative representations for a group of pre-defined code lengths. The learning of short hash codes among the group can benefit from the manifold shared with other long codes where multiple views from different hash codes provide the supplementary guidance and regularization making the convergence faster and more stable. To achieve that an asymmetric hashing framework with two variants of multi-head embedding structures is derived termed as Multi-head Asymmetric Hashing (MAH) leading to great efficiency of training and querying. Extensive experiments on three benchmark datasets have been conducted to verify the superiority of the proposed MAH and have shown that the 8-bit hash codes generated by MAH achieve 94.33% of the MAP (Mean Average Precision) score on the CIFAR-10 dataset which significantly surpasses the performance of the 48-bit codes by the state-of-the-arts in image retrieval tasks. |
2018 | Semantic Cluster Unary Loss For Efficient Deep Hashing | Zhang Shifeng, Li Jianmin, Zhang Bo | IEEE Transactions on Image Processing | Hashing methods map similar data to binary hashcodes with smaller Hamming distance and have received broad attention due to their low storage cost and fast retrieval speed. With the rapid development of deep learning deep hashing methods have achieved promising results in efficient information retrieval. Most of the existing deep hashing methods adopt pairwise or triplet losses to deal with similarities underlying the data but the training is difficult and less efficient because $O(n^2)$ data pairs and $O(n^3)$ triplets are involved. To address these issues we propose a novel deep hashing algorithm with unary loss which can be trained very efficiently. We first of all introduce a Unary Upper Bound of the traditional triplet loss thus reducing the complexity to $O(n)$ and bridging the classification-based unary loss and the triplet loss. Second we propose a novel Semantic Cluster Deep Hashing (SCDH) algorithm by introducing a modified Unary Upper Bound loss named Semantic Cluster Unary Loss (SCUL). The resultant hashcodes form several compact clusters which means hashcodes in the same cluster have similar semantic information. We also demonstrate that the proposed SCDH is easy to be extended to semi-supervised settings by incorporating the state-of-the-art semi-supervised learning algorithms. Experiments on large-scale datasets show that the proposed method is superior to state-of-the-art hashing algorithms. |
2018 | Improved Deep Hashing With Soft Pairwise Similarity For Multi-label Image Retrieval | Zhang Zheng, Zou Qin, Lin Yuewei, Chen Long, Wang Song | Arxiv | Hash coding has been widely used in the approximate nearest neighbor search for large-scale image retrieval. Recently many deep hashing methods have been proposed and shown largely improved performance over traditional feature-learning-based methods. Most of these methods examine the pairwise similarity on the semantic-level labels where the pairwise similarity is generally defined in a hard-assignment way. That is the pairwise similarity is 1 if they share no less than one class label and 0 if they do not share any. However such similarity definition cannot reflect the similarity ranking for pairwise images that hold multiple labels. In this paper a new deep hashing method is proposed for multi-label image retrieval by re-defining the pairwise similarity into an instance similarity where the instance similarity is quantified into a percentage based on the normalized semantic labels. Based on the instance similarity a weighted cross-entropy loss and a minimum mean square error loss are tailored for loss-function construction and are efficiently used for simultaneous feature learning and hash coding. Experiments on three popular datasets demonstrate that the proposed method outperforms the competing methods and achieves the state-of-the-art performance in multi-label image retrieval. |
2018 | Deep Domain Adaptation Hashing with Adversarial Learning | Fuchen Long, Ting Yao, Qi Dai, Xinmei Tian, Jiebo Luo, Tao Mei | SIGIR | The recent advances in deep neural networks have demonstrated high capability in a wide variety of scenarios. Nevertheless, fine-tuning deep models in a new domain still requires a significant amount of labeled data despite expensive labeling efforts. A valid question is how to leverage the source knowledge plus unlabeled or only sparsely labeled target data for learning a new model in target domain. The core problem is to bring the source and target distributions closer in the feature space. In the paper, we facilitate this issue in an adversarial learning framework, in which a domain discriminator is devised to handle domain shift. Particularly, we explore the learning in the context of hashing problem, which has been studied extensively due to its great efficiency in gigantic data. Specifically, a novel Deep Domain Adaptation Hashing with Adversarial learning (DeDAHA) architecture is presented, which mainly consists of three components: a deep convolutional neural networks (CNN) for learning basic image/frame representation followed by an adversary stream on one hand to optimize the domain discriminator, and on the other, to interact with each domain-specific hashing stream for encoding image representation to hash codes. The whole architecture is trained end-to-end by jointly optimizing two types of losses, i.e., triplet ranking loss to preserve the relative similarity ordering in the input triplets and adversarial loss to maximally fool the domain discriminator with the learnt source and target feature distributions. Extensive experiments are conducted on three domain transfer tasks, including cross-domain digits retrieval, image to image and image to video transfers, on several benchmarks. Our DeDAHA framework achieves superior results when compared to the state-of-the-art techniques. |
2018 | Semi-supervised Hashing For Semi-paired Cross-view Retrieval | Yu Jun, Wu Xiao-jun, Kittler Josef | Arxiv | Recently hashing techniques have gained importance in large-scale retrieval tasks because of their retrieval speed. Most of the existing cross-view frameworks assume that data are well paired. However the fully-paired multiview situation is not universal in real applications. The aim of the method proposed in this paper is to learn the hashing function for semi-paired cross-view retrieval tasks. To utilize the label information of partial data we propose a semi-supervised hashing learning framework which jointly performs feature extraction and classifier learning. The experimental results on two datasets show that our method outperforms several state-of-the-art methods in terms of retrieval accuracy. |
2018 | Fusion Hashing A General Framework For Self-improvement Of Hashing | Liu Xingbo, Nie Xiushan, Yin Yilong | Arxiv | Hashing has been widely used for efficient similarity search based on its query and storage efficiency. To obtain better precision most studies focus on designing different objective functions with different constraints or penalty terms that consider neighborhood information. In this paper in contrast to existing hashing methods we propose a novel generalized framework called fusion hashing (FH) to improve the precision of existing hashing methods without adding new constraints or penalty terms. In the proposed FH given an existing hashing method we first execute it several times to get several different hash codes for a set of training samples. We then propose two novel fusion strategies that combine these different hash codes into one set of final hash codes. Based on the final hash codes we learn a simple linear hash function for the samples that can significantly improve model precision. In general the proposed FH can be adopted in existing hashing methods and achieve more precise and stable performance compared to the original hashing method with little extra expenditure in terms of time and space. Extensive experiments were performed based on three benchmark datasets and the results demonstrate the superior performance of the proposed framework. |
2018 | Discriminative Cross-view Binary Representation Learning | Liu Liu, Qi Hairong | WACV | Learning compact representation is vital and challenging for large scale multimedia data. Cross-view/cross-modal hashing for effective binary representation learning has received significant attention with exponentially growing availability of multimedia content. Most existing cross-view hashing algorithms emphasize the similarities in individual views which are then connected via cross-view similarities. In this work we focus on the exploitation of the discriminative information from different views and propose an end-to-end method to learn semantic-preserving and discriminative binary representation dubbed Discriminative Cross-View Hashing (DCVH) in light of learning multitasking binary representation for various tasks including cross-view retrieval image-to-image retrieval and image annotation/tagging. The proposed DCVH has the following key components. First it uses convolutional neural network (CNN) based nonlinear hashing functions and multilabel classification for both images and texts simultaneously. Such hashing functions achieve effective continuous relaxation during training without explicit quantization loss by using Direct Binary Embedding (DBE) layers. Second we propose an effective view alignment via Hamming distance minimization which is efficiently accomplished by bit-wise XOR operation. Extensive experiments on two image-text benchmark datasets demonstrate that DCVH outperforms state-of-the-art cross-view hashing algorithms as well as single-view image hashing algorithms. In addition DCVH can provide competitive performance for image annotation/tagging. |
2018 | MTFH A Matrix Tri-factorization Hashing Framework For Efficient Cross-modal Retrieval | Liu Xin, Hu Zhikai, Ling Haibin, Cheung Yiu-ming | IEEE Transactions on Pattern Analysis and Machine Intelligence | Hashing has recently sparked a great revolution in cross-modal retrieval because of its low storage cost and high query speed. Recent cross-modal hashing methods often learn unified or equal-length hash codes to represent the multi-modal data and make them intuitively comparable. However such unified or equal-length hash representations could inherently sacrifice their representation scalability because the data from different modalities may not have one-to-one correspondence and could be encoded more efficiently by different hash codes of unequal lengths. To mitigate these problems this paper exploits a related and relatively unexplored problem encode the heterogeneous data with varying hash lengths and generalize the cross-modal retrieval in various challenging scenarios. To this end a generalized and flexible cross-modal hashing framework termed Matrix Tri-Factorization Hashing (MTFH) is proposed to work seamlessly in various settings including paired or unpaired multi-modal data and equal or varying hash length encoding scenarios. More specifically MTFH exploits an efficient objective function to flexibly learn the modality-specific hash codes with different length settings while synchronously learning two semantic correlation matrices to semantically correlate the different hash representations and make heterogeneous data comparable. As a result the derived hash codes are more semantically meaningful for various challenging cross-modal retrieval tasks. Extensive experiments evaluated on public benchmark datasets highlight the superiority of MTFH under various retrieval scenarios and show its competitive performance with the state of the art. |
2018 | Learning Discriminative Hashing Codes For Cross-modal Retrieval Based On Multi-view Features | Yu Jun, Wu Xiao-jun, Kittler Josef | Arxiv | Hashing techniques have been applied broadly in retrieval tasks due to their low storage requirements and high speed of processing. Many hashing methods based on a single view have been extensively studied for information retrieval. However the representation capacity of a single view is insufficient and some discriminative information is not captured which results in limited improvement. In this paper we employ multiple views to represent images and texts for enriching the feature information. Our framework exploits the complementary information among multiple views to better learn the discriminative compact hash codes. A discrete hashing learning framework that jointly performs classifier learning and subspace learning is proposed to complete multiple search tasks simultaneously. Our framework includes two stages namely a kernelization process and a quantization process. Kernelization aims to find a common subspace where multi-view features can be fused. The quantization stage is designed to learn discriminative unified hashing codes. Extensive experiments are performed on single-label datasets (WiKi and MMED) and multi-label datasets (MIRFlickr and NUS-WIDE) and the experimental results indicate the superiority of our method compared with the state-of-the-art methods. |
2018 | Norm-ranging LSH For Maximum Inner Product Search | Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng | Neural Information Processing Systems | Neyshabur and Srebro proposed SIMPLE-LSH which is the state-of-the-art hashing based algorithm for maximum inner product search (MIPS). We found that the performance of SIMPLE-LSH in both theory and practice suffers from long tails in the 2-norm distribution of real datasets. We propose NORM-RANGING LSH which addresses the excessive normalization problem caused by long tails by partitioning a dataset into sub-datasets and building a hash index for each sub-dataset independently. We prove that NORM-RANGING LSH achieves lower query time complexity than SIMPLE-LSH under mild conditions. We also show that the idea of dataset partitioning can improve another hashing based MIPS algorithm. Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall thus significantly benefiting MIPS based applications. |
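The mechanism NORM-RANGING LSH describes is simple enough to sketch: sort points by 2-norm, split them into sub-datasets, and apply the SIMPLE-LSH transform inside each sub-dataset using its local maximum norm, so small-norm items are not over-normalized by a single global maximum. A rough NumPy illustration; the number of ranges, the bit count, and the random-hyperplane hash family are our assumptions, not the paper's exact construction:

```python
import numpy as np

def simple_lsh_transform(X, M):
    """Scale by M and append sqrt(1 - ||x/M||^2), turning MIPS into angular search."""
    Xs = X / M
    aug = np.sqrt(np.maximum(0.0, 1.0 - (Xs ** 2).sum(axis=1)))
    return np.hstack([Xs, aug[:, None]])

def norm_ranging_index(X, num_ranges=4, num_bits=16, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_bits, X.shape[1] + 1))  # shared sign-projection planes
    order = np.argsort(np.linalg.norm(X, axis=1))
    index = []
    for ids in np.array_split(order, num_ranges):             # sub-datasets ranged by norm
        M = np.linalg.norm(X[ids], axis=1).max()              # local, not global, max norm
        codes = simple_lsh_transform(X[ids], M) @ planes.T > 0
        index.append((ids, M, codes))
    return index, planes

def query_code(q, planes):
    q_aug = np.append(q / np.linalg.norm(q), 0.0)  # unit-norm query, zero extra coordinate
    return planes @ q_aug > 0
```

At query time each sub-index is probed with the same query code and the surviving candidates are re-ranked by exact inner product.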
2018 | Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization | Fumin Shen, Yan Xu, Li Liu, Yang Yang, Zi Huang, Heng Tao Shen | TPAMI | Recent vision and learning studies show that learning compact hash codes can facilitate massive data processing with significantly reduced storage and computation. Particularly, learning deep hash functions has greatly improved the retrieval performance, typically under the semantic supervision. In contrast, current unsupervised deep hashing algorithms can hardly achieve satisfactory performance due to either the relaxed optimization or absence of similarity-sensitive objective. In this work, we propose a simple yet effective unsupervised hashing framework, named Similarity-Adaptive Deep Hashing (SADH), which alternatingly proceeds over three training modules: deep hash model training, similarity graph updating and binary code optimization. The key difference from the widely-used two-step hashing method is that the output representations of the learned deep model help update the similarity graph matrix, which is then used to improve the subsequent code optimization. In addition, for producing high-quality binary codes, we devise an effective discrete optimization algorithm which can directly handle the binary constraints with a general hashing loss. Extensive experiments validate the efficacy of SADH, which consistently outperforms the state-of-the-arts by large gaps. |
2018 | A Scalable Optimization Mechanism For Pairwise Based Discrete Hashing | Shi Xiaoshuang, Xing Fuyong, Zhang Zizhao, Sapkota Manish, Guo Zhenhua, Yang Lin | Arxiv | Maintaining the pair similarity relationship among originally high-dimensional data into a low-dimensional binary space is a popular strategy to learn binary codes. One simple and intuitive method is to utilize two identical code matrices produced by hash functions to approximate a pairwise real label matrix. However the resulting quartic problem is difficult to directly solve due to the non-convex and non-smooth nature of the objective. In this paper unlike previous optimization methods using various relaxation strategies we aim to directly solve the original quartic problem using a novel alternative optimization mechanism to linearize the quartic problem by introducing a linear regression model. Additionally we find that gradually learning each batch of binary codes in a sequential mode i.e. batch by batch is greatly beneficial to the convergence of binary code learning. Based on this significant discovery and the proposed strategy we introduce a scalable symmetric discrete hashing algorithm that gradually and smoothly updates each batch of binary codes. To further improve the smoothness we also propose a greedy symmetric discrete hashing algorithm to update each bit of batch binary codes. Moreover we extend the proposed optimization mechanism to solve the non-convex optimization problems for binary code learning in many other pairwise based hashing algorithms. Extensive experiments on benchmark single-label and multi-label databases demonstrate the superior performance of the proposed mechanism over recent state-of-the-art methods. |
2018 | Efficient Nearest Neighbors Search For Large-scale Landmark Recognition | Magliani Federico, Fontanini Tomaso, Prati Andrea | Arxiv | The problem of landmark recognition has achieved excellent results in small-scale datasets. When dealing with large-scale retrieval, issues that were irrelevant with small amounts of data quickly become fundamental for an efficient retrieval phase. In particular computational time needs to be kept as low as possible whilst the retrieval accuracy has to be preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces the query time and outperforms state-of-the-art methods in accuracy for large-scale landmark recognition. It has been demonstrated that this family of algorithms can be applied to different embedding techniques like VLAD and R-MAC obtaining excellent results in very short times on different public datasets (Holidays+Flickr1M, Oxford105k and Paris106k). |
2018 | Object Detection Based Deep Unsupervised Hashing | Tu Rong-cheng, Mao Xian-ling, Feng Bo-si, Bian Bing-bing, Ying Yu-shu | Arxiv | Recently similarity-preserving hashing methods have been extensively studied for large-scale image retrieval. Compared with unsupervised hashing supervised hashing methods for labeled data have usually better performance by utilizing semantic label information. Intuitively for unlabeled data it will improve the performance of unsupervised hashing methods if we can first mine some supervised semantic label information from unlabeled data and then incorporate the label information into the training process. Thus in this paper we propose a novel Object Detection based Deep Unsupervised Hashing method (ODDUH). Specifically a pre-trained object detection model is utilized to mine supervised label information which is used to guide the learning process to generate high-quality hash codes. Extensive experiments on two public datasets demonstrate that the proposed method outperforms the state-of-the-art unsupervised hashing methods in the image retrieval task. |
2018 | Progressive Generative Hashing for Image Retrieval | Yuqing Ma, Yue He, Fan Ding, Sheng Hu, Jun Li, Xianglong Liu | IJCAI | Recent years have witnessed the success of the emerging hashing techniques in large-scale image retrieval. Owing to the great learning capacity, deep hashing has become one of the most promising solutions, and achieved attractive performance in practice. However, without semantic label information, the unsupervised deep hashing still remains an open question. In this paper, we propose a novel progressive generative hashing (PGH) framework to help learn a discriminative hashing network in an unsupervised way. Different from existing studies, it first treats the hash codes as a kind of semantic condition for the similar image generation, and simultaneously feeds the original image and its codes into the generative adversarial networks (GANs). The real images together with the synthetic ones can further help train a discriminative hashing network based on a triplet loss. By iteratively inputting the learnt codes into the hash conditioned GANs, we can progressively enable the hashing network to discover the semantic relations. Extensive experiments on the widely-used image datasets demonstrate that PGH can significantly outperform state-of-the-art unsupervised hashing methods. |
2017 | On Fast Bounded Locality Sensitive Hashing | Wygocki Piotr | Arxiv | In this paper we examine the hash functions expressed as scalar products, i.e., f(x) = ⟨v, x⟩ for some bounded random vector v. Such hash functions have numerous applications but often there is a need to optimize the choice of the distribution of v. In the present work we focus on so-called anti-concentration bounds, i.e., upper bounds on P(|⟨v, x⟩| < α). In many applications v is a vector of independent random variables with standard normal distribution. In such a case the distribution of ⟨v, x⟩ is also normal and it is easy to approximate P(|⟨v, x⟩| < α). Here we consider two bounded distributions in the context of the anti-concentration bounds. Particularly we analyze v being a random vector from the unit ball in l_∞ and v being a random vector from the unit sphere in l_2. We show anti-concentration measures for the functions f(x) = ⟨v, x⟩ that are optimal up to a constant. As a consequence of our research we obtain new best results for c-approximate nearest neighbors without false negatives for l_p in high-dimensional space for all p ∈ [1, ∞], for c = Ω(max(√d, d^{1/p})). These results improve over those presented in [16]. Finally our paper reports progress on answering the open problem by Pagh [17] who considered the nearest neighbor search without false negatives for the Hamming distance. |
2017 | Fast And Scalable Minimal Perfect Hashing For Massive Key Sets | Limasset Antoine, Rizk Guillaume, Chikhi Rayan, Peterlongo Pierre | Arxiv | Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process due to high construction time RAM or external memory usage. We revisit a simple algorithm and show that it is highly competitive with the state of the art especially in terms of construction time and memory usage. We provide a parallel C++ implementation called BBhash. It is capable of creating a minimal perfect hash function of 10^10 elements in less than 7 minutes using 8 threads and 5 GB of memory and the resulting function uses 3.7 bits/element. To the best of our knowledge this is also the first implementation that has been successfully tested on an input of cardinality 10^12. Source code: https://github.com/rizkg/BBHash |
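The simple algorithm that BBhash revisits can be prototyped in a few lines: hash the remaining keys into a bit array of size γ·n, keep the slots hit exactly once, and recurse on the colliding keys with a fresh hash at the next level; a key's minimal-perfect id is then the rank of its bit across all levels. A toy sketch, with Python's built-in hash standing in for the fast hash family and a linear-time rank that real implementations replace with O(1) rank/select structures:

```python
def build_mphf(keys, gamma=2.0, max_levels=64, seed=0):
    """Cascade of bit arrays; keys colliding at one level retry at the next."""
    levels, remaining = [], list(keys)
    for lvl in range(max_levels):
        if not remaining:
            break
        m = max(1, int(gamma * len(remaining)))
        counts = [0] * m
        for k in remaining:
            counts[hash((seed, lvl, k)) % m] += 1
        bits = [c == 1 for c in counts]  # keep slots hit exactly once
        levels.append(bits)
        remaining = [k for k in remaining if not bits[hash((seed, lvl, k)) % m]]
    if remaining:
        raise RuntimeError("increase gamma or max_levels")
    return levels

def mphf_lookup(levels, key, seed=0):
    offset = 0
    for lvl, bits in enumerate(levels):
        slot = hash((seed, lvl, key)) % len(bits)
        if bits[slot]:
            return offset + sum(bits[:slot])  # rank of the bit (O(1) in real code)
        offset += sum(bits)
    raise KeyError(key)

keys = [f"key{i}" for i in range(1000)]
levels = build_mphf(keys)
assert {mphf_lookup(levels, k) for k in keys} == set(range(len(keys)))
```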
2017 | Deep Discrete Supervised Hashing | Jiang Qing-yuan, Cui Xue, Li Wu-jun | Arxiv | Hashing has been widely used for large-scale search due to its low storage cost and fast query speed. By using supervised information supervised hashing can significantly outperform unsupervised hashing. Recently discrete supervised hashing and deep hashing are two representative progresses in supervised hashing. On one hand hashing is essentially a discrete optimization problem. Hence utilizing supervised information to directly guide discrete (binary) coding procedure can avoid sub-optimal solution and improve the accuracy. On the other hand deep hashing which integrates deep feature learning and hash-code learning into an end-to-end architecture can enhance the feedback between feature learning and hash-code learning. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure in hashing. The key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure. However no existing work can use the supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same framework. In this paper we propose a novel deep hashing method called deep discrete supervised hashing (DDSH) to address this problem. DDSH is the first deep hashing method which can utilize supervised information to directly guide both discrete coding procedure and deep feature learning procedure and thus enhance the feedback between these two important procedures. Experiments on three real datasets show that DDSH can outperform other state-of-the-art baselines including both discrete hashing and deep hashing baselines for image retrieval. |
2017 | Asymmetric Deep Supervised Hashing | Jiang Qing-yuan, Li Wu-jun | Arxiv | Hashing has been widely used for large-scale approximate nearest neighbor search because of its storage and search efficiency. Recent work has found that deep supervised hashing can significantly outperform non-deep supervised hashing in many applications. However most existing deep supervised hashing methods adopt a symmetric strategy to learn one deep hash function for both query points and database (retrieval) points. The training of these symmetric deep supervised hashing methods is typically time-consuming which makes them hard to effectively utilize the supervised information for cases with large-scale databases. In this paper we propose a novel deep supervised hashing method called asymmetric deep supervised hashing (ADSH) for large-scale nearest neighbor search. ADSH treats the query points and database points in an asymmetric way. More specifically ADSH learns a deep hash function only for query points while the hash codes for database points are directly learned. The training of ADSH is much more efficient than that of traditional symmetric deep supervised hashing methods. Experiments show that ADSH can achieve state-of-the-art performance in real applications. |
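The asymmetry in ADSH shows up directly in its objective: continuous network outputs for query points are matched against binary database codes that are learned directly as variables. A hedged sketch of that loss term as we read it from the abstract (shapes, the tanh relaxation, and the scaling by the code length are our assumptions; the alternating updates of the network and the database codes are omitted):

```python
import numpy as np

def asymmetric_loss(U, B, S):
    """U: (n_q, c) continuous query outputs in [-1, 1], e.g. tanh of network logits.
    B: (n_d, c) database codes in {-1, +1}, optimized directly rather than via a network.
    S: (n_q, n_d) pairwise similarity labels in {-1, +1}."""
    c = U.shape[1]
    return ((U @ B.T - c * S) ** 2).sum()
```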
2017 | Discrete Latent Factor Model For Cross-modal Hashing | Jiang Qing-yuan, Li Wu-jun | Arxiv | Due to its storage and retrieval efficiency cross-modal hashing (CMH) has been widely used for cross-modal similarity search in multimedia applications. According to the training strategy existing CMH methods can be mainly divided into two categories: relaxation-based continuous methods and discrete methods. In general the training of relaxation-based continuous methods is faster than discrete methods but the accuracy of relaxation-based continuous methods is not satisfactory. On the contrary the accuracy of discrete methods is typically better than relaxation-based continuous methods but the training of discrete methods is time-consuming. In this paper we propose a novel CMH method called discrete latent factor model based cross-modal hashing (DLFH) for cross-modal similarity search. DLFH is a discrete method which can directly learn the binary hash codes for CMH. At the same time the training of DLFH is efficient. Experiments on real datasets show that DLFH can achieve significantly better accuracy than existing methods and the training time of DLFH is comparable to that of relaxation-based continuous methods which are much faster than existing discrete methods. |
2017 | Part-based Deep Hashing For Large-scale Person Re-identification | Zhu Fuqing, Kong Xiangwei, Zheng Liang, Fu Haiyan, Tian Qi | Arxiv | Large-scale is a trend in person re-identification (re-id). It is important that real-time search be performed in a large gallery. While previous methods mostly focus on discriminative learning this paper attempts to integrate deep learning and hashing into one framework to evaluate the efficiency and accuracy for large-scale person re-id. We integrate spatial information for discriminative visual representation by partitioning the pedestrian image into horizontal parts. Specifically Part-based Deep Hashing (PDH) is proposed in which batches of triplet samples are employed as the input of the deep hashing architecture. Each triplet sample contains two pedestrian images (or parts) with the same identity and one pedestrian image (or part) of a different identity. A triplet loss function is employed with a constraint that the Hamming distance of pedestrian images (or parts) with the same identity is smaller than that of images with different identities. In the experiment we show that the proposed Part-based Deep Hashing method yields very competitive re-id accuracy on the large-scale Market-1501 and Market-1501+500K datasets. |
2017 | Set-to-set Hashing With Applications In Visual Recognition | Jhuo I-hong, Wang Jun | Arxiv | Visual data such as an image or a sequence of video frames is often naturally represented as a point set. In this paper we consider the fundamental problem of finding a nearest set from a collection of sets to a query set. This problem has obvious applications in large-scale visual retrieval and recognition and also in applied fields beyond computer vision. One challenge stands out in solving the problem: set representation and measure of similarity. Particularly the query set and the sets in the dataset collection can have varying cardinalities. The training collection is large enough such that linear scan is impractical. We propose a simple representation scheme that encodes both statistical and structural information of the sets. The derived representations are integrated in a kernel framework for flexible similarity measurement. For the query set process we adopt a learning-to-hash pipeline that turns the kernel representations into hash bits based on simple learners using multiple kernel learning. Experiments on two visual retrieval datasets show unambiguously that our set-to-set hashing framework outperforms prior methods that do not take the set-to-set search setting into account. |
2017 | Learning A Complete Image Indexing Pipeline | Jain Himalaya, Zepeda Joaquin, Pérez Patrick, Gribonval Rémi | Arxiv | To work at scale a complete image indexing system comprises two components: an inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query, and an approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter the former continues to be based on unsupervised clustering in the literature. In this work we propose a first system that learns both components within a unifying neural framework of structured binary encoding. |
2017 | SUBIC A Supervised Structured Binary Code For Image Search | Jain Himalaya, Zepeda Joaquin, Pérez Patrick, Gribonval Rémi | Arxiv | For large-scale visual search highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet unlike binary hashing schemes these unsupervised methods have not yet benefited from the supervision end-to-end learning and novel architectures ushered in by the deep learning revolution. We hence propose herein a novel method to make deep convolutional neural networks produce supervised compact structured binary codes for visual search. Our method makes use of a novel block-softmax non-linearity and of batch-based entropy losses that together induce structure in the learned encodings. We show that our method outperforms state-of-the-art compact representations based on deep hashing or structured quantization in single and cross-domain category retrieval instance retrieval and classification. We make our code and models publicly available online. |
2017 | Fast Spectral Ranking For Similarity Search | Iscen Ahmet, Avrithis Yannis, Tolias Giorgos, Furon Teddy, Chum Ondrej | Arxiv | Despite the success of deep learning on representing images for particular object retrieval recent studies show that the learned representations still lie on manifolds in a high dimensional space. This makes the Euclidean nearest neighbor search biased for this task. Exploring the manifolds online remains expensive even if a nearest neighbor graph has been computed offline. This work introduces an explicit embedding reducing manifold search to Euclidean search followed by dot product similarity search. This is equivalent to linear graph filtering of a sparse signal in the frequency domain. To speed up online search we compute an approximate Fourier basis of the graph offline. We improve the state of the art on particular object retrieval datasets including the challenging Instre dataset containing small objects. At a scale of 10^5 images the offline cost is only a few hours while query time is comparable to standard similarity search. |
2017 | Hashing In The Zero Shot Framework With Domain Adaptation | Pachori Shubham, Deshpande Ameya, Raman Shanmuganathan | Arxiv | Techniques to learn hash codes which can store and retrieve large dimensional multimedia data efficiently have attracted broad research interests in the recent years. With rapid explosion of newly emerged concepts and online data existing supervised hashing algorithms suffer from the problem of scarcity of ground truth annotations due to the high cost of obtaining manual annotations. Therefore we propose an algorithm to learn a hash function from training images belonging to seen classes which can efficiently encode images of unseen classes to binary codes. Specifically we project the image features from visual space and semantic features from semantic space into a common Hamming subspace. Earlier works to generate hash codes have tried to relax the discrete constraints on hash codes and solve the continuous optimization problem. However it often leads to quantization errors. In this work we use the max-margin classifier to learn an efficient hash function. To address the concern of domain-shift which may arise due to the introduction of new classes we also introduce an unsupervised domain adaptation model in the proposed hashing framework. Results on three datasets show the advantage of using domain adaptation in learning a high-quality hash function and the superiority of our method for the task of image retrieval as compared to several state-of-the-art hashing methods. |
2017 | Online Hashing | Huang Long-kai, Yang Qiang, Zheng Wei-shi | Arxiv | Although hash function learning algorithms have achieved great success in recent years most existing hash models are off-line and are not suitable for processing sequential or online data. To address this problem this work proposes an online hash model to accommodate data coming in stream for online learning. Specifically a new loss function is proposed to measure the similarity loss between a pair of data samples in Hamming space. Then a structured hash model is derived and optimized in a passive-aggressive way. Theoretical analysis on the upper bound of the cumulative loss for the proposed online hash model is provided. Furthermore we extend our online hashing from a single-model to a multi-model online hashing that trains multiple models so as to retain diverse online hashing models in order to avoid biased updates. The competitive efficiency and effectiveness of the proposed online hash models are verified through extensive experiments on several large-scale datasets as compared to related hashing methods. |
2017 | Unsupervised Triplet Hashing For Fast Image Retrieval | Huang Shanshan, Xiong Yichao, Zhang Ya, Wang Jia | Arxiv | Hashing has played a pivotal role in large-scale image retrieval. With the development of Convolutional Neural Network (CNN) hashing learning has shown great promise. But existing methods are mostly tuned for classification and are not optimized for retrieval tasks, especially instance-level retrieval. In this study we propose a novel hashing method for large-scale image retrieval. Considering the difficulty in obtaining labeled datasets for the image retrieval task at large scale we propose a novel CNN-based unsupervised hashing method namely Unsupervised Triplet Hashing (UTH). The unsupervised hashing network is designed under the following three principles: 1) more discriminative representations for image retrieval; 2) minimum quantization loss between the original real-valued feature descriptors and the learned hash codes; 3) maximum information entropy for the learned hash codes. Extensive experiments on CIFAR-10 MNIST and In-shop datasets have shown that UTH outperforms several state-of-the-art unsupervised hashing methods in terms of retrieval accuracy. |
2017 | Supervised Hashing Based On Energy Minimization | Hu Zihao, Luo Xiyi, Lu Hongtao, Yu Yong | Arxiv | Recently supervised hashing methods have attracted much attention since they can optimize retrieval speed and storage cost while preserving semantic information. Because hash code learning is NP-hard many methods resort to some form of relaxation technique. But the performance of these methods can easily deteriorate due to the relaxation. Luckily many supervised hashing formulations can be viewed as energy functions so solving for hash codes is equivalent to learning marginals in the corresponding conditional random field (CRF). By minimizing the KL divergence between a fully factorized distribution and the Gibbs distribution of this CRF a set of consistency equations can be obtained but updating them in parallel may not yield a local optimum since the variational lower bound is not guaranteed to increase. In this paper we use a linear approximation of the sigmoid function to convert these consistency equations to linear systems which have a closed-form solution. By applying this novel technique to two classical hashing formulations KSH and SPLH we obtain two new methods called EM (energy minimizing based)-KSH and EM-SPLH. Experimental results on three datasets show the superiority of our methods. |
2017 | Enhance Feature Discrimination For Unsupervised Hashing | Hoang Tuan, Do Thanh-toan, Tan Dang-khoa Le, Cheung Ngai-man | Arxiv | We introduce a novel approach to improve unsupervised hashing. Specifically we propose a very efficient embedding method Gaussian Mixture Model embedding (Gemb). The proposed method uses a Gaussian Mixture Model to embed feature vectors into low-dimensional vectors and simultaneously enhances the discriminative property of features before passing them into hashing. Our experiment shows that the proposed method boosts the hashing performance of many state-of-the-art methods e.g. Binary Autoencoder (BA) [1] and Iterative Quantization (ITQ) [2] in standard evaluation metrics for the three main benchmark datasets. |
2017 | Hashing As Tie-aware Learning To Rank | He Kun, Cakir Fatih, Bargal Sarah Adel, Sclaroff Stan | Arxiv | Hashing or learning binary embeddings of data is frequently used in nearest neighbor retrieval. In this paper we develop learning to rank formulations for hashing aimed at directly optimizing ranking-based evaluation metrics such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We first observe that the integer-valued Hamming distance often leads to tied rankings and propose to use tie-aware versions of AP and NDCG to evaluate hashing for retrieval. Then to optimize tie-aware ranking metrics we derive their continuous relaxations and perform gradient-based optimization with deep neural networks. Our results establish the new state-of-the-art for image retrieval by Hamming ranking in common benchmarks. |
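Because integer Hamming distances produce heavily tied rankings, plain AP depends on how ties happen to be broken. The paper derives closed-form tie-aware metrics; an equivalent (if slower) way to see what they measure is a Monte Carlo estimate of the expected AP under random tie-breaking, sketched below (the trial count is an arbitrary choice of ours):

```python
import random

def average_precision(rels):
    """AP of a ranked list of 0/1 relevance flags."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, 1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def tie_aware_ap(dists, labels, trials=200, seed=0):
    """Expected AP over random orderings within each integer-distance tie group."""
    rng = random.Random(seed)
    items = list(zip(dists, labels))
    acc = 0.0
    for _ in range(trials):
        rng.shuffle(items)                          # randomize, then stable-sort:
        ranked = sorted(items, key=lambda t: t[0])  # order within ties stays random
        acc += average_precision([lab for _, lab in ranked])
    return acc / trials
```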
2017 | Deep Discrete Hashing With Self-supervised Pairwise Labels | Song Jingkuan, He Tao, Fan Hangbo, Gao Lianli | Arxiv | Hashing methods have been widely used for applications of large-scale image retrieval and classification. Non-deep hashing methods using handcrafted features have been significantly outperformed by deep hashing methods due to their better feature representation and end-to-end learning framework. However the most striking successes in deep hashing have mostly involved discriminative models which require labels. In this paper we propose a novel unsupervised deep hashing method named Deep Discrete Hashing (DDH) for large-scale image retrieval and classification. In the proposed framework we address two main problems: 1) how to directly learn discrete binary codes and 2) how to equip the binary representation with the ability of accurate image retrieval and classification in an unsupervised way. We resolve these problems by introducing an intermediate variable and a loss function steering the learning process which is based on the neighborhood structure in the original space. Experimental results on standard datasets (CIFAR-10, NUS-WIDE and Oxford-17) demonstrate that our DDH significantly outperforms existing hashing methods by a large margin in terms of mAP for image retrieval and object recognition. Code is available at https://github.com/htconquer/ddh. |
2017 | Sketching Word Vectors Through Hashing | Qasemizadeh Behrang, Kallmeyer Laura | Arxiv | We propose a new fast word embedding technique using hash functions. The method is a derandomization of a new type of random projections. By disregarding the classic constraint used in designing random projections (i.e. preserving pairwise distances in a particular normed space) our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method can achieve competitive results comparable to neural embedding learning techniques however with only a fraction of the computational complexity of these methods. While the proposed derandomization enhances the computational and space complexity of our method the possibility of applying weighting methods such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts a high discriminatory power to the resulting embeddings. Obviously this method comes with other known benefits of random projection-based techniques such as ease of update. |
2017 | A Survey on Learning to Hash | Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, and Heng Tao Shen | TPAMI | Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols, and the general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. |
2017 | Cross-Modal Deep Variational Hashing | Venice Erin Liong, Jiwen Lu, Yap-Peng Tan, and Jie Zhou | ICCV | In this paper, we propose a cross-modal deep variational hashing (CMDVH) method for cross-modality multimedia retrieval. Unlike existing cross-modal hashing methods which learn a single pair of projections to map each example as a binary vector, we design a couple of deep neural networks to learn non-linear transformations from image-text input pairs, so that unified binary codes can be obtained. We then design the modality-specific neural networks in a probabilistic manner where we model a latent variable as close as possible from the inferred binary codes, which is approximated by a posterior distribution regularized by a known prior. Experimental results on three benchmark datasets show the efficacy of the proposed approach. |
2017 | Deep Semantic Hashing with Generative Adversarial Networks | Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei | SIGIR | Hashing has been a widely-adopted technique for nearest neighbor search in large-scale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can suffer from the robustness problem as the data at training and test stage may come from different distributions. This paper studies the exploration of generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data for hashing. Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolution neural networks (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses, i.e., adversarial loss to correct label of synthetic or real for each sample, triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets and classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to state-of-the-art deep hash models. |
2017 | Foresthash Semantic Hashing With Shallow Random Forests And Tiny Convolutional Networks | Qiu Qiang, Lezama Jose, Bronstein Alex, Sapiro Guillermo | Arxiv | Hash codes are efficient data representations for coping with the ever-growing amounts of data. In this paper we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme where random trees in a forest act as hashing functions by setting 1 for the visited tree leaf and 0 for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees rendering the random forests approach to hashing challenging. To address this we propose to first randomly group arriving classes at each tree split node into two groups obtaining a significantly simplified two-class classification problem which can be handled using a light-weight CNN weak learner. Such random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets while performing at the level of other state-of-the-art image classification techniques and utilizing a more compact and efficient scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of light-weight CNNs instead of simply going deeper. |
2017 | Superminhash - A New Minwise Hashing Algorithm For Jaccard Similarity Estimation | Ertl Otmar | Arxiv | This paper presents a new algorithm for calculating hash signatures of sets which can be directly used for Jaccard similarity estimation. The new approach is an improvement over the MinHash algorithm because it has a better runtime behavior and the resulting signatures allow a more precise estimation of the Jaccard index. |
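For reference, the MinHash baseline that SuperMinHash improves on fits in a few lines: draw one universal hash per signature slot, keep the minimum hash value per slot, and estimate the Jaccard index as the fraction of matching slots. This sketch shows only that baseline, not the SuperMinHash signature itself, which replaces the independent hashes with a lower-variance scheme:

```python
import random

def minhash_signature(s, num_perm=128, seed=1):
    """One (a, b) pair per slot for h(x) = (a*x + b) mod p; keep the per-slot minimum."""
    p = (1 << 61) - 1
    rng = random.Random(seed)
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_perm)]
    return [min((a * hash(x) + b) % p for x in s) for a, b in params]

def jaccard_estimate(sig_a, sig_b):
    """Fraction of agreeing slots estimates the Jaccard index."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A, B = set(range(100)), set(range(50, 150))  # true Jaccard index = 50/150
print(jaccard_estimate(minhash_signature(A), minhash_signature(B)))
```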
2017 | An Efficient Deep Learning Hashing Neural Network For Mobile Visual Search | Qi Heng, Liu Wu, Liu Liang | Arxiv | Mobile visual search applications are emerging that enable users to sense their surroundings with smart phones. However because of the particular challenges of mobile visual search achieving a high recognition bitrate has become a consistent target of previous related works. In this paper we propose a few-parameter, low-latency and high-accuracy deep hashing approach for constructing binary hash codes for mobile visual search. First we exploit the architecture of the MobileNet model which significantly decreases the latency of deep feature extraction by reducing the number of model parameters while maintaining accuracy. Second we add a hash-like layer into MobileNet to train the model on labeled mobile visual data. Evaluations show that the proposed system can exceed state-of-the-art accuracy performance in terms of the MAP. More importantly the memory consumption is much less than that of other deep learning models. The proposed method requires only 13 MB of memory for the neural network and achieves a MAP of 97.80% on the mobile location recognition dataset used for testing. |
2017 | Hash Embeddings For Efficient Word Representations | Svenstrup Dan, Hansen Jonas Meinertz, Winther Ole | Arxiv | We present hash embeddings an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by k d-dimensional embeddings vectors and one k dimensional weight vector. The final d dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token these are selected by the hashing trick from a shared pool of B embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding hash embeddings can be considered an extension and improvement over the existing regular embedding types. |
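The construction is concrete enough to sketch: each token selects k component vectors from a shared pool of B vectors via k hash functions plus a k-dimensional importance-weight vector, and its embedding is the weighted sum. The sizes below and the use of Python's hash (and of hashing for the weight lookup, which the paper can also do with a token-id table) are illustrative assumptions:

```python
import numpy as np

B, K_w, k, d = 10_000, 1_000_000, 2, 50        # pool size, weight-table size, hashes, dim
rng = np.random.default_rng(0)
pool = rng.standard_normal((B, d)) * 0.1       # shared pool of component vectors (trainable)
weights = rng.standard_normal((K_w, k)) * 0.1  # per-token importance weights (trainable)

def h(token: str, seed: int, mod: int) -> int:
    return hash((seed, token)) % mod           # stand-in for a proper universal hash

def hash_embedding(token: str) -> np.ndarray:
    comps = np.stack([pool[h(token, i, B)] for i in range(k)])  # k selected pool vectors
    w = weights[h(token, -1, K_w)]                              # the token's k weights
    return w @ comps                                            # weighted sum -> d-dim vector
```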
2017 | Deep Supervised Hashing for Multi-Label and Large-Scale Image Retrieval | Dayan Wu, Zheng Lin, Bo Li, Mingzhen Ye, Weiping Wang | ICMR | One of the most challenging tasks in large-scale multi-label image retrieval is to map images into binary codes while preserving multilevel semantic similarity. Recently, several deep supervised hashing methods have been proposed to learn hash functions that preserve multilevel semantic similarity with deep convolutional neural networks. However, these triplet label based methods try to preserve the ranking order of images according to their similarity degrees to the queries while not putting direct constraints on the distance between the codes of very similar images. Besides, the current evaluation criteria are not able to measure the performance of existing hashing methods on preserving fine-grained multilevel semantic similarity. To tackle these issues, we propose a novel Deep Multilevel Semantic Similarity Preserving Hashing (DMSSPH) method to learn compact similarity-preserving binary codes for the huge body of multi-label image data with deep convolutional neural networks. In our approach, we make the best of the supervised information in the form of pairwise labels to maximize the discriminability of output binary codes. Extensive evaluations conducted on several benchmark datasets demonstrate that the proposed method significantly outperforms the state-of-the-art supervised and unsupervised hashing methods at the accuracies of top returned images, especially for shorter binary codes. Meanwhile, the proposed method shows better performance on preserving fine-grained multilevel semantic similarity according to the results under the Jaccard coefficient based evaluation criteria we propose. |
2017 | Unsupervised Generative Adversarial Cross-modal Hashing | Zhang Jian, Peng Yuxin, Yuan Mingkuan | Arxiv | Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space which can realize fast and flexible retrieval across different modalities. Unsupervised cross-modal hashing is more flexible and applicable than supervised methods since no intensive labeling work is involved. However existing unsupervised methods learn hashing functions by preserving inter and intra correlations while ignoring the underlying manifold structure across different modalities which is extremely helpful to capture meaningful nearest neighbors of different modalities for cross-modal retrieval. To address the above problem in this paper we propose an Unsupervised Generative Adversarial Cross-modal Hashing approach (UGACH) which makes full use of GANs' ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data. The main contributions can be summarized as follows (1) We propose a generative adversarial network to model cross-modal hashing in an unsupervised fashion. In the proposed UGACH given a data of one modality the generative model tries to fit the distribution over the manifold structure and select informative data of another modality to challenge the discriminative model. The discriminative model learns to distinguish the generated data and the true positive data sampled from correlation graph to achieve better retrieval accuracy. These two models are trained in an adversarial way to improve each other and promote hashing function learning. (2) We propose a correlation graph based approach to capture the underlying manifold structure across different modalities so that data of different modalities but within the same manifold can have smaller Hamming distance and promote retrieval accuracy. Extensive experiments compared with 6 state-of-the-art methods verify the effectiveness of our proposed approach. |
2017 | Deep Hashing Network For Unsupervised Domain Adaptation | Venkateswara Hemanth, Eusebio Jose, Chakraborty Shayok, Panchanathan Sethuraman | Arxiv | In recent years deep neural networks have emerged as a dominant machine learning tool for a wide variety of application domains. However training a deep neural network requires a large amount of labeled data which is an expensive process in terms of time, labor and human expertise. Domain adaptation or transfer learning algorithms address this challenge by leveraging labeled data in a different but related source domain to develop a model for the target domain. Further the explosive growth of digital data has posed a fundamental challenge concerning its storage and retrieval. Due to its storage and retrieval efficiency recent years have witnessed a wide application of hashing in a variety of computer vision applications. In this paper we first introduce a new dataset Office-Home to evaluate domain adaptation algorithms. The dataset contains images of a variety of everyday objects from multiple domains. We then propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes to accurately classify unseen target data. To the best of our knowledge this is the first research effort to exploit the feature learning capabilities of deep neural networks to learn representative hash codes to address the domain adaptation problem. Our extensive empirical studies on multiple transfer tasks corroborate the usefulness of the framework in learning efficient hash codes which outperform existing competitive baselines for unsupervised domain adaptation. |
2017 | Supervised Hashing With End-to-end Binary Deep Neural Network | Tan Dang-khoa Le, Do Thanh-toan, Cheung Ngai-man | Arxiv | Image hashing is a popular technique applied to large scale content-based visual retrieval due to its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing which directly learns binary codes from input images and maintains good properties over binary codes such as similarity preservation, independence and balance. Furthermore we also propose a new learning scheme that can cope with the binary constrained loss function. The proposed algorithm not only is scalable for learning over large-scale datasets but also outperforms state-of-the-art supervised hashing methods as illustrated through extensive experiments on various image retrieval benchmarks. |
2017 | Dynamic Space Efficient Hashing | Maier Tobias, Sanders Peter | Arxiv | We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient i.e. their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual memory overcommitting. However these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments. Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which contains several cells. Experiments indicate that DySECT works well with load factors up to 98%, with up to 2.7 times better performance than the next best solution. |
2017 | Compact Hash Code Learning With Binary Deep Neural Network | Do Thanh-toan, Hoang Tuan, Tan Dang-khoa Le, Doan Anh-dzung, Cheung Ngai-man | Arxiv | Learning compact binary codes for image retrieval problem using deep neural networks has recently attracted increasing attention. However training deep hashing networks is challenging due to the binary constraints on the hash codes. In this paper we propose deep network models and learning algorithms for learning binary hash codes given image representations in both unsupervised and supervised settings. The novelty of our network design is that we constrain one hidden layer to directly output the binary codes. This design has overcome a challenging problem in some previous works optimizing non-smooth objective functions because of binarization. In addition we propose to incorporate independence and balance properties in the direct and strict forms into the learning schemes. We also include a similarity preserving property in our objective functions. The resulting optimizations involving these binary independence and balance constraints are difficult to solve. To tackle this difficulty we propose to learn the networks with alternating optimization and careful relaxation. Furthermore by leveraging the powerful capacity of convolutional neural networks we propose an end-to-end architecture that jointly learns to extract visual features and produce binary hash codes. Experimental results for the benchmark datasets show that the proposed methods compare favorably or outperform the state of the art. |
2017 | Simultaneous Feature Aggregating And Hashing For Large-scale Image Search | Do Thanh-toan, Tan Dang-khoa Le, Pham Trung T., Cheung Ngai-man | Arxiv | In most state-of-the-art hashing-based visual search systems local image descriptors of an image are first aggregated as a single feature vector. This feature vector is then subjected to a hashing function that produces a binary hash code. In previous work the aggregating and the hashing processes are designed independently. In this paper we propose a novel framework where feature aggregating and hashing are designed simultaneously and optimized jointly. Specifically our joint optimization produces aggregated representations that can be better reconstructed by some binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition we also propose a fast version of the recently-proposed Binary Autoencoder to be used in our proposed framework. We perform extensive retrieval experiments on several benchmark datasets with both SIFT and convolutional features. Our results suggest that the proposed framework achieves significant improvements over the state of the art. |
2017 | Video Retrieval Based On Deep Convolutional Neural Network | Dong Yj, Li Jg | Arxiv | Recently with the enormous growth of online videos fast video retrieval research has received increasing attention. As an extension of image hashing techniques traditional video hashing methods mainly depend on hand-crafted features and transform the real-valued features into binary hash codes. As videos provide far more diverse and complex visual information than images extracting features from videos is much more challenging than that from images. Therefore high-level semantic features to represent videos are needed rather than low-level hand-crafted methods. In this paper a deep convolutional neural network is proposed to extract high-level semantic features and a binary hash function is then integrated into this framework to achieve an end-to-end optimization. Particularly our approach also combines triplet loss function which preserves the relative similarity and difference of videos and classification loss function as the optimization objective. Experiments have been performed on two public datasets and the results demonstrate the superiority of our proposed method compared with other state-of-the-art video retrieval methods. |
2017 | Lempel-ziv Jaccard Distance An Effective Alternative To Ssdeep And Sdhash | Raff Edward, Nicholas Charles K. | Arxiv | Recent work has proposed the Lempel-Ziv Jaccard Distance (LZJD) as a method to measure the similarity between binary byte sequences for malware classification. We propose and test LZJD's effectiveness as a similarity digest hash for digital forensics. To do so we develop a high performance Java implementation with the same command-line arguments as sdhash making it easy to integrate into existing workflows. Our testing shows that LZJD is effective for this task and significantly outperforms sdhash and ssdeep in its ability to match related file fragments and files corrupted with random noise. In addition LZJD is up to 60x faster than sdhash at comparison time. |
2017 | Hashgan Attention-aware Deep Adversarial Hashing For Cross Modal Retrieval | Zhang Xi, Zhou Siyu, Feng Jiashi, Lai Hanjiang, Li Bo, Pan Yan, Yin Jian, Yan Shuicheng | Arxiv | With the rapid growth of multi-modal data hashing methods for cross-modal retrieval have received considerable attention. Deep-networks-based cross-modal hashing methods are appealing as they can integrate feature learning and hash coding into end-to-end trainable frameworks. However it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To further address this problem we propose an adversarial hashing network with attention mechanism to enhance the measurement of content similarities by selectively focusing on informative parts of multi-modal data. The proposed new adversarial network HashGAN consists of three building blocks 1) the feature learning module to obtain feature representations 2) the generative attention module to generate an attention mask which is used to obtain the attended (foreground) and the unattended (background) feature representations 3) the discriminative hash coding module to learn hash functions that preserve the similarities between different modalities. In our framework the generative module and the discriminative module are trained in an adversarial way: the generator is trained to make the discriminator unable to preserve the similarities of multi-modal data w.r.t. the background feature representations while the discriminator aims to preserve the similarities of multi-modal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other state-of-the-art cross-modal hashing methods. |
2017 | Learning Robust Hash Codes For Multiple Instance Image Retrieval | Conjeti Sailesh, Paschali Magdalini, Katouzian Amin, Navab Nassir | Arxiv | In this paper for the first time we introduce a multiple instance (MI) deep hashing technique for learning discriminative hash codes with weak bag-level supervision suited for large-scale retrieval. We learn such hash codes by aggregating deeply learnt hierarchical representations across bag members through a dedicated MI pool layer. For better trainability and retrieval quality we propose a two-pronged approach that includes robust optimization and training with an auxiliary single instance hashing arm which is down-regulated gradually. We pose retrieval for tumor assessment as an MI problem because tumors often coexist with benign masses and could exhibit complementary signatures when scanned from different anatomical views. Experimental validations on benchmark mammography and histology datasets demonstrate improved retrieval performance over the state-of-the-art methods. |
2017 | Lightweight Fingerprints For Fast Approximate Keyword Matching Using Bitwise Operations | Cisłak Aleksander, Grabowski Szymon | Arxiv | We aim to speed up approximate keyword matching by storing a lightweight fixed-size block of data for each string called a fingerprint. These work in a similar way to hash values; however they can also be used for matching with errors. They store information regarding symbol occurrences using individual bits and they can be compared against each other with a constant number of bitwise operations. In this way certain strings can be deduced to be at a distance greater than k from each other (using Hamming or Levenshtein distance) without performing an explicit verification. We show experimentally that for a preprocessed collection of strings fingerprints can provide substantial speedups for k = 1 namely over 2.5 times for the Hamming distance and over 10 times for the Levenshtein distance. Tests were conducted on synthetic and real-world English and URL data. |
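One simple instance of such a fingerprint, assuming a lowercase a-z alphabet with one occurrence bit per symbol; the paper's fingerprints pack richer occurrence information, but the filtering logic is the same:

```python
def occ_fingerprint(s: str) -> int:
    """One bit per letter a-z marking whether that symbol occurs in s."""
    fp = 0
    for ch in s.lower():
        if 'a' <= ch <= 'z':
            fp |= 1 << (ord(ch) - ord('a'))
    return fp

def maybe_within(fp_a: int, fp_b: int, k: int) -> bool:
    """A single insertion, deletion, or substitution flips at most two
    occurrence bits, so strings within distance k have fingerprints that
    differ in at most 2k bits; a False answer safely skips the expensive
    exact verification."""
    return bin(fp_a ^ fp_b).count('1') <= 2 * k
```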
2017 | Fast Locality-sensitive Hashing Frameworks For Approximate Near Neighbor Search | Christiani Tobias | Arxiv | The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution H over locality-sensitive hash functions that partition space. For a collection of n points after preprocessing the query time is dominated by O(n^ρ log n) evaluations of hash functions from H and O(n^ρ) hash table lookups and distance computations where ρ ∈ (0,1) is determined by the locality-sensitivity properties of H. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to O(log^2 n) leaving the query time to be dominated by O(n^ρ) distance computations and O(n^ρ log n) additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to O(n^ρ). |
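For orientation, a sketch of the baseline Indyk-Motwani construction that the framework refines, using assumed random-hyperplane hashes for cosine similarity; the paper's contribution, reusing few hash functions to cut evaluation cost, is not reproduced here:

```python
import numpy as np

class LSHIndex:
    """Indyk-Motwani baseline: L tables, each keyed by K concatenated
    locality-sensitive hashes (random hyperplanes here)."""
    def __init__(self, dim: int, K: int = 8, L: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((L, K, dim))
        self.tables = [dict() for _ in range(L)]

    def _keys(self, x):
        # K sign bits per table, packed into a hashable tuple key
        return [tuple((p @ x > 0).astype(int)) for p in self.planes]

    def add(self, x, item_id):
        for table, key in zip(self.tables, self._keys(x)):
            table.setdefault(key, []).append(item_id)

    def query(self, x):
        candidates = set()
        for table, key in zip(self.tables, self._keys(x)):
            candidates.update(table.get(key, []))
        return candidates  # verify candidates with exact distances afterwards
```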
2017 | Practical Hash Functions For Similarity Estimation And Dimensionality Reduction | Søren Dahlgaard, Mathias Knudsen, Mikkel Thorup | Neural Information Processing Systems | Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However the performance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used without concern for which concrete hash function is employed. The concrete hash function may work fine on sufficiently random input. The question is if it can be trusted in the real world when faced with more structured input. In this paper we focus on two prominent applications of hashing namely similarity estimation with the one permutation hashing (OPH) scheme of Li et al. NIPS12 and feature hashing (FH) of Weinberger et al. ICML09 both of which have found numerous applications e.g. in approximate near-neighbour search with LSH and large-scale classification with SVM. We consider the recent mixed tabulation hash function of Dahlgaard et al. FOCS15 which was proved theoretically to perform like a truly random hash function in many applications including the above OPH. Here we first show improved concentration bounds for FH with truly random hashing and then argue that mixed tabulation performs similarly when the input vectors are sparse. Our main contribution however is an experimental comparison of different hashing schemes when used inside FH OPH and LSH. We find that mixed tabulation hashing is almost as fast as the classic multiply-mod-prime scheme ax+b mod p. Multiply-mod-prime is guaranteed to work well on sufficiently random data but we demonstrate that in the above applications it can lead to bias and poor concentration on both real-world and synthetic data. We also compare with the very popular MurmurHash3 which has no proven guarantees. Mixed tabulation and MurmurHash3 both perform similarly to truly random hashing in our experiments. However mixed tabulation was 40% faster than MurmurHash3 and it has the proven guarantee of good performance on all possible input making it more reliable. |
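Both classical baselines discussed above are short enough to state directly; below, multiply-mod-prime and simple tabulation, the building block that mixed tabulation extends with an extra round of derived-character lookups (key width and table sizes are illustrative):

```python
import random

P = (1 << 61) - 1  # a Mersenne prime

def multiply_mod_prime(x: int, a: int, b: int, m: int) -> int:
    """The classic 2-universal scheme h(x) = ((a*x + b) mod p) mod m."""
    return ((a * x + b) % P) % m

random.seed(0)
# Simple tabulation on 32-bit keys: one random 256-entry table per byte,
# XORed together.
TABLES = [[random.getrandbits(32) for _ in range(256)] for _ in range(4)]

def simple_tabulation(x: int) -> int:
    h = 0
    for i in range(4):
        h ^= TABLES[i][(x >> (8 * i)) & 0xFF]
    return h
```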
2017 | Stochastic Generative Hashing | Dai Bo, Guo Ruiqi, Kumar Sanjiv, He Niao, Song Le | Arxiv | Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases. However due to the requirement of discrete outputs for the hash functions learning such functions is known to be very challenging. In addition the objective functions adopted by existing hashing techniques are mostly chosen heuristically. In this paper we propose a novel generative approach to learn hash functions through Minimum Description Length principle such that the learned hash codes maximally compress the dataset and can also be used to regenerate the inputs. We also develop an efficient learning algorithm based on the stochastic distributional gradient which avoids the notorious difficulty caused by binary output constraints to jointly optimize the parameters of the hash function and the associated generative model. Extensive experiments on a variety of large-scale datasets show that the proposed method achieves better retrieval results than the existing state-of-the-art methods. |
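A hedged sketch of the encoder-decoder shape described above, using a Bernoulli code layer with a straight-through surrogate gradient as a stand-in for the paper's stochastic distributional gradient:

```python
import torch
import torch.nn as nn

class StochasticHasher(nn.Module):
    """Bernoulli encoder q(h|x) plus a linear decoder that regenerates x
    from the binary code, so codes that compress well also reconstruct well."""
    def __init__(self, dim: int, bits: int):
        super().__init__()
        self.enc = nn.Linear(dim, bits)
        self.dec = nn.Linear(bits, dim)

    def forward(self, x):
        probs = torch.sigmoid(self.enc(x))
        # straight-through surrogate gradient for the discrete sample
        h = probs + (torch.bernoulli(probs) - probs).detach()
        return self.dec(h), h

model = StochasticHasher(dim=784, bits=64)
x = torch.randn(8, 784)
recon, codes = model(x)
loss = ((recon - x) ** 2).mean()
loss.backward()
```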
2017 | Deep Hashing With Category Mask For Fast Video Retrieval | Liu Xu, Zhao Lili, Ding Dajun, Dong Yajiao | Arxiv | This paper proposes an end-to-end deep hashing framework with category mask for fast video retrieval. We train our network in a supervised way by fully exploiting inter-class diversity and intra-class identity. Classification loss is optimized to maximize inter-class diversity while an intra-pair loss is introduced to learn representative intra-class identity. We investigate the binary bits distribution related to categories and find out that the effectiveness of binary bits is highly correlated with data categories and some bits may degrade classification performance of some categories. We then design a hash code generation scheme with category mask to filter out bits with negative contribution. Experimental results demonstrate the proposed method outperforms several state-of-the-art methods under various evaluation metrics on public datasets. |
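The masking step itself is simple once per-bit contributions are estimated; a hypothetical sketch where contrib is an assumed (categories x bits) score matrix learned or measured elsewhere:

```python
import numpy as np

def masked_code(raw_bits: np.ndarray, category: int,
                contrib: np.ndarray) -> np.ndarray:
    """Keep only the hash bits whose estimated contribution for this
    category is positive, filtering out bits that hurt that category."""
    return raw_bits[contrib[category] > 0]
```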
2017 | Discrete Multi-modal Hashing With Canonical Views For Robust Mobile Landmark Search | Zhu Lei, Huang Zi, Liu Xiaobai, He Xiangnan, Song Jingkuan, Zhou Xiaofang | Arxiv | Mobile landmark search (MLS) has recently received increasing attention for its great practical value. However it still remains unsolved due to two important challenges. One is high bandwidth consumption of query transmission and the other is the huge visual variations of query images sent from mobile devices. In this paper we propose a novel hashing scheme named as canonical view based discrete multi-modal hashing (CV-DMH) to handle these problems via a novel three-stage learning procedure. First a submodular function is designed to measure visual representativeness and redundancy of a view set. With it canonical views which capture key visual appearances of a landmark with limited redundancy are efficiently discovered with an iterative mining strategy. Second multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation. It can robustly and adaptively characterize visual contents of varied landmark images with certain canonical views. Finally compact binary codes are learned on the intermediate representation within a tailored discrete binary embedding model which preserves visual relations of images measured with canonical views and removes the involved noises. In this part we develop a new augmented Lagrangian multiplier (ALM) based optimization method to directly solve the discrete binary codes. We can not only explicitly deal with the discrete constraint but also consider the bit-uncorrelated constraint and balance constraint together. Experiments on real world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods. |
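The abstract does not give the submodular function, but the iterative mining step can be illustrated with greedy maximization of a facility-location-style surrogate, assumed here purely for illustration:

```python
import numpy as np

def greedy_canonical_views(sim: np.ndarray, m: int) -> list:
    """Greedily pick m views maximizing F(S) = sum_i max_{j in S} sim[i, j],
    which rewards representativeness and implicitly penalizes redundancy."""
    n = sim.shape[0]
    chosen, cover = [], np.zeros(n)
    for _ in range(m):
        # marginal gain of adding each candidate view j to the current set
        gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        best = int(np.argmax(gains))
        chosen.append(best)
        cover = np.maximum(cover, sim[:, best])
    return chosen
```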
2017 | Improved Consistent Weighted Sampling Revisited | Wu Wei, Li Bin, Chen Ling, Zhang Chengqi, Yu Philip S. | Arxiv | Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch weighted sets and has drawn increasing interest from the community. Due to its constant-time complexity independent of the values of the weights Improved CWS (ICWS) is considered as the state-of-the-art CWS algorithm. In this paper we revisit ICWS and analyze its underlying mechanism to show that there actually exists dependence between the two components of the hash-code produced by ICWS which violates the condition of independence. To remedy the problem we propose an Improved ICWS (I^2CWS) algorithm which not only shares the same theoretical computational complexity as ICWS but also abides by the required conditions of the CWS scheme. The experimental results on a number of synthetic data sets and real-world text data sets demonstrate that our I^2CWS algorithm can estimate the Jaccard similarity more accurately and also compete with or outperform the compared methods including ICWS in classification and top-K retrieval after relieving the underlying dependence. |
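For reference, a Python sketch of the ICWS baseline being revisited, following Ioffe's sampling recipe; the paper's I^2CWS changes how the two hash-code components are derived so they are no longer dependent:

```python
import hashlib
import math
import numpy as np

def _rng(key, sample_seed):
    """Stable per-(element, sample) randomness so draws are consistent
    across different weighted sets."""
    digest = hashlib.md5(f"{key}|{sample_seed}".encode()).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "little"))

def icws_sample(weights: dict, sample_seed: int):
    """One (k, t_k) ICWS sample; the collision probability of two weighted
    sets equals their generalized Jaccard similarity."""
    best, best_a = None, math.inf
    for k, w in weights.items():
        if w <= 0:
            continue
        rng = _rng(k, sample_seed)
        r, c = rng.gamma(2.0, 1.0), rng.gamma(2.0, 1.0)
        beta = rng.uniform(0.0, 1.0)
        t = math.floor(math.log(w) / r + beta)
        a = c / math.exp(r * (t - beta) + r)
        if a < best_a:
            best, best_a = (k, t), a
    return best
```

Repeating icws_sample for seeds 0..m-1 yields an m-component weighted Min-Hash signature whose per-position collision rate estimates the generalized Jaccard similarity.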
2017 | Hash Embeddings For Efficient Word Representations | Dan Tito Svenstrup, Jonas Hansen, Ole Winther | Neural Information Processing Systems | We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by k d-dimensional embedding vectors and one k-dimensional weight vector. The final d-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token these are selected by the hashing trick from a shared pool of B embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding hash embeddings can be considered an extension and improvement over the existing regular embedding types. |
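A compact PyTorch sketch of the layer described above; the pool size B, weight-table size W, and the multiplicative hash are illustrative choices, not the paper's exact settings:

```python
import torch
import torch.nn as nn

class HashEmbedding(nn.Module):
    """Each token id is hashed k ways into a shared pool of B component
    vectors, which are mixed by a learned k-dimensional importance weight."""
    def __init__(self, B: int = 100000, d: int = 64, k: int = 2,
                 W: int = 1000000, seed: int = 0):
        super().__init__()
        self.pool = nn.Embedding(B, d)      # shared component vectors
        self.weights = nn.Embedding(W, k)   # per-token importance weights
        g = torch.Generator().manual_seed(seed)
        self.salts = torch.randint(1, 2**31 - 1, (k,), generator=g)
        self.B, self.W = B, W

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # k hash functions select k component vectors per token
        idx = torch.stack([(token_ids * s) % self.B for s in self.salts], -1)
        comps = self.pool(idx)                       # (batch, k, d)
        w = self.weights(token_ids % self.W)         # (batch, k)
        return (w.unsqueeze(-1) * comps).sum(dim=1)  # (batch, d)
```

Note that no dictionary is needed: any integer token id, however large the vocabulary, is mapped straight into the fixed-size pool.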
2017 | Variational Deep Semantic Hashing for Text Documents | Suthee Chaidaroon, Yi Fang | SIGIR | As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack expressiveness and flexibility in modeling to learn effective representations. The recent advances of deep learning in a wide range of applications has demonstrated its capability to learn robust and powerful feature representations for complex data. Especially, deep generative models naturally combine the expressiveness of probabilistic generative models with the high capacity of deep neural networks, which is very suitable for text modeling. However, little work has leveraged the recent progress in deep learning for text hashing. In this paper, we propose a series of novel deep document generative models for text hashing. The first proposed model is unsupervised while the second one is supervised by utilizing document labels/tags for hashing. The third model further considers document-specific factors that affect the generation of words. The probabilistic generative formulation of the proposed models provides a principled framework for model extension, uncertainty estimation, simulation, and interpretability. Based on variational inference and reparameterization, the proposed models can be interpreted as encoder-decoder deep neural networks and thus they are capable of learning complex nonlinear distributed representations of the original documents. We conduct a comprehensive set of experiments on four public testbeds. The experimental results have demonstrated the effectiveness of the proposed supervised learning models for text hashing. |
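A hedged sketch of the unsupervised variant as an encoder-decoder network: a Gaussian variational encoder over bag-of-words counts, a softmax decoder over words, and codes from thresholding the latent mean (architectural details are assumed, not taken from the paper):

```python
import torch
import torch.nn as nn

class VDSHSketch(nn.Module):
    """Variational document model: reparameterized Gaussian latent code,
    softmax word decoder, ELBO loss; hash by thresholding the latent mean."""
    def __init__(self, vocab: int, bits: int):
        super().__init__()
        self.mu, self.logvar = nn.Linear(vocab, bits), nn.Linear(vocab, bits)
        self.dec = nn.Linear(bits, vocab)

    def forward(self, x):                    # x: (batch, vocab) word counts
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = -(x * torch.log_softmax(self.dec(z), -1)).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon + kl

    def hash(self, x, threshold):
        return (self.mu(x) > threshold).int()  # e.g. per-bit median threshold
```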
2017 | Transfer Adversarial Hashing For Hamming Space Retrieval | Cao Zhangjie, Long Mingsheng, Huang Chao, Wang Jianmin | Arxiv | Hashing is widely applied to large-scale image retrieval due to its storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting which allows the database and the training set to come from different but relevant domains. However the transfer retrieval setting will introduce two technical difficulties: first the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second the domain gap makes it difficult to concentrate the database points to be within a small Hamming ball. As a consequence transfer retrieval performance within Hamming Radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH) a new hybrid deep architecture that incorporates a pairwise t-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state-of-the-art Hamming space retrieval performance on standard datasets. |
2017 | Hashnet Deep Learning To Hash By Continuation | Cao Zhangjie, Long Mingsheng, Wang Jianmin, Yu Philip S. | Arxiv | Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval due to its computation efficiency and retrieval quality. Deep learning to hash which improves retrieval quality by end-to-end representation learning and hash encoding has received increasing attention recently. Subject to the ill-posed gradient difficulty in the optimization with sign activations existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step which suffer from substantial loss of retrieval quality. This work presents HashNet a novel deep architecture for deep learning to hash by continuation method with convergence guarantees which learns exactly binary hash codes from imbalanced similarity data. The key idea is to attack the ill-posed gradient problem in optimizing deep networks with non-smooth binary activations by continuation method in which we begin from learning an easier network with smoothed activation function and let it evolve during the training until it eventually goes back to being the original difficult to optimize deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks. |
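The continuation idea itself is compact: replace the sign activation with tanh(beta z) and grow beta during training so the smoothed network converges to the binary one (the schedule below is assumed for illustration):

```python
import torch

def continuation_activation(z: torch.Tensor, beta: float) -> torch.Tensor:
    """Smoothed binary activation: tanh(beta * z) tends to sign(z) as beta
    grows, so training can start easy and end (near-)binary."""
    return torch.tanh(beta * z)

for epoch in range(30):
    beta = 2.0 ** (epoch // 10)  # illustrative schedule: 1 -> 2 -> 4
    # ...train with codes = continuation_activation(features, beta)...
```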