Unimodal and Cross-Modal Hashing Datasets

Unimodal Datasets: For unimodal experiments (query and database are in the same feature space, e.g. images), there are six popular and freely available image datasets: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M and ImageNet. The datasets vary widely in size (from 22,019 to roughly 1.3 million images), are represented by a range of feature descriptors (GIST, SIFT, RGB pixels and bag of visual words) and cover a diverse range of image topics, from natural scenes to personal photos, logos and drawings.

Cross-modal Datasets: Cross-modal retrieval experiments (query and database can be in different feature spaces, e.g. image and text) are typically conducted on the 'Wiki', Microsoft COCO and NUS-WIDE datasets. All of these datasets come with images paired with textual descriptors, a key requirement for training and evaluating a cross-modal retrieval model.

| Reference | Dataset | Modality | Size | Features |
| --- | --- | --- | --- | --- |
| A. Krizhevsky, 2009. Learning Multiple Layers of Features from Tiny Images | CIFAR-10 | Image | 60,000 | 512-dimensional GIST |
| Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar, 2014. Microsoft COCO: Common Objects in Context | MS-COCO | Image/Text | 87,783 | RGB pixels (image); 5 sentences per image (text) |
| J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei, 2009. ImageNet: A large-scale hierarchical image database | ImageNet | Image | 1,331,167 | 4096-dimensional CNN |
| B. Russell, A. Torralba, K. Murphy, W. T. Freeman, 2007. LabelMe: a database and web-based tool for image annotation | LabelMe | Image | 22,019 | 512-dimensional GIST |
| M. J. Huiskes, M. S. Lew, 2008. The MIR Flickr Retrieval Evaluation | MIR-FLICKR25K | Image/Text | 25,000 | RGB pixels (image); 38 categories, 1386 tags (text) |
| Y. LeCun, C. Cortes, C. Burges, 1999. The MNIST Database of Handwritten Digits | MNIST | Image | 70,000 | Grayscale pixels |
| T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, 2009. NUS-WIDE: a real-world web image database from National University of Singapore | NUS-WIDE | Image/Text | 269,648 | 500-dimensional BoW (image); 5018-dimensional tags (text) |
| H. Jegou, M. Douze, C. Schmid, 2009. Searching with quantization: approximate nearest neighbor search using short codes and distance estimators | SIFT1M | Image | 1,000,000 | SIFT |
| A. Torralba, R. Fergus and W. Freeman, 2008. 80 million tiny images: a large dataset for non-parametric object and scene recognition | TINY100K | Image | 100,000 | 384-dimensional GIST |
| N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy and N. Vasconcelos, 2010. A New Approach to Cross-Modal Multimedia Retrieval | WIKI | Image/Text | 2,669 | 128-dimensional SIFT (image); 10-dimensional LDA topics (text) |
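
To make the unimodal versus cross-modal distinction concrete, the following is a minimal Python sketch, not any particular published method. It assumes synthetic feature vectors with the WIKI dimensionalities listed above (128-dimensional image features, 10-dimensional text features) and uses untrained random hyperplanes as a stand-in for learned hash functions. The names `make_hasher`, `hamming` and `rank` are hypothetical helpers introduced only for this illustration.

```python
import random


def make_hasher(dim, n_bits, seed):
    """Return a function mapping a dim-dimensional vector to an n_bits binary code.

    Assumption: random hyperplanes stand in for a learned hash function.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(x):
        if len(x) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(x)}")
        # One bit per hyperplane: 1 if the projection is non-negative, else 0.
        return tuple(int(sum(w * v for w, v in zip(p, x)) >= 0.0) for p in planes)

    return hash_vector


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Synthetic stand-ins for five paired image/text items (not real data).
rng = random.Random(0)
N_BITS = 16
image_feats = [[rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(5)]
text_feats = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(5)]

# One hash function per modality; both emit codes of the same length,
# so text codes and image codes can be compared in one Hamming space.
hash_image = make_hasher(128, N_BITS, seed=1)
hash_text = make_hasher(10, N_BITS, seed=2)

image_codes = [hash_image(x) for x in image_feats]

# Unimodal: an image query against the image database.
unimodal_ranking = rank(hash_image(image_feats[0]), image_codes)

# Cross-modal: a text query against the same image database.
cross_modal_ranking = rank(hash_text(text_feats[0]), image_codes)
```

Because the hyperplanes here are random rather than learned, the cross-modal ranking carries no semantic meaning. In practice the two hash functions would be trained on the paired image and text descriptors so that matching pairs receive similar codes, which is why the pairing is a key requirement for these datasets.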