Unimodal and Cross-Modal Hashing Datasets
Unimodal Datasets:
Unimodal experiments (query and database are in the same feature space, e.g. images) can be conducted on six popular, freely available image datasets: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M and ImageNet. The datasets vary widely in size (from 22,019 to 1.3 million images), are represented by a range of feature descriptors (GIST, SIFT, RGB pixels and bag of visual words) and cover a diverse range of image topics, from natural scenes to personal photos, logos and drawings.
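As a rough illustration of the unimodal setting, the sketch below hashes query and database vectors from the same feature space with a single shared hash function and ranks the database by Hamming distance. It assumes random-hyperplane hashing purely for illustration (it is not a method prescribed here), uses synthetic Gaussian vectors as stand-ins for 512-dimensional GIST descriptors, and all function names are hypothetical.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed hash function)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_vector(vec, hyperplanes):
    """Pack one sign bit per hyperplane into a Python int."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def search(query_code, db_codes, k=3):
    """Return the indices of the k database codes closest to the query."""
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: hamming(query_code, db_codes[i]))
    return ranked[:k]

# Synthetic stand-in for a small image database of 512-d descriptors.
rng = random.Random(1)
dim = 512
database = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]

# Query and database share one feature space, so one hash function serves both.
planes = make_hyperplanes(32, dim)
db_codes = [hash_vector(v, planes) for v in database]

# The query is a slightly perturbed copy of item 17, so item 17 should
# appear at or near the top of the ranking.
query = [x + rng.gauss(0.0, 0.01) for x in database[17]]
top = search(hash_vector(query, planes), db_codes)
print(top)
```

With a real dataset, the synthetic `database` list would be replaced by the descriptors shipped with that dataset, and the random hyperplanes by whichever hashing model is under evaluation.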
Cross-modal Datasets:
Cross-modal retrieval experiments (query and database can be in different feature spaces, e.g. image and text) are typically conducted on the Wiki, Microsoft COCO and NUS-WIDE datasets. All three come with images and paired textual descriptors, a key requirement for training and evaluating a cross-modal retrieval model.
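To make the role of the image-text pairing concrete, the sketch below scores a text-to-image retrieval run with mean average precision, treating a database image as relevant when it shares the query's class label. This is only one common way to evaluate, assumed here for illustration. The binary codes are hand-written toy values standing in for the output of some already trained pair of hash functions, and the `PairedItem` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PairedItem:
    image_code: int  # code from a (hypothetical) image hash function
    text_code: int   # code from a (hypothetical) text hash function
    label: int       # class label shared by the image and its paired text

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def average_precision(query_code, query_label, db_codes, db_labels):
    """Average precision of a Hamming ranking under label-based relevance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if db_labels[i] == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy 4-bit codes; each item is one image with its paired text.
items = [
    PairedItem(0b0000, 0b0001, label=0),
    PairedItem(0b0011, 0b0010, label=0),
    PairedItem(0b1111, 0b1110, label=1),
    PairedItem(0b1100, 0b1101, label=1),
]

# Text-to-image retrieval: text codes query a database of image codes.
# For brevity the queries are not held out from the database here;
# a real protocol would use separate query and database splits.
db_codes = [item.image_code for item in items]
db_labels = [item.label for item in items]
aps = [average_precision(item.text_code, item.label, db_codes, db_labels)
       for item in items]
mean_ap = sum(aps) / len(aps)
print(mean_ap)
```

Swapping the roles of `image_code` and `text_code` gives the image-to-text direction, which is usually reported alongside text-to-image.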
Reference | Dataset | Modality | Size | Features |
Herve Jegou, Laurent Amsaleg, 2009. Datasets for approximate nearest neighbor search | BIGANN | Image | 1 Billion | 128 dimensional SIFT |
A. Krizhevsky, 2009. Learning Multiple Layers of Features from Tiny Images | CIFAR10 | Image | 60000 | 512 dimensional GIST |
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar, 2014. Microsoft COCO: Common Objects in Context | MS-COCO | Image/Text | 87783 | RGB pixels (image) - 5 sentences per image (text) |
Facebook/Meta, 2021. Facebook SimSearchNet++ | Facebook SimSearchNet++ | Image | 1 Billion | 256 dimensional CNN |
J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei, 2009. ImageNet: A large-scale hierarchical image database | ImageNet | Image | 1331167 | 4096 dimensional CNN |
B. Russell, A. Torralba, K. Murphy, W. T. Freeman, 2007. LabelMe: a database and web-based tool for image annotation | LabelMe | Image | 22019 | 512 dimensional GIST |
Microsoft, 2021. Microsoft SPACEV-1B | Microsoft SPACEV | Image | 1 Billion | 100 dimensional deep learning |
Herve Jegou, 2021. Microsoft Turing-ANNS-1B | Microsoft Turing-ANNS | Image | 1 Billion | 100 dimensional Transformer |
M. J. Huiskes, M. S. Lew, 2008. The MIR Flickr Retrieval Evaluation | MIR-FLICKR25K | Image/Text | 25000 | RGB pixels (image) - 38 categories 1386 tags (text) |
Y. LeCun, C. Cortes, C. Burges, 1999. The MNIST Database of Handwritten Digits | MNIST | Image | 70000 | Grayscale Pixels |
T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, 2009. NUS-WIDE: a real-world web image database from National University of Singapore | NUSWIDE | Image/Text | 269648 | 500 dimensional BoW (image) - 5018 dimensional tags (text) |
H. Jegou, M. Douze, C. Schmid, 2009. Searching with quantization: approximate nearest neighbor search using short codes and distance estimators | SIFT1M | Image | 1000000 | SIFT |
A. Torralba, R. Fergus and W. Freeman, 2008. 80 million tiny images: a large dataset for non-parametric object and scene recognition | TINY100K | Image | 100000 | 384 dimensional GIST |
N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy and N. Vasconcelos, 2010. A New Approach to Cross-Modal Multimedia Retrieval | WIKI | Image/Text | 2669 | 128 dimensional SIFT (image) - 10 dimensional LDA topics (text) |
Yandex, 2021. Yandex DEEP-1B | Yandex DEEP-1B | Image | 1 Billion | 96 dimensional GoogLeNet |
Yandex, 2021. Yandex Text-to-Image-1B | Yandex Text-to-Image-1B | Image/Text | 1 Billion | 200 dimensional Se-ResNext-101 |