Unimodal experiments, in which the query and the database share the same feature space (e.g., images), are commonly conducted on six freely available image datasets: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M, and ImageNet. The datasets range in size from 22,019 to 1.3 million images and are represented by a variety of feature descriptors, including GIST, SIFT, raw RGB pixels, and bag-of-visual-words. Their content spans natural scenes, personal photos, logos, and drawings.
Cross-modal retrieval experiments, in which the query and the database lie in different feature spaces (e.g., image and text), are typically performed on the Wiki, Microsoft COCO, and NUS-WIDE datasets. Each dataset pairs images with textual descriptions, which are needed to train and evaluate cross-modal retrieval models.
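The difference between the two settings can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not a method from any of the publications listed here: random sign projections stand in for a learned hash function, a least-squares fit on paired items stands in for cross-modal training, and all names (`binarize`, `hamming_rank`, `topics`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_train, n_bits = 1000, 800, 32

def binarize(features, projection):
    """Binary codes from the sign of a linear projection, shape (n, n_bits)."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Database indices ordered by increasing Hamming distance to the query."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable")

# Synthetic paired data (assumption): each item has an image vector and a text
# vector generated from the same hidden 16-D "topic" vector, plus small noise.
topics = rng.standard_normal((n_items, 16))
image_feats = (topics @ rng.standard_normal((16, 128))
               + 0.1 * rng.standard_normal((n_items, 128)))
text_feats = (topics @ rng.standard_normal((16, 64))
              + 0.1 * rng.standard_normal((n_items, 64)))

# Unimodal: query and database share a feature space, so one hash function
# encodes both sides.
w_image = rng.standard_normal((128, n_bits))
image_codes = binarize(image_feats, w_image)
image_query = image_feats[42] + 0.01 * rng.standard_normal(128)  # near-duplicate of item 42
unimodal_ranking = hamming_rank(binarize(image_query[None, :], w_image)[0], image_codes)

# Cross-modal: text lives in a different space, so a second projection is
# fitted on paired training items to reproduce the image codes (as +/-1 targets).
targets = 2.0 * image_codes[:n_train] - 1.0
w_text, *_ = np.linalg.lstsq(text_feats[:n_train], targets, rcond=None)

# Query with held-out text, search the image database, and record the
# position of the true paired image in each ranking (0 = best).
text_codes = binarize(text_feats[n_train:], w_text)
ranks = [int(np.where(hamming_rank(code, image_codes) == n_train + i)[0][0])
         for i, code in enumerate(text_codes)]
mean_rank = float(np.mean(ranks))

print("unimodal top hit:", unimodal_ranking[0])
print("cross-modal mean rank of true match:", mean_rank)
```

The cross-modal branch only works because paired image and text vectors are available for fitting `w_text`, which is why the datasets used in that setting must supply aligned images and descriptions.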
The key publications for these and related datasets, with the modality each covers, are listed below:
Paper | Modality |
---|---|
Herve Jegou, Laurent Amsaleg, 2009. Datasets for approximate nearest neighbor search | Image |
A. Krizhevsky, 2009. Learning Multiple Layers of Features from Tiny Images | Image |
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar, 2014. Microsoft COCO: Common Objects in Context | Image/Text |
Facebook/Meta, 2021. Facebook SimSearchNet++ | Image |
J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei, 2009. ImageNet: A large-scale hierarchical image database | Image |
B. Russell, A. Torralba, K. Murphy, W. T. Freeman, 2007. LabelMe: a database and web-based tool for image annotation | Image |
Microsoft, 2021. Microsoft SPACEV-1B | Image |
Microsoft, 2021. Microsoft Turing-ANNS-1B | Image |
M. J. Huiskes, M. S. Lew, 2008. The MIR Flickr Retrieval Evaluation | Image/Text |
Y. LeCun, C. Cortes, C. Burges, 1999. The MNIST Database of Handwritten Digits | Image |
T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, 2009. NUS-WIDE: a real-world web image database from National University of Singapore | Image/Text |
H. Jegou, M. Douze, C. Schmid, 2009. Searching with quantization: approximate nearest neighbor search using short codes and distance estimators | Image |
N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy, N. Vasconcelos, 2010. A New Approach to Cross-Modal Multimedia Retrieval | Image/Text |
Yandex, 2021. Yandex DEEP-1B | Image |
Yandex, 2021. Yandex Text-to-Image-1B | Image/Text |