
Retrieval-enhanced Visual Prompt Learning For Few-shot Classification

Jintao Rong, Hao Chen, Linlin Ou, Tianxiao Chen, Xinyi Yu, Yifan Liu. arXiv 2023 – 4 citations

Tags: Datasets, Evaluation, Few Shot & Zero Shot

The Contrastive Language-Image Pretraining (CLIP) model has been widely used in various downstream vision tasks, and the few-shot learning paradigm has been widely adopted to augment its capacity for these tasks. However, current paradigms may struggle with fine-grained classification, such as satellite image recognition, due to widening domain gaps. To address this limitation, we propose retrieval-enhanced visual prompt learning (RePrompt), which introduces retrieval mechanisms to cache and reuse the knowledge of downstream tasks. RePrompt constructs a retrieval database from training examples, or from external data if available, and uses a retrieval mechanism to enhance multiple stages of a simple prompt learning baseline, thus narrowing the domain gap. During inference, the enhanced model can draw on retrieved similar samples to make more accurate predictions. A detailed analysis reveals that retrieval helps to improve the distribution of late-stage features, thereby improving generalization for downstream tasks. RePrompt attains state-of-the-art performance on a wide range of vision datasets, including 11 image datasets, 3 video datasets, 1 multi-view dataset, and 4 domain generalization benchmarks.
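To make the retrieval idea concrete, the sketch below shows a cache-based, retrieval-augmented classifier in the style of training-free CLIP adapters: cached few-shot features act as keys, one-hot labels as values, and retrieved evidence is blended into the frozen model's logits at inference time. This is a minimal illustration of the inference-time idea only, not the paper's architecture (RePrompt injects retrieval into multiple stages of prompt learning); the function names and the `k`, `beta`, and `alpha` parameters are illustrative assumptions.

```python
import numpy as np

def build_cache(train_features, train_labels, num_classes):
    """Build a key/value cache from few-shot training examples."""
    # L2-normalize cached features so an inner product equals cosine similarity
    keys = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
    values = np.eye(num_classes)[train_labels]  # one-hot labels as cache values
    return keys, values

def retrieval_enhanced_logits(query_feature, clip_logits, keys, values,
                              k=8, beta=5.0, alpha=1.0):
    """Blend retrieved label evidence into the frozen model's logits."""
    q = query_feature / np.linalg.norm(query_feature)
    sims = keys @ q                       # cosine similarity to every cached example
    topk = np.argsort(sims)[-k:]          # indices of the k nearest neighbours
    # Sharpened affinities weight the retrieved one-hot labels
    affinities = np.exp(-beta * (1.0 - sims[topk]))
    cache_logits = affinities @ values[topk]
    # Fuse retrieval evidence with the base CLIP prediction
    return clip_logits + alpha * cache_logits

# Toy usage with random 512-d features standing in for CLIP embeddings
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 512)).astype(np.float32)   # 16-shot cache
labels = rng.integers(0, 4, size=16)                     # 4 classes
keys, values = build_cache(feats, labels, num_classes=4)
query = rng.normal(size=512).astype(np.float32)
logits = retrieval_enhanced_logits(query, np.zeros(4), keys, values)
```

Here `alpha` controls how strongly retrieval evidence overrides the base prediction and `beta` sharpens the neighbour weighting; both would normally be tuned on a validation split.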

Similar Work