4.7 Article

F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models

Journal

PATTERN RECOGNITION
Volume 147, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2023.110096

Keywords

Multi-modal; Vision language model; Prompt tuning; Large-scale pre-training model

Ask authors/readers for more resources

This paper proposes a novel automatic prompt generation method called F-SCP, which focuses on generating accurate prompts for low-accuracy classes and similar classes. Experimental results show that our approach outperforms state-of-the-art methods on six multi-domain datasets.
The zero-shot classification performance of large-scale vision-language pre-training models (e.g., CLIP, BLIP and ALIGN) can be enhanced by incorporating a prompt (e.g., a photo of a [CLASS]) before the class words. Modifying the prompt slightly can have significant effect on the classification outcomes of these models. Thus, it is crucial to include an appropriate prompt tailored to the classes. However, manual prompt design is labor-intensive and necessitates domain-specific expertise. The CoOp (Context Optimization) converts hand-crafted prompt templates into learnable word vectors to automatically generate prompts, resulting in substantial improvements for CLIP. However, CoOp exhibited significant variation in classification performance across different classes. Although CoOp-CSC (Class-Specific Context) has a separate prompt for each class, only shows some advantages on fine-grained datasets. In this paper, we propose a novel automatic prompt generation method called F-SCP (Filter-based Specific Class Prompt), which distinguishes itself from the CoOp-UC (Unified Context) model and the CoOp-CSC model. Our approach focuses on prompt generation for low-accuracy classes and similar classes. We add the Filter and SCP modules to the prompt generation architecture. The Filter module selects the poorly classified classes, and then reproduce the prompts through the SCP (Specific Class Prompt) module to replace the prompts of specific classes. Experimental results on six multi-domain datasets shows the superiority of our approach over the state-of-the-art methods. Particularly, the improvement in accuracy for the specific classes mentioned above is significant. For instance, compared with CoOp-UC on the OxfordPets dataset, the low-accuracy classes, such as, Class21 and Class26, are improved by 18% and 12%, respectively.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available