4.7 Article

How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Related References

Note: only a partial list of references is shown here; refer to the original article for the complete reference information.
Article Computer Science, Artificial Intelligence

Learning to Prompt for Vision-Language Models

Kaiyang Zhou et al.

Summary: Large pre-trained vision-language models such as CLIP learn transferable representations, but adapting them to downstream tasks depends heavily on hand-crafted prompt engineering. This study introduces Context Optimization (CoOp), which models a prompt's context words as learnable vectors while keeping the pre-trained encoders frozen; with only one or two shots it surpasses hand-crafted prompts, and with more shots it yields significant further gains (see the sketch after this entry).

INTERNATIONAL JOURNAL OF COMPUTER VISION (2022)
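To make the idea of learned context vectors concrete, here is a minimal PyTorch sketch of a CoOp-style prompt learner. The class name `CoOpPromptLearner`, its constructor parameters, and the initialization scale are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoOpPromptLearner(nn.Module):
    """Minimal sketch of CoOp-style context optimization (hypothetical simplification).

    A shared sequence of learnable context vectors replaces the hand-crafted
    prompt ("a photo of a ..."); only these vectors are trained, while the
    CLIP text and image encoders stay frozen.
    """

    def __init__(self, n_ctx: int, ctx_dim: int, class_token_embeddings: torch.Tensor):
        super().__init__()
        # Learnable context vectors, shared across all classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen token embeddings of the class names, shape (n_classes, n_cls_tokens, ctx_dim).
        self.register_buffer("cls_emb", class_token_embeddings)

    def forward(self) -> torch.Tensor:
        n_classes = self.cls_emb.shape[0]
        # Prepend the same context vectors to every class-name embedding:
        # result shape (n_classes, n_ctx + n_cls_tokens, ctx_dim).
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)

# Training (schematic): feed the assembled prompts through the frozen CLIP text
# encoder, compute image-text cosine logits, and back-propagate the few-shot
# cross-entropy loss only into `self.ctx`.
```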

Proceedings Paper Computer Science, Artificial Intelligence

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification

Renrui Zhang et al.

Summary: Contrastive Vision-Language Pre-training (CLIP) learns visual representations from large-scale image-text pairs and performs impressively on downstream tasks. To improve CLIP's adaptability, the authors propose Tip-Adapter, a training-free adaptation method that builds an adapter from a key-value cache of the few-shot training features and refines CLIP's prior knowledge through feature retrieval (see the sketch after this entry).

COMPUTER VISION - ECCV 2022, PT XXXV (2022)
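The following minimal sketch illustrates the training-free cache idea behind Tip-Adapter: few-shot features act as keys, their one-hot labels as values, and retrieved cache logits are blended with CLIP's zero-shot logits. The function name `tip_adapter_logits` and the default values of `alpha` and `beta` are assumptions made for illustration.

```python
import torch

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_weights,
                       alpha=1.0, beta=5.5):
    """Minimal sketch of a Tip-Adapter-style training-free cache model.

    test_feat:    (N, D) L2-normalized CLIP image features of test images
    cache_keys:   (K, D) L2-normalized features of the few-shot training images
    cache_values: (K, C) one-hot labels of the few-shot training images
    clip_weights: (D, C) L2-normalized text embeddings of the class prompts
    """
    # Zero-shot CLIP logits from image-text similarity.
    clip_logits = 100.0 * test_feat @ clip_weights
    # Affinity between test features and the cached few-shot keys.
    affinity = test_feat @ cache_keys.t()
    # Sharpened, exponentiated affinities retrieve the cached label knowledge.
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    # Blend the retrieved few-shot knowledge with the zero-shot prediction.
    return clip_logits + alpha * cache_logits
```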

Proceedings Paper Computer Science, Artificial Intelligence

Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou et al.

Summary: This study investigates how to adapt pre-trained vision-language models to downstream datasets. Context Optimization (CoOp) introduced prompt learning for this purpose but tends to overfit to the classes seen during training. To address this, Conditional Context Optimization (CoCoOp) generates dynamic, instance-conditioned prompts with a lightweight neural network and achieves better generalization to unseen classes (see the sketch after this entry).

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)
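A minimal sketch of how an instance-conditioned prompt could be built: a small meta-network maps each image feature to a bias that shifts the shared context vectors. The class name `CoCoOpPromptLearner`, the bottleneck ratio of the meta-network, and the tensor layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoCoOpPromptLearner(nn.Module):
    """Minimal sketch of CoCoOp-style conditional prompts (hypothetical simplification)."""

    def __init__(self, n_ctx: int, ctx_dim: int, img_dim: int,
                 class_token_embeddings: torch.Tensor):
        super().__init__()
        # Shared learnable context vectors, as in CoOp.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen class-name token embeddings, shape (n_classes, n_cls_tokens, ctx_dim).
        self.register_buffer("cls_emb", class_token_embeddings)
        # Lightweight bottleneck MLP that conditions the prompt on the image.
        self.meta_net = nn.Sequential(
            nn.Linear(img_dim, img_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(img_dim // 16, ctx_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (B, img_dim) -> per-image shift of shape (B, 1, ctx_dim).
        bias = self.meta_net(image_features).unsqueeze(1)
        # Instance-conditioned context vectors: (B, n_ctx, ctx_dim).
        ctx = self.ctx.unsqueeze(0) + bias
        n_classes = self.cls_emb.shape[0]
        # Per-image, per-class prompts: (B, n_classes, n_ctx + n_cls_tokens, ctx_dim).
        ctx = ctx.unsqueeze(1).expand(-1, n_classes, -1, -1)
        cls = self.cls_emb.unsqueeze(0).expand(ctx.shape[0], -1, -1, -1)
        return torch.cat([ctx, cls], dim=2)
```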

Proceedings Paper Computer Science, Artificial Intelligence

ViM: Out-Of-Distribution with Virtual-logit Matching

Haoqi Wang et al.

Summary: Most existing OOD detection algorithms rely on a single input source (features, logits, or softmax probabilities), which makes them fragile given the diversity of OOD examples. This paper proposes a novel OOD scoring method, Virtual-logit Matching (ViM), that combines a class-agnostic score from the feature space with the class-dependent logits of the in-distribution classes. In addition, a new OOD benchmark dataset for ImageNet-1K is released to support large-scale OOD evaluation in academia (see the sketch after this entry).

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)
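A minimal NumPy sketch of a virtual-logit-matching-style score: the norm of the feature component outside the principal subspace of the training data is rescaled into a "virtual logit", appended to the real logits, and softmax-normalized. The function name `vim_score`, the `residual_basis` argument, and the way `alpha` is supplied are assumptions for illustration, and the feature-centering step of the original method is omitted.

```python
import numpy as np

def vim_score(features, logits, residual_basis, alpha):
    """Minimal sketch of a ViM-style OOD score (hypothetical simplification).

    features:       (N, D) penultimate-layer features of the test inputs
    logits:         (N, C) class logits for the same inputs
    residual_basis: (D, D-k) orthonormal basis of the residual space, i.e. the
                    complement of the principal subspace fitted on ID features
    alpha:          scaling constant matching the virtual logit to real logits
    """
    # Class-agnostic part: norm of the feature residual outside the principal subspace.
    residual = np.linalg.norm(features @ residual_basis, axis=-1)
    virtual_logit = alpha * residual
    # Class-dependent part: append the virtual logit and softmax-normalize.
    full = np.concatenate([logits, virtual_logit[:, None]], axis=1)
    full -= full.max(axis=1, keepdims=True)
    probs = np.exp(full) / np.exp(full).sum(axis=1, keepdims=True)
    # Higher probability mass on the virtual class indicates a likelier OOD input.
    return probs[:, -1]
```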

Article Computer Science, Artificial Intelligence

Places: A 10 Million Image Database for Scene Recognition

Bolei Zhou et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Describing Textures in the Wild

Mircea Cimpoi et al.

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2014)