Journal
INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume -, Issue -, Pages -
Publisher
SPRINGER
DOI: 10.1007/s11263-023-01895-7
Keywords
CLIP; OOD detection; Fine-tuning; Multi-modality; Vision-language models; Prompt learning; Few-shot learning; Adaptor
Categories
This paper investigates the OOD detection performance of fine-tuned CLIP models. By framing OOD detection as multi-modal concept matching, a connection between fine-tuning methods and various OOD scores is established. The results suggest that choosing the appropriate OOD score is crucial for CLIP-based fine-tuning, with prompt learning demonstrating state-of-the-art OOD detection performance.
Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited for downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization where OOD labels are available. Nonetheless, it remains unclear whether the model is reliable against semantic shifts without OOD labels. In this paper, we aim to bridge the gap and present a comprehensive study to understand how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD score is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning demonstrates state-of-the-art OOD detection performance over the zero-shot counterpart.
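To make the multi-modal concept-matching view concrete, the following is a minimal sketch of an MCM-style score: the softmax over temperature-scaled cosine similarities between an image embedding and the class (concept) text embeddings, with the maximum probability taken as the ID-ness score. The toy random vectors stand in for CLIP image/text embeddings; the function name, temperature value, and data are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mcm_score(image_feat, text_feats, temperature=1.0):
    """Maximum-concept-matching-style OOD score (illustrative sketch).

    Computes cosine similarities between one image embedding and each
    concept (text) embedding, applies a temperature-scaled softmax over
    concepts, and returns the maximum probability. Higher values suggest
    the input is in-distribution."""
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = txt @ img / temperature            # scaled cosine similarities
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()                      # softmax over concepts
    return probs.max()

# Toy example: random vectors as stand-ins for CLIP embeddings.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 512))        # 5 ID concept prompts
id_image = text_feats[2] + 0.1 * rng.normal(size=512)  # close to a concept
ood_image = rng.normal(size=512)              # unrelated to all concepts
print(mcm_score(id_image, text_feats, temperature=0.1))
print(mcm_score(ood_image, text_feats, temperature=0.1))
```

In this sketch the ID-like image, being close to one concept embedding, receives a noticeably higher score than the unrelated one; thresholding such a score is the usual way to flag OOD inputs.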