Article

PVT v2: Improved baselines with Pyramid Vision Transformer

Journal

COMPUTATIONAL VISUAL MEDIA
Volume 8, Issue 3, Pages 415-424

Publisher

Springer Nature
DOI: 10.1007/s41095-022-0274-8

Keywords

transformers; dense prediction; image classification; object detection; semantic segmentation

Funding

  1. National Natural Science Foundation of China [61672273, 61832008]
  2. Science Foundation for Distinguished Young Scholars of Jiangsu [BK20160021]
  3. Postdoctoral Innovative Talent Support Program of China [BX20200168, 2020M681608]
  4. General Research Fund of Hong Kong [27208720]

Abstract

This work presents the improved Pyramid Vision Transformer v2 (PVT v2), which adds three designs to PVT v1 and achieves significant improvements on fundamental vision tasks. PVT v2 performs comparably to or better than recent work such as the Swin Transformer.
Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin Transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
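
For intuition, below is a minimal PyTorch sketch of the three designs named in the abstract. It is an illustrative reading of the abstract, not the official code from the repository linked above: the module names (OverlappingPatchEmbed, LinearSRA, ConvFFN), the pooling size, the hidden widths, and the omission of the usual pre-attention normalization are all our own assumptions.

import torch
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    # Overlapping patch embedding: a strided convolution whose kernel is
    # larger than its stride, so neighbouring patches share pixels.
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                         # x: (B, C, H, W)
        x = self.proj(x)                          # (B, D, H/stride, W/stride)
        B, D, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)          # (B, N, D) token sequence
        return self.norm(x), (H, W)

class LinearSRA(nn.Module):
    # Linear-complexity attention: keys and values are average-pooled to a
    # fixed spatial size, so attention cost grows linearly with N.
    def __init__(self, dim, num_heads=1, pool_size=7):
        super().__init__()
        self.num_heads = num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.pool = nn.AdaptiveAvgPool2d(pool_size)   # fixed-size k/v map
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, hw):
        B, N, D = x.shape
        H, W = hw
        d = D // self.num_heads
        q = self.q(x).reshape(B, N, self.num_heads, d).transpose(1, 2)
        # Pool the spatial map before computing keys and values.
        kv_in = self.pool(x.transpose(1, 2).reshape(B, D, H, W))
        kv_in = kv_in.flatten(2).transpose(1, 2)      # (B, pool_size**2, D)
        kv = self.kv(kv_in).reshape(B, -1, 2, self.num_heads, d)
        k, v = kv.permute(2, 0, 3, 1, 4)              # each (B, heads, P, d)
        attn = (q @ k.transpose(-2, -1)) * d ** -0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)

class ConvFFN(nn.Module):
    # Convolutional feed-forward network: a 3x3 depth-wise convolution
    # between the two linear layers injects local positional information.
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, hw):
        B, N, _ = x.shape
        H, W = hw
        x = self.fc1(x)
        x = x.transpose(1, 2).reshape(B, -1, H, W)     # tokens -> feature map
        x = self.dwconv(x).flatten(2).transpose(1, 2)  # feature map -> tokens
        return self.fc2(self.act(x))

# Shape check on a 224x224 image (random weights; residual connections and
# layer norms between sub-layers are omitted here for brevity).
tokens, hw = OverlappingPatchEmbed()(torch.randn(1, 3, 224, 224))
tokens = tokens + LinearSRA(dim=64)(tokens, hw)
tokens = tokens + ConvFFN(dim=64)(tokens, hw)

Pooling the keys and values to a fixed 7x7 map is what makes the attention cost linear in the number of tokens: the query set stays full-resolution, but each query attends to only 49 pooled positions regardless of input size.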
