Proceedings Paper

The Current State of the Art in Deep Learning for Image Classification: A Review

INTELLIGENT COMPUTING, VOL 2 (2022)

VOLO: Vision Outlooker for Visual Recognition

Li Yuan et al.

Summary: Vision Transformers (ViTs) have lower efficiency and limited feature richness compared to CNNs due to the simple tokenization of images and redundant attention backbone design. To overcome these limitations, a new architecture called VOLO is proposed, which uses outlook attention to dynamically aggregate local features. VOLO can efficiently encode fine-level features and achieve high-performance visual recognition.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Scaling Vision Transformers

Xiaohua Zhai et al.

Summary: Attention-based neural networks, including the Vision Transformer (ViT), have achieved state-of-the-art results in computer vision benchmarks. This study investigates the scaling properties of Vision Transformers and improves their architecture and training methods to enhance model accuracy. The experimental results demonstrate that a scaled ViT model with two billion parameters achieves a new state-of-the-art top-1 accuracy of 90.45% on ImageNet and performs well in few-shot transfer learning.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Xiaoyi Dong et al.

Summary: CSWin Transformer is an efficient and effective Transformer-based backbone for general-purpose vision tasks. It achieves competitive performance by using the Cross-Shaped Window self-attention mechanism, Locally-enhanced Positional Encoding, and a hierarchical structure.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Network of Experts for Large-Scale Image Categorization

Karim Ahmed et al.

COMPUTER VISION - ECCV 2016, PT VII (2016)
