Related References
Article
Computer Science, Artificial Intelligence
Li Yuan et al.
Summary: Vision Transformers (ViTs) have lower efficiency and limited feature richness compared to CNNs due to the simple tokenization of images and redundant attention backbone design. To overcome these limitations, a new architecture called VOLO is proposed, which uses outlook attention to dynamically aggregate local features. VOLO can efficiently encode fine-level features and achieve high-performance visual recognition.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Pichao Wang et al.
Summary: Convolutional Neural Networks (CNNs) have been dominant in computer vision for a long time, but recent vision transformer architectures have shown promising performance. This paper proposes a new approach called k-NN attention to enhance vision transformers by selecting the most similar tokens for attention map calculation.
COMPUTER VISION, ECCV 2022, PT XXIV
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Jiaqi Gu et al.
Summary: In this paper, we propose HRViT, a method to enhance the performance of ViTs on semantic segmentation tasks. By integrating high-resolution multi-branch architectures with ViTs and using various optimization techniques, we improve the performance and efficiency of the model. Experimental results demonstrate that HRViT outperforms existing MiT and CSWin backbones on ADE20K and Cityscapes.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Jianyuan Guo et al.
Summary: This paper introduces a novel hybrid network based on transformers and CNNs, called CMTs, which performs well in image recognition tasks and achieves a better trade-off between accuracy and computational efficiency.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Ze Liu et al.
Summary: This paper presents techniques for scaling Swin Transformer up to 3 billion parameters and for training it with high-resolution images. By increasing capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks. Several novel techniques are proposed to address training instability and to transfer models effectively from low resolution to high resolution. Using these techniques and self-supervised pre-training, a strong 3-billion-parameter Swin Transformer model is successfully trained, achieving state-of-the-art accuracy on various benchmarks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhuang Liu et al.
Summary: The development of visual recognition has gone through stages from ConvNets to ViTs and then to hybrid approaches. In this work, the design of a pure ConvNet is reexamined and several key components are discovered, resulting in the construction of the ConvNeXt model series. These models compete with Transformers in terms of accuracy and performance while maintaining the simplicity and efficiency of ConvNets.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Yinpeng Chen et al.
Summary: Mobile-Former is a parallel design of MobileNet and transformer with a two-way bridge, combining the advantages of both models for efficient computation and enhanced representation power across image classification and object detection tasks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Xiaoyi Dong et al.
Summary: CSWin Transformer is an efficient and effective Transformer-based backbone for general-purpose vision tasks. It achieves competitive performance by using the Cross-Shaped Window self-attention mechanism, Locally-enhanced Positional Encoding, and a hierarchical structure.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Article
Computer Science, Artificial Intelligence
Jiaqi Wang et al.
Summary: CARAFE++ is a universal, lightweight, and highly effective operator for feature reassembly in convolutional networks. It aggregates contextual information within a large receptive field, generates adaptive kernels for instance-specific content-aware handling, and introduces little computational overhead. It consistently shows significant improvements in various tasks, making it a strong building block for modern deep networks.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Jingkai Zhou et al.
Summary: The study introduces the Decoupled Dynamic Filter (DDF) to address the two main shortcomings of standard convolution. DDF decomposes the dynamic filter, which limits the number of parameters and the computational cost, and replacing standard convolution with DDF in classification networks improves performance.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Jin Chen et al.
Summary: Dynamic Region-Aware Convolution (DRConv) handles spatial information effectively, improving the representation ability of convolution while maintaining computational cost and translation invariance.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Duo Li et al.
Summary: The study introduces a new operation named involution to replace standard convolution for vision tasks, showing improved performance of models while reducing computational costs.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Jiaqi Wang et al.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019)
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Kaiming He et al.
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
(2017)
Proceedings Paper
Computer Science, Artificial Intelligence
Francois Chollet
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017)
(2017)