EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

Article Chemistry, Analytical

Single-Shot Object Detection via Feature Enhancement and Channel Attention

Yi Li et al.

Summary: In this study, a feature-enhancement- and channel-attention-guided single-shot detector (FCSSD) is proposed to improve object detection performance. Its four modules explore contextual and semantic information, refine multi-scale features, and balance channel weights, yielding strong detection performance on objects at multiple scales.

SENSORS (2022)
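The summary does not say how FCSSD computes its channel weights. One widely used way to rebalance channels is squeeze-and-excitation (SE) style attention, which is a swapped-in technique here and is not confirmed to be FCSSD's module. The NumPy sketch below assumes a (C, H, W) feature map and a bottleneck ratio `r`. The function name `se_channel_attention` and the weight shapes are hypothetical, and biases are omitted.

```python
import numpy as np

def se_channel_attention(x, w1, w2):
    """SE-style channel reweighting (assumed design, not FCSSD's code).

    x: feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    s = x.mean(axis=(1, 2))                 # (C,)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    z = np.maximum(w1 @ s, 0.0)             # (C // r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))     # (C,)
    # Rescale every channel by its gate value.
    return x * g[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = se_channel_attention(x, w1, w2)
```

Because each gate lies strictly between 0 and 1, the module can only attenuate channels relative to one another; in a detector it would typically be applied to each feature level before the prediction heads.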


Article Chemistry, Analytical

Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers

Yi Liu et al.

Summary: In this paper, a Transformer-based disparity prediction algorithm is proposed to address the limitations of depth estimation algorithms based on convolutional neural networks. The proposed algorithm shows clear advantages across several evaluation metrics.

SENSORS (2022)


Proceedings Paper Computer Science, Artificial Intelligence

MetaFormer is Actually What You Need for Vision

Weihao Yu et al.

Summary: Recent research has shown that the attention-based token mixer in vision transformers can be replaced with spatial MLPs while still performing well. The proposed PoolFormer model achieves competitive performance using a simple spatial pooling operator as its token mixer, which suggests that the general MetaFormer architecture, rather than the specific token mixer, is key to achieving superior results.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)
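PoolFormer's token mixer is average pooling with stride 1 and "same" padding, minus the input itself; the subtraction compensates for the residual connection that the MetaFormer block adds around the mixer. The NumPy sketch below assumes a (C, H, W) layout and a 3x3 window, and averages only over in-bounds positions at the borders. The function name `pool_token_mixer` is hypothetical.

```python
import numpy as np

def pool_token_mixer(x, pool_size=3):
    """Return avg_pool(x) - x for x of shape (C, H, W).

    Stride 1 and 'same' padding; border windows average only the
    positions that fall inside the feature map.
    """
    C, H, W = x.shape
    p = pool_size // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))     # zero padding
    valid = np.pad(np.ones((H, W)), p)                # 1 where a real token exists
    total = np.zeros_like(x, dtype=float)
    count = np.zeros((H, W))
    # Slide the window by summing shifted copies of the padded map.
    for di in range(pool_size):
        for dj in range(pool_size):
            total += padded[:, di:di + H, dj:dj + W]
            count += valid[di:di + H, dj:dj + W]
    return total / count - x

# Inside a MetaFormer block the mixer is wrapped as x + mixer(norm(x)).
x = np.arange(9, dtype=float).reshape(1, 3, 3)
mixed = pool_token_mixer(x)
```

The operator has no learnable parameters, which is the point the paper makes: even a parameter-free mixer performs well inside the MetaFormer structure.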


Proceedings Paper Computer Science, Artificial Intelligence

A ConvNet for the 2020s

Zhuang Liu et al.

Summary: The development of visual recognition has gone through stages from ConvNets to ViTs and then to hybrid approaches. In this work, the design of a pure ConvNet is reexamined and several key components are identified, leading to the ConvNeXt model family. These models compete with Transformers in accuracy and scalability while retaining the simplicity and efficiency of standard ConvNets.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)
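The ConvNeXt block applies a 7x7 depthwise convolution, LayerNorm over channels, a pointwise expansion to 4x the channels, GELU, a pointwise projection back, and a per-channel layer scale before the residual add. The NumPy sketch below follows that order but omits biases, the LayerNorm affine parameters, and stochastic depth, and uses the tanh approximation of GELU. All function names are hypothetical.

```python
import numpy as np

def depthwise_conv2d(x, k):
    """One K x K filter per channel. x: (C, H, W), k: (C, K, K); stride 1, zero 'same' padding."""
    C, H, W = x.shape
    K = k.shape[-1]
    p = K // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=float)
    for i in range(K):
        for j in range(K):
            out += k[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def layer_norm_channels(x, eps=1e-6):
    """Normalize each spatial position across its channels (no affine parameters)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def convnext_block(x, dw_kernel, w_up, w_down, gamma):
    """x: (C, H, W); dw_kernel: (C, 7, 7); w_up: (4C, C); w_down: (C, 4C); gamma: (C,)."""
    y = depthwise_conv2d(x, dw_kernel)            # spatial mixing, per channel
    y = layer_norm_channels(y)
    y = np.einsum("oc,chw->ohw", w_up, y)         # 1x1 conv: C -> 4C
    y = gelu(y)
    y = np.einsum("oc,chw->ohw", w_down, y)       # 1x1 conv: 4C -> C
    return x + gamma[:, None, None] * y           # layer scale + residual

rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, 8, 8))
out = convnext_block(
    x,
    rng.standard_normal((C, 7, 7)) * 0.1,
    rng.standard_normal((4 * C, C)) * 0.1,
    rng.standard_normal((C, 4 * C)) * 0.1,
    np.full(C, 1e-6),
)
```

The inverted bottleneck (narrow, wide, narrow) and the large depthwise kernel are the choices that make this block resemble a Transformer MLP block with a convolutional token mixer.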


Proceedings Paper Computer Science, Artificial Intelligence

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Xiaoyi Dong et al.

Summary: CSWin Transformer is an efficient and effective Transformer-based backbone for general-purpose vision tasks. It achieves competitive performance by using the Cross-Shaped Window self-attention mechanism, Locally-enhanced Positional Encoding, and a hierarchical structure.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)
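In Cross-Shaped Window self-attention, the attention heads are split into two groups: one group attends within horizontal stripes of width `sw`, the other within vertical stripes, and the outputs are concatenated, so each token sees a cross-shaped region. The NumPy sketch below is a simplified illustration under stated assumptions: one head per group, queries, keys, and values equal to the input (no learned projections), and no Locally-enhanced Positional Encoding. The function names are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def stripe_attention(x, sw):
    """Self-attention restricted to horizontal stripes of sw rows.

    x: (H, W, C) with H divisible by sw. Queries, keys and values are x
    itself (no projections, for brevity).
    """
    H, W, C = x.shape
    t = x.reshape(H // sw, sw * W, C)                # one token group per stripe
    scores = t @ t.transpose(0, 2, 1) / np.sqrt(C)
    return (softmax(scores) @ t).reshape(H, W, C)

def cross_shaped_attention(x, sw):
    """First half of the channels uses horizontal stripes, second half vertical."""
    C = x.shape[-1]
    xh, xv = x[..., : C // 2], x[..., C // 2:]
    yh = stripe_attention(xh, sw)
    # Vertical stripes: swap H and W, reuse the horizontal routine, swap back.
    yv = stripe_attention(xv.transpose(1, 0, 2), sw).transpose(1, 0, 2)
    return np.concatenate([yh, yv], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
x2 = x.copy()
x2[0, 0, :] += 5.0                                   # perturb one token
y1 = cross_shaped_attention(x, sw=2)
y2 = cross_shaped_attention(x2, sw=2)
```

Perturbing token (0, 0) changes only outputs in its horizontal stripe (first channel half) and its vertical stripe (second channel half), which is the cross-shaped receptive field; stacking layers then propagates information across the whole image.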


Article Computer Science, Artificial Intelligence