4.6 Article

EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

Related References

Note: Only some of the references are listed here; see the full text of the original article for the complete reference list.
Article Chemistry, Analytical

Single-Shot Object Detection via Feature Enhancement and Channel Attention

Yi Li et al.

Summary: This study proposes a feature-enhancement- and channel-attention-guided single-shot detector (FCSSD) to improve object detection performance. Its four modules explore contextual and semantic information, refine multi-scale features, and balance channel weights, yielding strong performance on multi-scale object detection.

SENSORS (2022)
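The FCSSD summary mentions rebalancing channel weights but does not describe the exact module. As an assumed stand-in, the sketch below shows a generic squeeze-and-excitation style channel attention on plain nested lists: global average pooling per channel, two small fully connected layers, a sigmoid, then per-channel rescaling. The weight matrices `w1` and `w2` are hypothetical hand-set values, biases are omitted, and this is not presented as the paper's own implementation.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][col]
Matrix = List[List[float]]           # indexed [out][in]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: Tensor3D, w1: Matrix, w2: Matrix) -> Tensor3D:
    """Generic SE-style channel reweighting (assumed, not FCSSD's exact module).

    w1 maps C -> C/r and w2 maps C/r -> C; biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in channel c by weights[c].
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, weights)]
```

For example, with two 2×2 channels filled with 1.0 and 3.0, `w1 = [[1.0, 1.0]]` and `w2 = [[0.0], [0.5]]`, the first channel is scaled by sigmoid(0) = 0.5 and the second by sigmoid(2.0), so the brighter channel keeps a larger share of its activation.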

Article Chemistry, Analytical

Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers

Yi Liu et al.

Summary: This paper proposes a Transformer-based disparity prediction algorithm to address the limitations of depth estimation algorithms built on convolutional neural networks. The proposed algorithm shows significant advantages across a range of evaluation metrics.

SENSORS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

MetaFormer is Actually What You Need for Vision

Weihao Yu et al.

Summary: Recent research has shown that the attention-based token mixer in Transformers can be replaced with spatial MLPs in computer vision tasks while the resulting models still perform well. The proposed PoolFormer model achieves competitive performance using only a simple spatial pooling operator as its token mixer, supporting the paper's argument that the general MetaFormer architecture, rather than any specific token mixer, is what drives the strong results.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)
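PoolFormer's token mixer can be sketched as average pooling minus the input; the subtraction is there because the surrounding MetaFormer block already adds the input back through a residual connection. The sketch below works on a single-channel H×W grid with a 3×3 window, stride 1, and padded positions excluded from the average (our understanding of the reference code, which uses PyTorch's `AvgPool2d` with `count_include_pad=False`). The function name `pool_token_mixer` is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def pool_token_mixer(x: Grid, pool_size: int = 3) -> Grid:
    """PoolFormer-style token mixing on one channel: avg_pool(x) - x.

    Stride 1 with 'same' padding; padded positions are excluded from the
    average, so border windows average only the real neighbours.
    """
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            # Collect the in-bounds neighbours of (i, j), including itself.
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            # Subtract x[i][j]: the block's residual connection adds it back.
            row.append(sum(window) / len(window) - x[i][j])
        out.append(row)
    return out
```

On a constant grid the mixer outputs all zeros, since every token already equals its local average; it only passes on differences between a token and its neighbourhood, and it has no learnable parameters.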

Proceedings Paper Computer Science, Artificial Intelligence

A ConvNet for the 2020s

Zhuang Liu et al.

Summary: Visual recognition has progressed from ConvNets to ViTs and then to hybrid approaches. This work reexamines the design of a pure ConvNet and identifies several key components, leading to the ConvNeXt model series. These models compete favorably with Transformers in accuracy and scalability while maintaining the simplicity and efficiency of standard ConvNets.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Xiaoyi Dong et al.

Summary: CSWin Transformer is an efficient and effective Transformer-based backbone for general-purpose vision tasks. It achieves competitive performance by using the Cross-Shaped Window self-attention mechanism, Locally-enhanced Positional Encoding, and a hierarchical structure.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)
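In Cross-Shaped Window self-attention, the attention heads are split into two groups: one attends within horizontal stripes and the other within vertical stripes of width `sw`, so together a token sees a cross-shaped region. The sketch below computes only that visible region for one query on an H×W grid; it does not compute attention, and the function name `cross_shaped_region` is hypothetical.

```python
from typing import Set, Tuple


def cross_shaped_region(i: int, j: int, h: int, w: int, sw: int) -> Set[Tuple[int, int]]:
    """Grid positions visible to query (i, j) when one head group uses
    horizontal stripes and the other uses vertical stripes of width sw."""
    r0 = (i // sw) * sw  # first row of the query's horizontal stripe
    c0 = (j // sw) * sw  # first column of the query's vertical stripe
    horizontal = {(r, c) for r in range(r0, min(h, r0 + sw)) for c in range(w)}
    vertical = {(r, c) for r in range(h) for c in range(c0, min(w, c0 + sw))}
    # The union of the two stripes forms the cross-shaped window.
    return horizontal | vertical
```

On an 8×8 grid with `sw = 2`, the query at (3, 5) sees rows 2 to 3 across the full width and columns 4 to 5 across the full height: 16 + 16 positions minus the 4 they share, or 28 in total. This covers far more of the image than a 2×2 local window while costing much less than global attention.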

Article Computer Science, Artificial Intelligence

Semantic Understanding of Scenes Through the ADE20K Dataset

Bolei Zhou et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2019)

Article Computer Science, Hardware & Architecture

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky et al.

COMMUNICATIONS OF THE ACM (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Mask R-CNN

Kaiming He et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Deformable Convolutional Networks

Jifeng Dai et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)