Article

Multi-view convolutional vision transformer for 3D object recognition

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2023)

Journal

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION

Volume 95, Issue -, Pages -

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jvcir.2023.103906

Keywords

Multi-view; 3D object recognition; Feature fusion; Convolutional neural networks

Categories

Computer Science, Information Systems; Computer Science, Software Engineering

Summary

With the rapid development of 3D vision technology and the increasing application of 3D objects, there is a need for better recognition methods. Existing view-based methods lack sufficient information interaction between different views. Inspired by the vision transformer (ViT), a hybrid network is proposed that combines convolutional neural networks (CNN) and a transformer to improve 3D object recognition performance. Experimental results show that the proposed multi-view convolutional vision transformer (MVCVT) performs competitively with state-of-the-art methods on benchmark datasets.

Abstract

With the rapid development of three-dimensional (3D) vision technology and the increasing application of 3D objects, there is an urgent need for 3D object recognition in the fields of computer vision, virtual reality, and artificially intelligent robots. View-based methods project 3D objects into two-dimensional (2D) images from different viewpoints and apply convolutional neural networks (CNN) to model the projected views. Although these methods have achieved excellent recognition performance, they lack sufficient information interaction between the features of different views. Inspired by the recent success of the vision transformer (ViT) in image recognition, we propose a hybrid network that uses a CNN to extract multi-scale local information from each view and a transformer to capture the relevance of multi-scale information between different views. To verify the effectiveness of our multi-view convolutional vision transformer (MVCVT), we conduct experiments on two public benchmarks, ModelNet40 and ModelNet10, and compare its results with those of state-of-the-art methods. The final results show that MVCVT has competitive performance in 3D object recognition.
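The idea of letting per-view features interact through a transformer can be illustrated with a minimal NumPy sketch. It is not the MVCVT architecture itself: it assumes each view has already been reduced to a single feature vector (random numbers stand in for CNN outputs here), uses one single-head self-attention step with a residual connection, and mean-pools the result into one object descriptor. The names `cross_view_attention` and `fuse_views`, the 12 views, and the 64-dimensional features are all hypothetical choices made for illustration, and the multi-scale aspect described in the abstract is left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(view_feats, w_q, w_k, w_v):
    """Single-head self-attention in which every view is one token.

    view_feats: array of shape (num_views, dim), one feature vector per view.
    Returns the attended features (num_views, dim) and the
    view-to-view attention weights (num_views, num_views).
    """
    q = view_feats @ w_q
    k = view_feats @ w_k
    v = view_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn


def fuse_views(view_feats, w_q, w_k, w_v):
    """Mix information across views, then pool into one descriptor."""
    mixed, attn = cross_view_attention(view_feats, w_q, w_k, w_v)
    fused = view_feats + mixed               # residual connection
    return fused.mean(axis=0), attn          # mean-pool over views


# Hypothetical setup: 12 rendered views, 64-dimensional features per view.
rng = np.random.default_rng(0)
num_views, dim = 12, 64
view_feats = rng.standard_normal((num_views, dim))  # stand-in for CNN features
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

descriptor, attn = fuse_views(view_feats, w_q, w_k, w_v)
print(descriptor.shape, attn.shape)
```

In this sketch, row i of `attn` shows how strongly view i draws on every other view, which is the kind of cross-view interaction that view-wise pooling alone does not provide. In a trained model the projection matrices would be learned and the pooled descriptor would feed a classifier.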
