Article

MDvT: introducing mobile three-dimensional convolution to a vision transformer for hyperspectral image classification

INTERNATIONAL JOURNAL OF DIGITAL EARTH (2023)

Journal

INTERNATIONAL JOURNAL OF DIGITAL EARTH

Volume 16, Issue 1, Pages 1469-1490

Publisher

TAYLOR & FRANCIS LTD

DOI: 10.1080/17538947.2023.2202423

Keywords

Hyperspectral image; Classification; Convolutional neural network; Transformer

Categories

Geography, Physical; Remote Sensing

Summary

A new architecture, the mobile 3D convolutional vision transformer (MDvT), is proposed to integrate 3D convolution with transformer models, achieving significant improvements in classification accuracy and model runtime. MDvT reduces the number of model parameters and accelerates computation through an inverted residual structure and square patches.

Abstract

Hyperspectral images contain numerous spectral bands, and this wealth of band data is a valuable source of information for the accurate classification of ground objects. Three-dimensional (3D) convolution, although an excellent method for extracting spectral information, is limited by its large number of parameters and long training time. To better integrate 3D convolution with the transformer models that are currently most popular, a new architecture called the mobile 3D convolutional vision transformer (MDvT) is proposed. MDvT introduces an inverted residual structure to reduce the number of model parameters and to balance the data-mining efficiency of low-dimensional data input. In addition, square patches are used to cut the token sequence and accelerate model operation. Through extensive experiments, we evaluated the overall classification performance of the proposed MDvT on the WHU-Hi and Pavia University datasets and demonstrated significant improvements in classification accuracy and model runtime compared with classical deep learning models. Notably, compared with directly integrating 3D convolution into the transformer model, the MDvT architecture improves accuracy while reducing the training time per epoch by approximately 58.54%. To facilitate reproduction of this work, the model code is available at .
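The abstract does not specify MDvT's block configuration. As a rough illustration of why an inverted residual structure can reduce the parameter count of 3D convolution, the sketch below counts the weights of a single dense 3x3x3 3D convolution against a MobileNetV2-style inverted residual built from 3D convolutions (1x1x1 expansion, depthwise 3x3x3, 1x1x1 projection). The channel width of 64, the expansion ratio of 4, and this layer layout are illustrative assumptions, not values taken from the paper; the helper names are hypothetical.

```python
def conv3d_params(c_in, c_out, k=3, groups=1, bias=True):
    """Weights plus biases of a 3D convolution with a k x k x k kernel."""
    weights = (c_in // groups) * c_out * k ** 3
    return weights + (c_out if bias else 0)


def plain_block_params(channels, k=3):
    """One dense k x k x k 3D convolution mapping channels -> channels."""
    return conv3d_params(channels, channels, k)


def inverted_residual_params(channels, expansion=4, k=3):
    """Hypothetical MobileNetV2-style inverted residual with 3D convolutions:
    1x1x1 expansion -> depthwise k x k x k -> 1x1x1 linear projection.
    The skip connection adds no parameters; BatchNorm/activations are ignored."""
    hidden = channels * expansion
    expand = conv3d_params(channels, hidden, k=1)
    depthwise = conv3d_params(hidden, hidden, k=k, groups=hidden)  # one filter per channel
    project = conv3d_params(hidden, channels, k=1)
    return expand + depthwise + project


if __name__ == "__main__":
    c = 64  # assumed channel width, for illustration only
    plain = plain_block_params(c)
    inverted = inverted_residual_params(c)
    print(f"plain 3D conv:     {plain} parameters")     # 110656
    print(f"inverted residual: {inverted} parameters")  # 40256
    print(f"ratio:             {inverted / plain:.3f}")  # 0.364
```

The dense kernel costs about 27C² weights, while the two pointwise layers of the inverted residual cost about 2tC² for expansion ratio t, so the saving only holds for moderate expansion: it vanishes as t approaches 13.5 for wide layers, and earlier for narrow ones (around t = 11 at C = 64).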
