Article

Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume -, Issue -, Pages -

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2022.3189994

Keywords

Feature extraction; Transformers; Laser radar; Data mining; Discrete Fourier transforms; Visualization; Semantics; Fractional Fourier image transformer (FrIT); hyperspectral image (HSI); light detection and ranging (LiDAR); multimodal data

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

National Natural Science Foundation of China [62001023, 61922013]
Beijing Natural Science Foundation [JQ20021]

Summary

In this study, a novel deep learning method called the fractional Fourier image transformer (FrIT) is proposed; it effectively extracts both global and local contexts, addressing the limitations of traditional deep learning methods.

Abstract

With the recent development of the joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, deep learning methods have achieved promising performance owing to their ability to extract local semantic features. Nonetheless, the limited receptive field prevents convolutional neural networks (CNNs) from representing global contextual and sequential attributes, while visual image transformers (VITs) lose local semantic information. To address these issues, we propose a fractional Fourier image transformer (FrIT) as a backbone network that extracts both global and local contexts effectively. In the proposed FrIT framework, HSI and LiDAR data are first fused at the pixel level, and both a multisource feature extractor and an HSI feature extractor are used to capture local contexts. Then, FrIT, a plug-and-play image transformer, is explored for global contextual and sequential feature extraction. Unlike the attention-based representations in the classic VIT, FrIT can massively speed up transformer architectures while learning valuable contextual information effectively and efficiently. More significantly, to reduce redundancy and information loss from shallow to deep layers, FrIT is devised to connect contextual features in multiple fractional domains. Extensive experiments on five HSI and LiDAR scenes, including one newly labeled benchmark, show improvement over both CNNs and VITs.
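The abstract does not give FrIT's equations. As a rough illustration of what replacing self-attention with a fractional Fourier transform can look like, the sketch below assumes an FNet-style mixing step in which the discrete Fourier transform is replaced by the weighted-type (Shih) discrete fractional Fourier transform, F^a = A_0(a) I + A_1(a) F + A_2(a) F^2 + A_3(a) F^3, built from the unitary DFT matrix F. The function names `dfrft_matrix` and `fractional_fourier_mixing`, the choice of this particular discrete fractional transform, and keeping only the real part are assumptions made for illustration; this is not the paper's implementation.

```python
import numpy as np


def dfrft_matrix(n: int, a: float) -> np.ndarray:
    """Weighted-type (Shih) discrete fractional Fourier transform of order `a`.

    Assumption: this DFRFT variant is chosen for illustration only.
    F^a = sum_{l=0}^{3} A_l(a) F^l, where F is the unitary n-point DFT matrix and
    A_l(a) = cos(t/2) * cos(t) * exp(-1.5j * t) with t = pi * (a - l) / 2.
    Since F^4 = I, order 0 gives the identity, order 1 gives the DFT, and
    F^a @ F^b equals F^(a + b).
    """
    f = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix (symmetric)
    result = np.zeros((n, n), dtype=complex)
    power = np.eye(n, dtype=complex)  # running power F^l, starting at F^0
    for l in range(4):
        t = np.pi * (a - l) / 2
        result += np.cos(t / 2) * np.cos(t) * np.exp(-1.5j * t) * power
        power = power @ f
    return result


def fractional_fourier_mixing(x: np.ndarray, a_tokens: float, a_hidden: float) -> np.ndarray:
    """Hypothetical FNet-style mixing step with fractional orders.

    `x` has shape (num_tokens, hidden_dim). The order-`a_tokens` DFRFT mixes
    along the token axis, the order-`a_hidden` DFRFT along the hidden axis,
    and only the real part is kept, as in FNet.
    """
    n_tok, n_hid = x.shape
    mixed = dfrft_matrix(n_tok, a_tokens) @ x @ dfrft_matrix(n_hid, a_hidden).T
    return mixed.real


# Usage: 16 hypothetical patch tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
for order in (0.0, 0.5, 1.0):
    mixed = fractional_fourier_mixing(tokens, order, order)
    print(order, mixed.shape)
```

With order 1 on both axes the step reduces to a unitary 2D DFT (the FNet case, up to normalization), and order 0 leaves the tokens unchanged; intermediate orders give transforms between the spatial and frequency domains. Dense matrices are used here for readability; because F^a is a weighted sum of F^0 through F^3, the same transform can instead be computed with a few FFT calls, which is closer to the kind of speed-up over attention that the abstract describes.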
