Article

Visual question answering model for fruit tree disease decision-making based on multimodal deep learning

Journal

FRONTIERS IN PLANT SCIENCE
Volume 13, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fpls.2022.1064399

Keywords

disease decision-making; deep learning; multimodal fusion; visual question answer; bilinear model; co-attention mechanism


A visual question answering (VQA) model for fruit tree diseases based on multimodal feature fusion was designed in this study. Given a question about a fruit tree disease image, the model returns a decision-making answer. The proposed model achieved 86.36% decision-making accuracy, outperforming existing multimodal methods, and can be widely deployed in intelligent agriculture.
Visual question answering (VQA) about diseases is an essential feature of intelligent management in smart agriculture. Current deep learning research on fruit tree diseases mainly uses single-source data, such as visible images or spectral data, yielding classification and identification results that cannot be used directly in practical agricultural decision-making. In this study, a VQA model for fruit tree diseases based on multimodal feature fusion was designed. Fusing images with Q&A knowledge of disease management, the model answers questions about fruit tree disease images by locating the relevant disease regions. The main contributions of this study are as follows: (1) a multimodal bilinear factorized pooling model using Tucker decomposition was proposed to fuse image features with question features; (2) a deep modular co-attention architecture was explored to learn image and question attention simultaneously, yielding richer image features and interactions. Experiments showed that the proposed unified model, which combines the bilinear model and co-attentive learning in a new network architecture, achieved 86.36% decision-making accuracy under limited data (8,450 images and 4,560k Q&A pairs), outperforming existing multimodal methods. Data augmentation was applied to the training set to avoid overfitting, and ten runs of 10-fold cross-validation were used to report unbiased performance. The proposed multimodal fusion model achieved friendly interaction and fine-grained identification and decision-making performance, and can therefore be widely deployed in intelligent agriculture.
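Contribution (1), bilinear factorized pooling via Tucker decomposition, can be sketched as below. This is a minimal PyTorch illustration of the general technique, not the authors' code: the class name, feature dimensions, core size, and initialization are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TuckerFusion(nn.Module):
    """Bilinear pooling of image and question features via Tucker decomposition.

    The full bilinear tensor (dim_v x dim_q x dim_out) is factorized into two
    input projection matrices and a small 3-D core tensor, which keeps the
    parameter count tractable. All dimensions are illustrative placeholders.
    """

    def __init__(self, dim_v=2048, dim_q=1024, dim_core=128, dim_out=512):
        super().__init__()
        self.proj_v = nn.Linear(dim_v, dim_core)  # image-side factor matrix
        self.proj_q = nn.Linear(dim_q, dim_core)  # question-side factor matrix
        # Core tensor of the Tucker decomposition.
        self.core = nn.Parameter(torch.randn(dim_core, dim_core, dim_out) * 0.01)

    def forward(self, v, q):
        # Project both modalities into the low-dimensional core space.
        v_c = torch.tanh(self.proj_v(v))  # (batch, dim_core)
        q_c = torch.tanh(self.proj_q(q))  # (batch, dim_core)
        # Bilinear interaction through the core: z_o = sum_ij v_i T[i,j,o] q_j
        return torch.einsum("bi,ijo,bj->bo", v_c, self.core, q_c)


# Example: fuse a CNN image vector with an RNN question vector.
fusion = TuckerFusion()
z = fusion(torch.randn(4, 2048), torch.randn(4, 1024))  # -> shape (4, 512)
```

The point of the factorization is that an unfactorized bilinear interaction over these dimensions would require a 2048 x 1024 x 512 weight tensor; projecting both inputs into a small core space first makes the full bilinear interaction affordable.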
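Contribution (2), the deep modular co-attention, pairs self-attention over question words with question-guided attention over image regions. The sketch below shows one generic block of this kind in PyTorch under the same caveat: the layer layout, dimensions, and head count are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn


class CoAttentionBlock(nn.Module):
    """One modular co-attention block: question self-attention plus
    question-guided attention over image regions (residual + layer norm)."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.q_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_guided = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_v1 = nn.LayerNorm(dim)
        self.norm_v2 = nn.LayerNorm(dim)

    def forward(self, q_tokens, v_regions):
        # Self-attention over question words.
        q = self.norm_q(q_tokens + self.q_self(q_tokens, q_tokens, q_tokens)[0])
        # Self-attention over image regions.
        v = self.norm_v1(v_regions + self.v_self(v_regions, v_regions, v_regions)[0])
        # Question-guided attention: regions query the attended question,
        # so disease-relevant regions are weighted up for the asked question.
        v = self.norm_v2(v + self.v_guided(v, q, q)[0])
        return q, v


# Example: 14 question tokens co-attending with 36 image-region features.
block = CoAttentionBlock()
q_out, v_out = block(torch.randn(4, 14, 512), torch.randn(4, 36, 512))
```

Stacking several such blocks and then fusing the attended outputs (e.g., with the Tucker module above) yields the kind of unified bilinear-plus-co-attention architecture the abstract describes.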

