Article

A Vegetable Leaf Disease Identification Model Based on Image-Text Cross-Modal Feature Fusion

Journal

FRONTIERS IN PLANT SCIENCE
Volume 13

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fpls.2022.918940

Keywords

cross-modal fusion; transformer; few-shot; complex background; disease identification

Funding

  1. National Key Research and Development Program of China [2019YFD1101105]
  2. National Natural Science Foundation of China [62106065]
  3. Hebei Province Key Research and Development Program [20327402D]
  4. National Technical System of Bulk Vegetable Industry of China [CARS-23-C06]
  5. Youth Fund of Beijing Academy of Agriculture and Forestry Sciences [QNJJ202030]

Abstract

In this paper, an end-to-end disease identification model combining a disease-spot region detector and a disease classifier was proposed for automatic identification of vegetable diseases in field environments. By introducing bidirectional cross-modal image-text feature fusion, the model reached 99.23% accuracy on a small dataset.
Owing to the varied appearance of crop diseases and the complex backgrounds in which they occur, automatic identification of field diseases is an extremely challenging task in smart agriculture. A popular approach to this challenge is to design a Deep Convolutional Neural Network (DCNN) model that extracts visual disease features from the images and then identifies the diseases based on the extracted features. This approach performs well against simple backgrounds, but suffers from low accuracy and poor robustness against complex backgrounds. In this paper, an end-to-end disease identification model composed of a disease-spot region detector and a disease classifier (YOLOv5s + BiCMT) was proposed. Specifically, the YOLOv5s network was used to detect the disease-spot regions, providing a regional attention mechanism that facilitates the classifier's identification task. For the classifier, a Bidirectional Cross-Modal Transformer (BiCMT) model combining image and text modal information was constructed; it exploits the correlation and complementarity between the two modalities' features to fuse and recognize disease features. The design also resolves the problem of inconsistent sequence lengths across modalities. On a small dataset, the YOLOv5s + BiCMT model achieved the best results: its Accuracy, Precision, Sensitivity, and Specificity reached 99.23, 97.37, 97.54, and 99.54%, respectively. This paper demonstrates that bidirectional cross-modal feature fusion of disease images and texts is an effective method for identifying vegetable diseases in field environments.
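
The abstract's core mechanism, bidirectional cross-modal fusion via cross-attention, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the layer sizes, mean-pooling scheme, and the BiCrossModalFusion class name are assumptions, not the authors' published BiCMT architecture. It does show why cross-attention makes differing image and text sequence lengths a non-issue, since queries from one modality can attend over keys and values of any length from the other.

```python
# A minimal sketch of bidirectional cross-modal fusion, assuming a generic
# two-branch cross-attention design; not the authors' exact BiCMT model.
import torch
import torch.nn as nn

class BiCrossModalFusion(nn.Module):
    """Fuses an image token sequence with a text token sequence using two
    cross-attention blocks (image->text and text->image). Cross-attention
    tolerates different sequence lengths, which is one way to sidestep the
    length-mismatch problem the abstract mentions."""

    def __init__(self, dim=256, heads=4, num_classes=10):
        super().__init__()
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_txt = nn.LayerNorm(dim)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, N_img, dim); txt_tokens: (B, N_txt, dim).
        # N_img and N_txt may differ freely.
        img_fused, _ = self.img_to_txt(img_tokens, txt_tokens, txt_tokens)
        txt_fused, _ = self.txt_to_img(txt_tokens, img_tokens, img_tokens)
        img_fused = self.norm_img(img_tokens + img_fused)  # residual + norm
        txt_fused = self.norm_txt(txt_tokens + txt_fused)
        # Pool each modality and classify on the concatenated representation.
        pooled = torch.cat([img_fused.mean(dim=1), txt_fused.mean(dim=1)], dim=-1)
        return self.head(pooled)

# Usage: 49 image-patch tokens and 12 text tokens, batch of 2.
model = BiCrossModalFusion()
logits = model(torch.randn(2, 49, 256), torch.randn(2, 12, 256))
print(logits.shape)  # torch.Size([2, 10])
```

Here each modality keeps its own token count (49 image patches versus 12 text tokens), and the two mirrored cross-attention blocks correspond to the "bidirectional" design the abstract describes.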
