Article

CMENet: A Cross-Modal Enhancement Network for Tobacco Leaf Grading

Journal

IEEE Access
Volume 11, Pages 109201-109212

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/ACCESS.2023.3321111

Keywords

Tobacco leaf grading; image classification; convolutional neural network; cross-modal information fusion

Abstract

Tobacco leaf grading plays a crucial role in ensuring the quality of tobacco production. For a long time, the grading process has been determined manually by experienced experts. In recent years, methods have been introduced to automate grading using reflection images of tobacco leaves. However, the high visual similarity among reflection images of different grades renders a single reflection image insufficient for accurate grading. Moreover, tobacco leaves of the same grade may have inconsistent visual appearances due to their different planting locations. An expert integrates multimodal information, such as visual, tactile, and planting-location cues, when performing grading. Inspired by this, we propose an end-to-end Cross-modal Enhancement Network, named CMENet, for automatic tobacco leaf grading. In addition to the common reflection image, the network adopts a transmission image to incorporate the thickness cues used in manual grading. CMENet comprises a difference-aware fusion module and a meta self-attention module, which extract multimodal information from the transmission image and the planting location, respectively. Experimental results demonstrate that CMENet achieves a high grading accuracy (80.15%) when incorporating multimodal information, surpassing existing methods that rely solely on reflection images.
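
The abstract names two architectural components: a difference-aware fusion module that combines reflection and transmission features, and a meta self-attention module that conditions on the planting location. The paper's code is not reproduced here; the PyTorch sketch below illustrates one plausible reading of those two ideas. All class names, dimensions, and operations (the cross-modal feature difference, the location-conditioned channel gate) are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the two fusion ideas described in the abstract.
# Module names, dimensions, and operations are illustrative assumptions,
# not the authors' actual CMENet implementation.
import torch
import torch.nn as nn

class DifferenceAwareFusion(nn.Module):
    """Fuse reflection and transmission features, emphasizing their
    difference (a proxy for thickness cues visible only in transmission)."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, refl: torch.Tensor, trans: torch.Tensor) -> torch.Tensor:
        diff = refl - trans                      # cross-modal difference map
        fused = torch.cat([refl, trans, diff], dim=1)
        return self.proj(fused)

class MetaSelfAttention(nn.Module):
    """Reweight image features channel-wise from a planting-location embedding."""
    def __init__(self, channels: int, num_locations: int):
        super().__init__()
        self.loc_embed = nn.Embedding(num_locations, channels)
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, loc: torch.Tensor) -> torch.Tensor:
        w = self.gate(self.loc_embed(loc))       # (B, C) attention weights
        return feat * w[:, :, None, None]        # broadcast over spatial dims

# Usage sketch: fuse backbone features from both modalities, then classify.
B, C, H, W, num_grades, num_locations = 2, 64, 28, 28, 10, 5
refl_feat = torch.randn(B, C, H, W)              # reflection-image features
trans_feat = torch.randn(B, C, H, W)             # transmission-image features
loc = torch.randint(0, num_locations, (B,))      # planting-location IDs

fusion = DifferenceAwareFusion(C)
meta_attn = MetaSelfAttention(C, num_locations)
head = nn.Linear(C, num_grades)

x = meta_attn(fusion(refl_feat, trans_feat), loc)
logits = head(x.mean(dim=(2, 3)))                # global average pool + classify
print(logits.shape)                              # torch.Size([2, 10])

The explicit difference map refl - trans is one simple way to expose thickness-related contrast between the two modalities, and the channel gate is one simple way to let location metadata modulate visual features; the actual CMENet modules may differ substantially.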


