Journal
NEURAL COMPUTING & APPLICATIONS
Volume 35, Issue 1, Pages 735-748
Publisher
SPRINGER LONDON LTD
DOI: 10.1007/s00521-022-07789-y
Keywords
Image classification; Vision transformer; Self-attention; Knowledge distillation
Pollen identification has broad applications in various fields, and pollen allergy is a common and frequent disease. Accurate and rapid identification of pollen species under the electron microscope can help with pollen forecasting and treatment. In this study, a new Vision Transformer pipeline for image classification is proposed, which achieves CNN-equivalent performance on the pollen dataset with fewer model parameters and less training time.
Pollen identification is a sub-discipline of palynology, which has broad applications in several fields such as allergy control, paleoclimate reconstruction, criminal investigation, and petroleum exploration. Among these, pollen allergy is a common and frequent disease worldwide. Accurate and rapid identification of pollen species under the electron microscope helps medical staff with pollen forecasting and with interrupting the natural course of pollen allergy. Current pollen species identification relies on professional researchers manually identifying pollen particles in images, a time-consuming and laborious process that cannot meet the requirements of pollen forecasting. Recently, the self-attention-based Transformer has attracted considerable attention in vision tasks, such as image classification. However, pure self-attention lacks local operations on pixels and requires pretraining on large-scale datasets to achieve performance comparable to convolutional neural networks (CNNs). In this study, we propose a new Vision Transformer pipeline for image classification. First, we design a FeatureMap-to-Token (F2T) module to perform token embedding on the input image. A global self-attention operation is performed on tokens that carry local information, and the hierarchical design of CNNs is applied to the Vision Transformer, combining local and global strengths in multiscale spaces. Second, we use a distillation strategy that learns the feature representation in the output space of the teacher network, so that the model also learns the inductive bias of the CNN and improves its recognition accuracy. Experiments demonstrate that the proposed model achieves CNN-equivalent performance under the same conditions after being trained from scratch on the electron-microscopic pollen dataset. It also requires fewer model parameters and less training time. Code for the model is available at https://github.com/dkbshuai/PyTorchOur-S.
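The abstract does not specify the form of the distillation objective. As a rough illustration only, the following sketch shows a generic soft-label distillation loss, in which a student is trained against both the ground-truth label and the temperature-softened outputs of a teacher network. The function names, the temperature of 3.0, and the weighting factor alpha are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Standard cross-entropy between student predictions and the true class index."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=3.0, alpha=0.5):
    """Weighted sum of a hard-label term and a soft teacher-matching term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft-term gradients comparable in size.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft


if __name__ == "__main__":
    student = [1.2, 0.3, -0.5]   # hypothetical Transformer student logits
    teacher = [2.0, 0.1, -1.0]   # hypothetical CNN teacher logits
    print(distillation_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted hard-label term remains, which makes the function easy to sanity-check.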