
Exploring vision transformer: classifying electron-microscopy pollen images with transformer

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Pollen identification is a sub-discipline of palynology with broad applications in fields such as allergy control, paleoclimate reconstruction, criminal investigation, and petroleum exploration. Among these, pollen allergy is a common and frequently occurring disease worldwide. Accurate and rapid identification of pollen species under the electron microscope helps medical staff produce pollen forecasts and interrupt the natural course of pollen allergy. Current pollen species identification relies on professional researchers manually identifying pollen particles in images, a time-consuming and laborious process that cannot meet the requirements of pollen forecasting. Recently, the self-attention-based Transformer has attracted considerable attention in vision tasks such as image classification. However, pure self-attention lacks local operations on pixels and requires large-scale pretraining to achieve performance comparable to convolutional neural networks (CNNs). In this study, we propose a new Vision Transformer pipeline for image classification. First, we design a FeatureMap-to-Token (F2T) module to perform token embedding on the input image: global self-attention is applied to tokens that already carry local information, and the hierarchical design of CNNs is applied to the Vision Transformer, combining local and global strengths across multiscale spaces. Second, we use a distillation strategy to learn the feature representation in the output space of a teacher network, thereby absorbing the inductive bias of the CNN and improving recognition accuracy. Experiments demonstrate that, after being trained from scratch on the electron-microscopy pollen dataset, the proposed model achieves CNN-equivalent performance under the same conditions while requiring fewer model parameters and less training time. Code for the model is available at https://github.com/dkbshuai/PyTorch-Our-S.
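For readers who want a concrete starting point, the sketch below illustrates in PyTorch the two ideas summarized in the abstract: a convolutional FeatureMap-to-Token embedding that gives tokens local context before global self-attention, and a soft-label distillation loss that transfers a CNN teacher's output-space knowledge. This is a minimal illustration rather than the authors' released implementation (see the repository linked above); the layer choices, channel sizes, and the hyper-parameters alpha and tau are assumptions.

```python
# Minimal sketch of an F2T-style token embedding and a distillation loss.
# Not the released code; module names and hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class F2TEmbedding(nn.Module):
    """Embed an image into tokens via strided convolutions (assumed design)."""

    def __init__(self, in_chans=3, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2),
            nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, C, H/4, W/4), tokens carry local context
        return x.flatten(2).transpose(1, 2)    # (B, N, C) token sequence for self-attention


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=3.0):
    """Cross-entropy on ground truth plus KL divergence to the CNN teacher's
    softened outputs; alpha and tau are assumed hyper-parameters."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    tokens = F2TEmbedding()(torch.randn(2, 3, 224, 224))
    print(tokens.shape)  # torch.Size([2, 3136, 256])
```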


Data availability

 The pollen dataset generated and analysed during the current study is not publicly available because the study is ongoing, but is available from the corresponding author on reasonable request.


Acknowledgements

This study was supported by the National Natural Science Foundation of China (62066035, 61962044), the Natural Science Foundation of Inner Mongolia Autonomous Region (2022LHMS06004), and the Basic Scientific Research Business Fee Project of the Universities Directly Under the Inner Mongolia Autonomous Region (JY20220089).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shi Bao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Duan, K., Bao, S., Liu, Z. et al. Exploring vision transformer: classifying electron-microscopy pollen images with transformer. Neural Comput & Applic 35, 735–748 (2023). https://doi.org/10.1007/s00521-022-07789-y

