DOI: 10.1145/3633598.3633623
Research Article

Endoscopic Image Classification using Vision Transformers

Published: 22 January 2024

ABSTRACT

Convolutional Neural Networks (CNNs) have been the state-of-the-art technique for numerous image processing tasks in medical imaging. Recently, Vision Transformers (ViTs) have emerged as a complementary technique, matching CNN performance in the medical field while offering unique characteristics that may benefit medical image processing. While CNNs have been applied extensively to artefact detection and classification in endoscopic images, ViTs have been applied only sparsely in this area, and neither architecture has seen much use for colour misalignment artefact classification. In this work, we therefore explore the application of the Vision Transformer (ViT) to the classification of artefacts in endoscopic images of gastrointestinal tract organs, and compare its performance against CNNs on colour misalignment artefacts. Our customised ViT model, based on DeiT (Data-efficient image Transformers), achieves an accuracy of 96.33%, compared with 78.67% for the CNN-based Inceptionv3 model and 76.67% for InceptionResNetv2. The results demonstrate that, when pretrained on ImageNet, ViTs outperform CNNs in colour misalignment artefact classification, which we attribute to the ability of self-attention weights to capture relationships between image pixels. Moreover, the built-in self-attention mechanism offers fresh insight into the model's decision-making process.
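The pipeline the abstract describes, an ImageNet-pretrained DeiT fine-tuned for a small set of artefact classes, can be sketched in code. The following is a minimal illustration only, assuming a timm-based DeiT backbone, an ImageFolder-style dataset layout, three classes, and generic hyperparameters; none of these details beyond the use of a pretrained DeiT are specified by the paper.

```python
import torch
import timm
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

NUM_CLASSES = 3  # assumption: the paper's exact class set is not given here

# DeiT-base expects 224x224 inputs normalised with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: one sub-folder of endoscopy frames per artefact class.
train_set = datasets.ImageFolder("endoscopy/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Load an ImageNet-pretrained DeiT and swap in a fresh classification head.
model = timm.create_model("deit_base_patch16_224", pretrained=True,
                          num_classes=NUM_CLASSES)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # illustrative epoch count
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The abstract also credits the self-attention weights with making the model's decisions more interpretable. One common way to inspect them, sketched below with the Hugging Face DeiT implementation (an assumed tool choice, not the authors'), is to read out the class token's attention over the image patches:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, DeiTModel

checkpoint = "facebook/deit-base-distilled-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
backbone = DeiTModel.from_pretrained(checkpoint, output_attentions=True)

image = Image.open("frame.png").convert("RGB")  # hypothetical endoscopy frame
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = backbone(**inputs)

# One attention tensor per layer, each of shape (batch, heads, tokens, tokens).
# Tokens 0 and 1 are the class and distillation tokens; image patches follow.
last_attn = outputs.attentions[-1]
cls_to_patches = last_attn[0, :, 0, 2:].mean(dim=0)  # head-averaged saliency
print(cls_to_patches.shape)  # 196 patches for a 224x224 input at 16x16 patching
```

Head-averaged class-token attention is only one of several ways to visualise ViT attention (attention rollout is another); the scheme used in the paper itself may differ.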


Published in

ICAAI '23: Proceedings of the 2023 7th International Conference on Advances in Artificial Intelligence
October 2023, 151 pages
ISBN: 9798400708985
DOI: 10.1145/3633598

Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


