Consistency Loss for Improved Colonoscopy Landmark Detection with Vision Transformers

Tamhane, Aniruddha; Dobkin, Daniel; Shtalrid, Ore; Bouhnik, Moshe; Posner, Erez; Mida, Tse’ela

doi:10.1007/978-3-031-45676-3_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14349)

Included in the following conference series: International Workshop on Machine Learning in Medical Imaging

Abstract

Colonoscopy is a procedure used to examine the colon and rectum for colorectal cancer or other abnormalities, including polyps and diverticula. Apart from the actual diagnosis, manually processing the snapshots taken during the colonoscopy procedure (for medical record keeping) consumes a large amount of the clinician’s time. This can be automated through post-procedural machine-learning-based algorithms that classify anatomical landmarks in the colon. In this work, we have developed a pipeline for training vision transformers to identify anatomical landmarks, including the appendiceal orifice, the ileocecal valve/cecum landmark, and rectum retroflection. To increase the accuracy of the model, we utilize a hybrid approach that combines algorithm-level and data-level techniques. We introduce a consistency loss to enhance model immunity to label inconsistencies, as well as a semantic non-landmark sampling technique aimed at increasing focus on colonic findings. For training and testing our pipeline, we have annotated 307 colonoscopy videos and 2363 snapshots with the assistance of several medical experts for enhanced reliability. The algorithm identifies landmarks with an accuracy of 92% on the test dataset.
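
The abstract does not give the form of the consistency loss. As a minimal sketch, assuming a common formulation from the consistency-regularization literature (a supervised cross-entropy term plus a penalty on disagreement between the predictions for two views of the same frame), the idea might look like the following. The function names, the `weight` parameter, the mean-squared penalty, and the four-class label set (the three landmarks named above plus a hypothetical non-landmark class) are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical label set: the three landmarks from the abstract plus an
# assumed "non-landmark" class.
LANDMARKS = [
    "appendiceal orifice",
    "ileocecal valve/cecum",
    "rectum retroflection",
    "non-landmark",
]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Supervised term: negative log-probability of the annotated class."""
    return -math.log(max(probs[label], 1e-12))


def consistency(probs_a, probs_b):
    """Assumed consistency term: mean squared difference between the
    predicted distributions for two views of the same frame."""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def combined_loss(logits_a, logits_b, label, weight=1.0):
    """Cross-entropy on view A plus a weighted disagreement penalty
    between view A and view B."""
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return cross_entropy(p_a, label) + weight * consistency(p_a, p_b)


if __name__ == "__main__":
    # Two hypothetical sets of classifier outputs for the same snapshot.
    view_a = [2.0, 0.5, 0.1, -1.0]
    view_b = [0.1, 2.0, 0.5, -1.0]
    label = LANDMARKS.index("appendiceal orifice")
    print("agreeing views:   ", combined_loss(view_a, view_a, label))
    print("disagreeing views:", combined_loss(view_a, view_b, label))
```

In this sketch the penalty is zero when both views yield the same prediction and grows as they diverge, so a single noisy label has less influence on the total loss than it would under cross-entropy alone. How the two views are produced (for example, augmentations or neighbouring frames) is left open here because the abstract does not say.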

Author information

Authors and Affiliations

Intuitive Surgical, Inc., 1020 Kifer Road, Sunnyvale, CA, USA
Aniruddha Tamhane, Daniel Dobkin, Ore Shtalrid, Moshe Bouhnik, Erez Posner & Tse’ela Mida

Corresponding author

Correspondence to Erez Posner.

Editor information

Editors and Affiliations

Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xiaohuan Cao
Rensselaer Polytechnic Institute, Troy, NY, USA
Xuanang Xu
Imperial College London, London, UK
Islem Rekik
ShanghaiTech University, Shanghai, China
Zhiming Cui
Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Xi Ouyang

Cite this paper

Tamhane, A., Dobkin, D., Shtalrid, O., Bouhnik, M., Posner, E., Mida, T. (2024). Consistency Loss for Improved Colonoscopy Landmark Detection with Vision Transformers. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14349. Springer, Cham. https://doi.org/10.1007/978-3-031-45676-3_13

DOI: https://doi.org/10.1007/978-3-031-45676-3_13
Published: 15 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45675-6
Online ISBN: 978-3-031-45676-3
eBook Packages: Computer Science, Computer Science (R0)
