Abstract
With advances in endoscopic technology and artificial intelligence, a large number of endoscopic imaging datasets have been made publicly available to researchers worldwide. This study reviews and introduces these datasets. An extensive literature search was conducted in PubMed to identify appropriate datasets, and additional targeted searches were conducted in GitHub, Kaggle, and Simula to identify datasets directly. We briefly introduce each dataset and evaluate the characteristics of the included datasets. In addition, two national datasets in progress are discussed. A total of 40 endoscopic image datasets were included, of which 34 were accessible for use. Basic and detailed information on each dataset is reported. Of all the datasets, 16 focus on polyps and 6 focus on small bowel lesions. The largest group of datasets (n = 16) was constructed from colonoscopy only, followed by normal gastrointestinal endoscopy and capsule endoscopy (n = 9). This review may facilitate the use of public dataset resources in endoscopic research.
Data Availability
The endoscopic imaging data supporting the findings of the review are available within the article. The websites of available datasets are provided in Table 1.
Funding
This work was supported by the National Natural Science Foundation of China (82000540), Science and Technology Plan of Suzhou City (SKY2021038), Suzhou Clinical Center of Digestive Diseases (Szlcyxzx202101), and Youth Program of Suzhou Health Committee (KJXW2019001).
Author information
Contributions
All authors contributed to the study conception and design. Zhu JZ: conception and design; Zhu SQ: drafting of the article; Zhu SQ and Yin MY: literature research; Gao JW and Lin JX: data extraction; Xu C and Liu L: quality assessment; Zhu JZ and Xu CF: critical revision of the article; Xu CF and Zhu JZ: final approval of the article.
Ethics declarations
Competing Interests
The authors declare no competing interests.
About this article
Cite this article
Zhu, S., Gao, J., Liu, L. et al. Public Imaging Datasets of Gastrointestinal Endoscopy for Artificial Intelligence: a Review. J Digit Imaging 36, 2578–2601 (2023). https://doi.org/10.1007/s10278-023-00844-7
DOI: https://doi.org/10.1007/s10278-023-00844-7